copy_misaligned_words: use inline asm on ARM, simplify fallback implementation #808
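For context on what a misaligned word copy does, here is a rough sketch of the usual shift-and-merge technique in safe Rust. It is not the code from this PR, which uses inline asm on ARM. The name `copy_misaligned_words_sketch`, the slice-based interface, the little-endian interpretation of words and the `u32` word size are all illustrative assumptions. The idea is that the source starts `off` bytes past a word boundary, so each output word is built from two adjacent aligned source words. Each aligned word is loaded once and reused for the next iteration.

```rust
/// Hypothetical sketch (not the PR's implementation): fill `dst` with the
/// words that start `off` bytes into the byte stream described by
/// `src_words`, using only aligned word reads plus shifts.
///
/// Assumptions: words are interpreted little-endian (byte 0 is the least
/// significant byte), `off` is 1..=3, and `src_words` holds at least one
/// more word than `dst` so the trailing partial word can be read.
fn copy_misaligned_words_sketch(dst: &mut [u32], src_words: &[u32], off: usize) {
    assert!((1..4).contains(&off), "off must be 1, 2 or 3 bytes");
    assert!(src_words.len() > dst.len(), "need one extra source word");

    // Number of bits to drop from the current word; the rest of each
    // output word comes from the low bits of the next word.
    let shift = (8 * off) as u32;
    let mut prev = src_words[0];
    for (i, d) in dst.iter_mut().enumerate() {
        // One aligned load per output word; `prev` is reused from the
        // previous iteration.
        let next = src_words[i + 1];
        // High bytes of `prev` become the low bytes of the result, and the
        // low bytes of `next` fill in the top.
        *d = (prev >> shift) | (next << (32 - shift));
        prev = next;
    }
}
```

`off == 0` is rejected because it is the aligned case, where no merging is needed. It would also make `next << 32` overflow. A real implementation works on raw pointers and must deal with aligned loads that touch bytes outside the requested range. This sketch sidesteps that by requiring `src_words` to cover those bytes.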

RalfJung · 2025-03-22T08:29:03Z

Maybe this is faster than #799?

@tgross35 would be great if you could run the benchmarks again. :)

tgross35 · 2025-03-22T11:16:15Z

No luck, unfortunately: memcpy gets some minor improvements, but memmove regresses further.
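To read the report below, each metric is shown as `a|b (percent) [factor]`. The printed numbers work out if `a` is the new run and `b` is the baseline. This column order is inferred from the arithmetic rather than taken from the tool's documentation. A minimal sketch of that arithmetic follows. The helper names `pct_change` and `factor` are hypothetical.

```rust
/// Percentage change, assuming `new` is the left value and `old` the
/// right (baseline) value in each `new|old` pair.
fn pct_change(new: f64, old: f64) -> f64 {
    (new - old) / old * 100.0
}

/// Bracketed factor: `new / old` when the count grew,
/// `-(old / new)` when it shrank.
fn factor(new: f64, old: f64) -> f64 {
    if new >= old { new / old } else { -(old / new) }
}
```

For example, memcpy `misaligned_0` instructions `586|590` give about -0.678% and -1.00683x, which is an improvement. memmove `small_spread_0` instructions `411|406` give about +1.232% and +1.01232x, which is a regression.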

Report

mem_icount::memcpy::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16 bytes, src offset: 0, dst offset: 0
  Instructions:                         519|519                  (No change)
  L1 Hits:                              779|779                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     784|784                  (No change)
  Estimated Cycles:                     954|954                  (No change)
mem_icount::memcpy::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32 bytes, src offset: 0, dst offset: 0
  Instructions:                         535|535                  (No change)
  L1 Hits:                              803|803                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     808|808                  (No change)
  Estimated Cycles:                     978|978                  (No change)
mem_icount::memcpy::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64 bytes, src offset: 0, dst offset: 0
  Instructions:                         567|567                  (No change)
  L1 Hits:                              851|851                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     856|856                  (No change)
  Estimated Cycles:                    1026|1026                 (No change)
mem_icount::memcpy::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512 bytes, src offset: 0, dst offset: 0
  Instructions:                        1015|1015                 (No change)
  L1 Hits:                             1523|1523                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1528|1528                 (No change)
  Estimated Cycles:                    1698|1698                 (No change)
mem_icount::memcpy::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096 bytes, src offset: 0, dst offset: 0
  Instructions:                        4599|4599                 (No change)
  L1 Hits:                             6899|6899                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6904|6904                 (No change)
  Estimated Cycles:                    7074|7074                 (No change)
mem_icount::memcpy::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576 bytes, src offset: 0, dst offset: 0
  Instructions:                     1048886|1048886              (No change)
  L1 Hits:                          1540527|1540527              (No change)
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573316|1573316              (No change)
  Estimated Cycles:                 1704742|1704742              (No change)
mem_icount::memcpy::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16 bytes, src offset: 65, dst offset: 65
  Instructions:                         533|533                  (No change)
  L1 Hits:                              799|799                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     804|804                  (No change)
  Estimated Cycles:                     974|974                  (No change)
mem_icount::memcpy::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32 bytes, src offset: 65, dst offset: 65
  Instructions:                         549|549                  (No change)
  L1 Hits:                              823|823                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     828|828                  (No change)
  Estimated Cycles:                     998|998                  (No change)
mem_icount::memcpy::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64 bytes, src offset: 65, dst offset: 65
  Instructions:                         581|581                  (No change)
  L1 Hits:                              871|871                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     876|876                  (No change)
  Estimated Cycles:                    1046|1046                 (No change)
mem_icount::memcpy::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512 bytes, src offset: 65, dst offset: 65
  Instructions:                        1029|1029                 (No change)
  L1 Hits:                             1543|1543                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1548|1548                 (No change)
  Estimated Cycles:                    1718|1718                 (No change)
mem_icount::memcpy::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096 bytes, src offset: 65, dst offset: 65
  Instructions:                        4613|4613                 (No change)
  L1 Hits:                             6919|6919                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6924|6924                 (No change)
  Estimated Cycles:                    7094|7094                 (No change)
mem_icount::memcpy::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576 bytes, src offset: 65, dst offset: 65
  Instructions:                     1048900|1048900              (No change)
  L1 Hits:                          1540545|1540545              (No change)
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573336|1573336              (No change)
  Estimated Cycles:                 1704770|1704770              (No change)
mem_icount::memcpy::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16 bytes, src offset: 65, dst offset: 66
  Instructions:                         586|590                  (-0.67797%) [-1.00683x]
  L1 Hits:                              859|861                  (-0.23229%) [-1.00233x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                     868|870                  (-0.22989%) [-1.00230x]
  Estimated Cycles:                    1174|1176                 (-0.17007%) [-1.00170x]
mem_icount::memcpy::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32 bytes, src offset: 65, dst offset: 66
  Instructions:                         622|626                  (-0.63898%) [-1.00643x]
  L1 Hits:                              903|905                  (-0.22099%) [-1.00221x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                     912|914                  (-0.21882%) [-1.00219x]
  Estimated Cycles:                    1218|1220                 (-0.16393%) [-1.00164x]
mem_icount::memcpy::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64 bytes, src offset: 65, dst offset: 66
  Instructions:                         694|698                  (-0.57307%) [-1.00576x]
  L1 Hits:                              991|993                  (-0.20141%) [-1.00202x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                    1000|1002                 (-0.19960%) [-1.00200x]
  Estimated Cycles:                    1306|1308                 (-0.15291%) [-1.00153x]
mem_icount::memcpy::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512 bytes, src offset: 65, dst offset: 66
  Instructions:                        1702|1706                 (-0.23447%) [-1.00235x]
  L1 Hits:                             2223|2225                 (-0.08989%) [-1.00090x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                    2232|2234                 (-0.08953%) [-1.00090x]
  Estimated Cycles:                    2538|2540                 (-0.07874%) [-1.00079x]
mem_icount::memcpy::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096 bytes, src offset: 65, dst offset: 66
  Instructions:                        9766|9770                 (-0.04094%) [-1.00041x]
  L1 Hits:                            12079|12081                (-0.01655%) [-1.00017x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                   12088|12090                (-0.01654%) [-1.00017x]
  Estimated Cycles:                   12394|12396                (-0.01613%) [-1.00016x]
mem_icount::memcpy::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576 bytes, src offset: 65, dst offset: 66
  Instructions:                     2359659|2359657              (+0.00008%) [+1.00000x]
  L1 Hits:                          2851313|2851307              (+0.00021%) [+1.00000x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                              13|13                   (No change)
  Total read+write:                 2884108|2884102              (+0.00021%) [+1.00000x]
  Estimated Cycles:                 3015678|3015672              (+0.00020%) [+1.00000x]
mem_icount::memset::bench aligned_0:setup(Cfg { len : 16, offset : 0 })
bytes: 16, offset: 0
  Instructions:                         288|288                  (No change)
  L1 Hits:                              418|418                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     427|427                  (No change)
  Estimated Cycles:                     673|673                  (No change)
mem_icount::memset::bench aligned_1:setup(Cfg { len : 32, offset : 0 })
bytes: 32, offset: 0
  Instructions:                         300|300                  (No change)
  L1 Hits:                              434|434                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     443|443                  (No change)
  Estimated Cycles:                     689|689                  (No change)
mem_icount::memset::bench aligned_2:setup(Cfg { len : 64, offset : 0 })
bytes: 64, offset: 0
  Instructions:                         324|324                  (No change)
  L1 Hits:                              466|466                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     475|475                  (No change)
  Estimated Cycles:                     721|721                  (No change)
mem_icount::memset::bench aligned_3:setup(Cfg { len : 512, offset : 0 })
bytes: 512, offset: 0
  Instructions:                         660|660                  (No change)
  L1 Hits:                              914|914                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     923|923                  (No change)
  Estimated Cycles:                    1169|1169                 (No change)
mem_icount::memset::bench aligned_4:setup(Cfg { len : 4096, offset : 0 })
bytes: 4096, offset: 0
  Instructions:                        3348|3348                 (No change)
  L1 Hits:                             4498|4498                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4507|4507                 (No change)
  Estimated Cycles:                    4753|4753                 (No change)
mem_icount::memset::bench aligned_5:setup(Cfg { len : MEG1, offset : 0 })
bytes: 1048576, offset: 0
  Instructions:                      786614|786614               (No change)
  L1 Hits:                          1032429|1032429              (No change)
  L2 Hits:                            16395|16395                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048834|1048834              (No change)
  Estimated Cycles:                 1114754|1114754              (No change)
mem_icount::memset::bench offset_0:setup(Cfg { len : 16, offset : 65 })
bytes: 16, offset: 65
  Instructions:                         300|300                  (No change)
  L1 Hits:                              433|433                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     442|442                  (No change)
  Estimated Cycles:                     688|688                  (No change)
mem_icount::memset::bench offset_1:setup(Cfg { len : 32, offset : 65 })
bytes: 32, offset: 65
  Instructions:                         312|312                  (No change)
  L1 Hits:                              449|449                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     458|458                  (No change)
  Estimated Cycles:                     704|704                  (No change)
mem_icount::memset::bench offset_2:setup(Cfg { len : 64, offset : 65 })
bytes: 64, offset: 65
  Instructions:                         336|336                  (No change)
  L1 Hits:                              481|481                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     490|490                  (No change)
  Estimated Cycles:                     736|736                  (No change)
mem_icount::memset::bench offset_3:setup(Cfg { len : 512, offset : 65 })
bytes: 512, offset: 65
  Instructions:                         672|672                  (No change)
  L1 Hits:                              929|929                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     938|938                  (No change)
  Estimated Cycles:                    1184|1184                 (No change)
mem_icount::memset::bench offset_4:setup(Cfg { len : 4096, offset : 65 })
bytes: 4096, offset: 65
  Instructions:                        3360|3360                 (No change)
  L1 Hits:                             4513|4513                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4522|4522                 (No change)
  Estimated Cycles:                    4768|4768                 (No change)
mem_icount::memset::bench offset_5:setup(Cfg { len : MEG1, offset : 65 })
bytes: 1048576, offset: 65
  Instructions:                      786626|786626               (No change)
  L1 Hits:                          1032443|1032443              (No change)
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048849|1048849              (No change)
  Estimated Cycles:                 1114773|1114773              (No change)
mem_icount::memcmp::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16, src offset: 0, dst offset: 0
  Instructions:                         579|579                  (No change)
  L1 Hits:                              851|850                  (+0.11765%) [+1.00118x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               3|4                    (-25.0000%) [-1.33333x]
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                     956|990                  (-3.43434%) [-1.03556x]
mem_icount::memcmp::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32, src offset: 0, dst offset: 0
  Instructions:                         675|675                  (No change)
  L1 Hits:                              978|977                  (+0.10235%) [+1.00102x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1118|1152                 (-2.95139%) [-1.03041x]
mem_icount::memcmp::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64, src offset: 0, dst offset: 0
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1234|1233                 (+0.08110%) [+1.00081x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1374|1408                 (-2.41477%) [-1.02475x]
mem_icount::memcmp::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512, src offset: 0, dst offset: 0
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4818|4817                 (+0.02076%) [+1.00021x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4958|4992                 (-0.68109%) [-1.00686x]
mem_icount::memcmp::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096, src offset: 0, dst offset: 0
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33490|33489                (+0.00299%) [+1.00003x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33630|33664                (-0.10100%) [-1.00101x]
mem_icount::memcmp::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576, src offset: 0, dst offset: 0
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356238|8356237              (+0.00001%) [+1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520418|8520452              (-0.00040%) [-1.00000x]
mem_icount::memcmp::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16, src offset: 65, dst offset: 65
  Instructions:                         579|579                  (No change)
  L1 Hits:                              850|849                  (+0.11779%) [+1.00118x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                     990|1024                 (-3.32031%) [-1.03434x]
mem_icount::memcmp::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32, src offset: 65, dst offset: 65
  Instructions:                         675|675                  (No change)
  L1 Hits:                              978|977                  (+0.10235%) [+1.00102x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1118|1152                 (-2.95139%) [-1.03041x]
mem_icount::memcmp::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64, src offset: 65, dst offset: 65
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1234|1233                 (+0.08110%) [+1.00081x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1374|1408                 (-2.41477%) [-1.02475x]
mem_icount::memcmp::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512, src offset: 65, dst offset: 65
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4818|4817                 (+0.02076%) [+1.00021x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4958|4992                 (-0.68109%) [-1.00686x]
mem_icount::memcmp::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096, src offset: 65, dst offset: 65
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33490|33489                (+0.00299%) [+1.00003x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33630|33664                (-0.10100%) [-1.00101x]
mem_icount::memcmp::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576, src offset: 65, dst offset: 65
  Instructions:                     6291752|6291746              (+0.00010%) [+1.00000x]
  L1 Hits:                          8356244|8356235              (+0.00011%) [+1.00000x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
  Total read+write:                 8389034|8389026              (+0.00010%) [+1.00000x]
  Estimated Cycles:                 8520434|8520460              (-0.00031%) [-1.00000x]
mem_icount::memcmp::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16, src offset: 65, dst offset: 66
  Instructions:                         579|579                  (No change)
  L1 Hits:                              850|849                  (+0.11779%) [+1.00118x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                     990|1024                 (-3.32031%) [-1.03434x]
mem_icount::memcmp::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32, src offset: 65, dst offset: 66
  Instructions:                         675|675                  (No change)
  L1 Hits:                              978|977                  (+0.10235%) [+1.00102x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1118|1152                 (-2.95139%) [-1.03041x]
mem_icount::memcmp::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64, src offset: 65, dst offset: 66
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1234|1233                 (+0.08110%) [+1.00081x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1374|1408                 (-2.41477%) [-1.02475x]
mem_icount::memcmp::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512, src offset: 65, dst offset: 66
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4818|4817                 (+0.02076%) [+1.00021x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4958|4992                 (-0.68109%) [-1.00686x]
mem_icount::memcmp::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096, src offset: 65, dst offset: 66
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33490|33489                (+0.00299%) [+1.00003x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|5                    (-20.0000%) [-1.25000x]
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33630|33664                (-0.10100%) [-1.00101x]
mem_icount::memcmp::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576, src offset: 65, dst offset: 66
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356236|8356235              (+0.00001%) [+1.00000x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520426|8520460              (-0.00040%) [-1.00000x]
mem_icount::memmove::forward aligned_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 0, forward
  Instructions:                        4391|4391                 (No change)
  L1 Hits:                             6579|6579                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                    6590|6590                 (No change)
  Estimated Cycles:                    6904|6904                 (No change)
mem_icount::memmove::forward aligned_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 0, forward
  Instructions:                     1048777|1048777              (No change)
  L1 Hits:                          1557243|1557243              (No change)
  L2 Hits:                            15902|15902                (No change)
  RAM Hits:                              12|12                   (No change)
  Total read+write:                 1573157|1573157              (No change)
  Estimated Cycles:                 1637173|1637173              (No change)
mem_icount::memmove::forward small_spread_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 0, forward
  Instructions:                         411|406                  (+1.23153%) [+1.01232x]
  L1 Hits:                              589|578                  (+1.90311%) [+1.01903x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     606|598                  (+1.33779%) [+1.01338x]
  Estimated Cycles:                    1124|1218                 (-7.71757%) [-1.08363x]
mem_icount::memmove::forward small_spread_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         479|466                  (+2.78970%) [+1.02790x]
  L1 Hits:                              681|658                  (+3.49544%) [+1.03495x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     698|678                  (+2.94985%) [+1.02950x]
  Estimated Cycles:                    1216|1298                 (-6.31741%) [-1.06743x]
mem_icount::memmove::forward small_spread_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         615|586                  (+4.94881%) [+1.04949x]
  L1 Hits:                              865|818                  (+5.74572%) [+1.05746x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     882|838                  (+5.25060%) [+1.05251x]
  Estimated Cycles:                    1400|1458                 (-3.97805%) [-1.04143x]
mem_icount::memmove::forward small_spread_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2519|2266                 (+11.1650%) [+1.11165x]
  L1 Hits:                             3441|3058                 (+12.5245%) [+1.12525x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                    3458|3078                 (+12.3457%) [+1.12346x]
  Estimated Cycles:                    3976|3698                 (+7.51758%) [+1.07518x]
mem_icount::memmove::forward small_spread_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                       17751|15706                (+13.0205%) [+1.13021x]
  L1 Hits:                            24049|20978                (+14.6391%) [+1.14639x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                   24066|20998                (+14.6109%) [+1.14611x]
  Estimated Cycles:                   24584|21618                (+13.7200%) [+1.13720x]
mem_icount::memmove::forward small_spread_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4456697|3932412              (+13.3324%) [+1.13332x]
  L1 Hits:                          6013768|5227337              (+15.0446%) [+1.15045x]
  L2 Hits:                            15887|15887                (No change)
  RAM Hits:                              18|21                   (-14.2857%) [-1.16667x]
  Total read+write:                 6029673|5243245              (+14.9989%) [+1.14999x]
  Estimated Cycles:                 6093833|5307507              (+14.8154%) [+1.14815x]
mem_icount::memmove::forward medium_spread_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 0, forward
- end of stdout/stderr
  Instructions:                         411|406                  (+1.23153%) [+1.01232x]
  L1 Hits:                              589|578                  (+1.90311%) [+1.01903x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     606|598                  (+1.33779%) [+1.01338x]
  Estimated Cycles:                    1124|1218                 (-7.71757%) [-1.08363x]
mem_icount::memmove::forward medium_spread_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 0, forward
- end of stdout/stderr
  Instructions:                         479|466                  (+2.78970%) [+1.02790x]
  L1 Hits:                              681|658                  (+3.49544%) [+1.03495x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     698|678                  (+2.94985%) [+1.02950x]
  Estimated Cycles:                    1216|1298                 (-6.31741%) [-1.06743x]
mem_icount::memmove::forward medium_spread_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 0, forward
- end of stdout/stderr
  Instructions:                         615|586                  (+4.94881%) [+1.04949x]
  L1 Hits:                              865|818                  (+5.74572%) [+1.05746x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     882|838                  (+5.25060%) [+1.05251x]
  Estimated Cycles:                    1400|1458                 (-3.97805%) [-1.04143x]
mem_icount::memmove::forward medium_spread_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2519|2266                 (+11.1650%) [+1.11165x]
  L1 Hits:                             3441|3058                 (+12.5245%) [+1.12525x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                    3458|3078                 (+12.3457%) [+1.12346x]
  Estimated Cycles:                    3976|3698                 (+7.51758%) [+1.07518x]
mem_icount::memmove::forward medium_spread_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 0, forward
- end of stdout/stderr
  Instructions:                       17751|15706                (+13.0205%) [+1.13021x]
  L1 Hits:                            24049|20978                (+14.6391%) [+1.14639x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                   24066|20998                (+14.6109%) [+1.14611x]
  Estimated Cycles:                   24584|21618                (+13.7200%) [+1.13720x]
mem_icount::memmove::forward medium_spread_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4456697|3932412              (+13.3324%) [+1.13332x]
  L1 Hits:                          5997132|5210701              (+15.0926%) [+1.15093x]
  L2 Hits:                            32523|32523                (No change)
  RAM Hits:                              18|21                   (-14.2857%) [-1.16667x]
  Total read+write:                 6029673|5243245              (+14.9989%) [+1.14999x]
  Estimated Cycles:                 6160377|5374051              (+14.6319%) [+1.14632x]
mem_icount::memmove::forward large_spread_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 0, forward
- end of stdout/stderr
  Instructions:                         412|404                  (+1.98020%) [+1.01980x]
  L1 Hits:                              591|575                  (+2.78261%) [+1.02783x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     608|593                  (+2.52951%) [+1.02530x]
  Estimated Cycles:                    1126|1145                 (-1.65939%) [-1.01687x]
mem_icount::memmove::forward large_spread_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 0, forward
- end of stdout/stderr
  Instructions:                         480|464                  (+3.44828%) [+1.03448x]
  L1 Hits:                              683|655                  (+4.27481%) [+1.04275x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     700|673                  (+4.01189%) [+1.04012x]
  Estimated Cycles:                    1218|1225                 (-0.57143%) [-1.00575x]
mem_icount::memmove::forward large_spread_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 0, forward
- end of stdout/stderr
  Instructions:                         616|584                  (+5.47945%) [+1.05479x]
  L1 Hits:                              867|815                  (+6.38037%) [+1.06380x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     884|833                  (+6.12245%) [+1.06122x]
  Estimated Cycles:                    1402|1385                 (+1.22744%) [+1.01227x]
mem_icount::memmove::forward large_spread_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2520|2264                 (+11.3074%) [+1.11307x]
  L1 Hits:                             3443|3055                 (+12.7005%) [+1.12700x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    3460|3073                 (+12.5936%) [+1.12594x]
  Estimated Cycles:                    3978|3625                 (+9.73793%) [+1.09738x]
mem_icount::memmove::forward large_spread_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 0, forward
- end of stdout/stderr
  Instructions:                       17752|15704                (+13.0413%) [+1.13041x]
  L1 Hits:                            24051|20975                (+14.6651%) [+1.14665x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   24068|20993                (+14.6477%) [+1.14648x]
  Estimated Cycles:                   24586|21545                (+14.1146%) [+1.14115x]
mem_icount::memmove::forward large_spread_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4456698|3932410              (+13.3325%) [+1.13332x]
  L1 Hits:                          5997134|5210698              (+15.0927%) [+1.15093x]
  L2 Hits:                            32523|32523                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 6029675|5243240              (+14.9990%) [+1.14999x]
  Estimated Cycles:                 6160379|5373978              (+14.6335%) [+1.14633x]
mem_icount::memmove::forward aligned_off_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                        4407|4407                 (No change)
  L1 Hits:                             6601|6600                 (+0.01515%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                    6612|6612                 (No change)
  Estimated Cycles:                    6926|6960                 (-0.48851%) [-1.00491x]
mem_icount::memmove::forward aligned_off_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                     1048793|1048793              (No change)
  L1 Hits:                          1557264|1557263              (+0.00006%) [+1.00000x]
  L2 Hits:                            15903|15903                (No change)
  RAM Hits:                              12|13                   (-7.69231%) [-1.08333x]
  Total read+write:                 1573179|1573179              (No change)
  Estimated Cycles:                 1637199|1637233              (-0.00208%) [-1.00002x]
mem_icount::memmove::forward small_spread_off_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         411|406                  (+1.23153%) [+1.01232x]
  L1 Hits:                              589|578                  (+1.90311%) [+1.01903x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     606|598                  (+1.33779%) [+1.01338x]
  Estimated Cycles:                    1124|1218                 (-7.71757%) [-1.08363x]
mem_icount::memmove::forward small_spread_off_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         479|466                  (+2.78970%) [+1.02790x]
  L1 Hits:                              681|658                  (+3.49544%) [+1.03495x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     698|678                  (+2.94985%) [+1.02950x]
  Estimated Cycles:                    1216|1298                 (-6.31741%) [-1.06743x]
mem_icount::memmove::forward small_spread_off_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         615|586                  (+4.94881%) [+1.04949x]
  L1 Hits:                              865|818                  (+5.74572%) [+1.05746x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     882|838                  (+5.25060%) [+1.05251x]
  Estimated Cycles:                    1400|1458                 (-3.97805%) [-1.04143x]
mem_icount::memmove::forward small_spread_off_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2519|2266                 (+11.1650%) [+1.11165x]
  L1 Hits:                             3441|3058                 (+12.5245%) [+1.12525x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                    3458|3078                 (+12.3457%) [+1.12346x]
  Estimated Cycles:                    3976|3698                 (+7.51758%) [+1.07518x]
mem_icount::memmove::forward small_spread_off_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                       17751|15706                (+13.0205%) [+1.13021x]
  L1 Hits:                            24049|20978                (+14.6391%) [+1.14639x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                   24066|20998                (+14.6109%) [+1.14611x]
  Estimated Cycles:                   24584|21618                (+13.7200%) [+1.13720x]
mem_icount::memmove::forward small_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4456697|3932412              (+13.3324%) [+1.13332x]
  L1 Hits:                          6013769|5227338              (+15.0446%) [+1.15045x]
  L2 Hits:                            15886|15886                (No change)
  RAM Hits:                              18|21                   (-14.2857%) [-1.16667x]
  Total read+write:                 6029673|5243245              (+14.9989%) [+1.14999x]
  Estimated Cycles:                 6093829|5307503              (+14.8154%) [+1.14815x]
mem_icount::memmove::forward medium_spread_off_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 65, forward
- end of stdout/stderr
  Instructions:                         411|406                  (+1.23153%) [+1.01232x]
  L1 Hits:                              589|578                  (+1.90311%) [+1.01903x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     606|598                  (+1.33779%) [+1.01338x]
  Estimated Cycles:                    1124|1218                 (-7.71757%) [-1.08363x]
mem_icount::memmove::forward medium_spread_off_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 65, forward
- end of stdout/stderr
  Instructions:                         479|466                  (+2.78970%) [+1.02790x]
  L1 Hits:                              681|658                  (+3.49544%) [+1.03495x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     698|678                  (+2.94985%) [+1.02950x]
  Estimated Cycles:                    1216|1298                 (-6.31741%) [-1.06743x]
mem_icount::memmove::forward medium_spread_off_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 65, forward
- end of stdout/stderr
  Instructions:                         615|586                  (+4.94881%) [+1.04949x]
  L1 Hits:                              865|818                  (+5.74572%) [+1.05746x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                     882|838                  (+5.25060%) [+1.05251x]
  Estimated Cycles:                    1400|1458                 (-3.97805%) [-1.04143x]
mem_icount::memmove::forward medium_spread_off_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2519|2266                 (+11.1650%) [+1.11165x]
  L1 Hits:                             3441|3058                 (+12.5245%) [+1.12525x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                    3458|3078                 (+12.3457%) [+1.12346x]
  Estimated Cycles:                    3976|3698                 (+7.51758%) [+1.07518x]
mem_icount::memmove::forward medium_spread_off_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 65, forward
- end of stdout/stderr
  Instructions:                       17751|15706                (+13.0205%) [+1.13021x]
  L1 Hits:                            24049|20978                (+14.6391%) [+1.14639x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                   24066|20998                (+14.6109%) [+1.14611x]
  Estimated Cycles:                   24584|21618                (+13.7200%) [+1.13720x]
mem_icount::memmove::forward medium_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4456697|3932412              (+13.3324%) [+1.13332x]
  L1 Hits:                          5997131|5210700              (+15.0926%) [+1.15093x]
  L2 Hits:                            32524|32524                (No change)
  RAM Hits:                              18|21                   (-14.2857%) [-1.16667x]
  Total read+write:                 6029673|5243245              (+14.9989%) [+1.14999x]
  Estimated Cycles:                 6160381|5374055              (+14.6319%) [+1.14632x]
mem_icount::memmove::forward large_spread_off_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 65, forward
- end of stdout/stderr
  Instructions:                         409|399                  (+2.50627%) [+1.02506x]
  L1 Hits:                              586|568                  (+3.16901%) [+1.03169x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|15                   (No change)
  Total read+write:                     603|585                  (+3.07692%) [+1.03077x]
  Estimated Cycles:                    1121|1103                 (+1.63191%) [+1.01632x]
mem_icount::memmove::forward large_spread_off_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 65, forward
- end of stdout/stderr
  Instructions:                         477|459                  (+3.92157%) [+1.03922x]
  L1 Hits:                              678|648                  (+4.62963%) [+1.04630x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|15                   (No change)
  Total read+write:                     695|665                  (+4.51128%) [+1.04511x]
  Estimated Cycles:                    1213|1183                 (+2.53593%) [+1.02536x]
mem_icount::memmove::forward large_spread_off_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 65, forward
- end of stdout/stderr
  Instructions:                         613|579                  (+5.87219%) [+1.05872x]
  L1 Hits:                              862|808                  (+6.68317%) [+1.06683x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|15                   (No change)
  Total read+write:                     879|825                  (+6.54545%) [+1.06545x]
  Estimated Cycles:                    1397|1343                 (+4.02085%) [+1.04021x]
mem_icount::memmove::forward large_spread_off_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2517|2259                 (+11.4210%) [+1.11421x]
  L1 Hits:                             3438|3048                 (+12.7953%) [+1.12795x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|15                   (No change)
  Total read+write:                    3455|3065                 (+12.7243%) [+1.12724x]
  Estimated Cycles:                    3973|3583                 (+10.8847%) [+1.10885x]
mem_icount::memmove::forward large_spread_off_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 65, forward
- end of stdout/stderr
  Instructions:                       17749|15699                (+13.0582%) [+1.13058x]
  L1 Hits:                            24046|20968                (+14.6795%) [+1.14680x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|15                   (No change)
  Total read+write:                   24063|20985                (+14.6676%) [+1.14668x]
  Estimated Cycles:                   24581|21503                (+14.3143%) [+1.14314x]
mem_icount::memmove::forward large_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4456695|3932405              (+13.3326%) [+1.13333x]
  L1 Hits:                          5997130|5210692              (+15.0928%) [+1.15093x]
  L2 Hits:                            32522|32522                (No change)
  RAM Hits:                              18|18                   (No change)
  Total read+write:                 6029670|5243232              (+14.9991%) [+1.14999x]
  Estimated Cycles:                 6160370|5373932              (+14.6343%) [+1.14634x]
mem_icount::memmove::backward aligned_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                        4388|4388                 (No change)
  L1 Hits:                             6580|6579                 (+0.01520%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                    6591|6591                 (No change)
  Estimated Cycles:                    6905|6939                 (-0.48998%) [-1.00492x]
mem_icount::memmove::backward aligned_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                     1048774|1048774              (No change)
  L1 Hits:                          1556743|1556742              (+0.00006%) [+1.00000x]
  L2 Hits:                            16403|16403                (No change)
  RAM Hits:                              12|13                   (-7.69231%) [-1.08333x]
  Total read+write:                 1573158|1573158              (No change)
  Estimated Cycles:                 1639178|1639212              (-0.00207%) [-1.00002x]
mem_icount::memmove::backward small_spread_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         366|370                  (-1.08108%) [-1.01093x]
  L1 Hits:                              525|524                  (+0.19084%) [+1.00191x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     542|542                  (No change)
  Estimated Cycles:                    1060|1094                 (-3.10786%) [-1.03208x]
mem_icount::memmove::backward small_spread_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         402|406                  (-0.98522%) [-1.00995x]
  L1 Hits:                              569|568                  (+0.17606%) [+1.00176x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     586|586                  (No change)
  Estimated Cycles:                    1104|1138                 (-2.98770%) [-1.03080x]
mem_icount::memmove::backward small_spread_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         474|478                  (-0.83682%) [-1.00844x]
  L1 Hits:                              657|656                  (+0.15244%) [+1.00152x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     674|674                  (No change)
  Estimated Cycles:                    1192|1226                 (-2.77325%) [-1.02852x]
mem_icount::memmove::backward small_spread_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1482|1486                 (-0.26918%) [-1.00270x]
  L1 Hits:                             1889|1888                 (+0.05297%) [+1.00053x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1906|1906                 (No change)
  Estimated Cycles:                    2424|2458                 (-1.38324%) [-1.01403x]
mem_icount::memmove::backward small_spread_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9546|9550                 (-0.04188%) [-1.00042x]
  L1 Hits:                            11745|11744                (+0.00851%) [+1.00009x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11762|11762                (No change)
  Estimated Cycles:                   12280|12314                (-0.27611%) [-1.00277x]
mem_icount::memmove::backward small_spread_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359532|2359536              (-0.00017%) [-1.00000x]
  L1 Hits:                          2867515|2867514              (+0.00003%) [+1.00000x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883929|2883929              (No change)
  Estimated Cycles:                 2950125|2950159              (-0.00115%) [-1.00001x]
mem_icount::memmove::backward medium_spread_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 0, backward
- end of stdout/stderr
  Instructions:                         366|370                  (-1.08108%) [-1.01093x]
  L1 Hits:                              525|524                  (+0.19084%) [+1.00191x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     542|542                  (No change)
  Estimated Cycles:                    1060|1094                 (-3.10786%) [-1.03208x]
mem_icount::memmove::backward medium_spread_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 0, backward
- end of stdout/stderr
  Instructions:                         402|406                  (-0.98522%) [-1.00995x]
  L1 Hits:                              569|568                  (+0.17606%) [+1.00176x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     586|586                  (No change)
  Estimated Cycles:                    1104|1138                 (-2.98770%) [-1.03080x]
mem_icount::memmove::backward medium_spread_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 0, backward
- end of stdout/stderr
  Instructions:                         474|478                  (-0.83682%) [-1.00844x]
  L1 Hits:                              657|656                  (+0.15244%) [+1.00152x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     674|674                  (No change)
  Estimated Cycles:                    1192|1226                 (-2.77325%) [-1.02852x]
mem_icount::memmove::backward medium_spread_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1482|1486                 (-0.26918%) [-1.00270x]
  L1 Hits:                             1889|1888                 (+0.05297%) [+1.00053x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1906|1906                 (No change)
  Estimated Cycles:                    2424|2458                 (-1.38324%) [-1.01403x]
mem_icount::memmove::backward medium_spread_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9546|9550                 (-0.04188%) [-1.00042x]
  L1 Hits:                            11745|11744                (+0.00851%) [+1.00009x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11762|11762                (No change)
  Estimated Cycles:                   12280|12314                (-0.27611%) [-1.00277x]
mem_icount::memmove::backward medium_spread_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359532|2359536              (-0.00017%) [-1.00000x]
  L1 Hits:                          2851131|2851130              (+0.00004%) [+1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883929|2883929              (No change)
  Estimated Cycles:                 3015661|3015695              (-0.00113%) [-1.00001x]
mem_icount::memmove::backward large_spread_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 0, backward
- end of stdout/stderr
  Instructions:                         368|369                  (-0.27100%) [-1.00272x]
  L1 Hits:                              529|526                  (+0.57034%) [+1.00570x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     546|544                  (+0.36765%) [+1.00368x]
  Estimated Cycles:                    1064|1096                 (-2.91971%) [-1.03008x]
mem_icount::memmove::backward large_spread_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 0, backward
- end of stdout/stderr
  Instructions:                         404|405                  (-0.24691%) [-1.00248x]
  L1 Hits:                              573|570                  (+0.52632%) [+1.00526x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     590|588                  (+0.34014%) [+1.00340x]
  Estimated Cycles:                    1108|1140                 (-2.80702%) [-1.02888x]
mem_icount::memmove::backward large_spread_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 0, backward
- end of stdout/stderr
  Instructions:                         476|477                  (-0.20964%) [-1.00210x]
  L1 Hits:                              661|658                  (+0.45593%) [+1.00456x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     678|676                  (+0.29586%) [+1.00296x]
  Estimated Cycles:                    1196|1228                 (-2.60586%) [-1.02676x]
mem_icount::memmove::backward large_spread_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1484|1485                 (-0.06734%) [-1.00067x]
  L1 Hits:                             1893|1890                 (+0.15873%) [+1.00159x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1910|1908                 (+0.10482%) [+1.00105x]
  Estimated Cycles:                    2428|2460                 (-1.30081%) [-1.01318x]
mem_icount::memmove::backward large_spread_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9548|9549                 (-0.01047%) [-1.00010x]
  L1 Hits:                            11749|11746                (+0.02554%) [+1.00026x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11766|11764                (+0.01700%) [+1.00017x]
  Estimated Cycles:                   12284|12316                (-0.25982%) [-1.00261x]
mem_icount::memmove::backward large_spread_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359534|2359535              (-0.00004%) [-1.00000x]
  L1 Hits:                          2851135|2851132              (+0.00011%) [+1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883933|2883931              (+0.00007%) [+1.00000x]
  Estimated Cycles:                 3015665|3015697              (-0.00106%) [-1.00001x]
mem_icount::memmove::backward aligned_off_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                        4402|4402                 (No change)
  L1 Hits:                             6600|6599                 (+0.01515%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                    6611|6611                 (No change)
  Estimated Cycles:                    6925|6959                 (-0.48858%) [-1.00491x]
mem_icount::memmove::backward aligned_off_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                     1048788|1048788              (No change)
  L1 Hits:                          1556762|1556761              (+0.00006%) [+1.00000x]
  L2 Hits:                            16404|16404                (No change)
  RAM Hits:                              12|13                   (-7.69231%) [-1.08333x]
  Total read+write:                 1573178|1573178              (No change)
  Estimated Cycles:                 1639202|1639236              (-0.00207%) [-1.00002x]
mem_icount::memmove::backward small_spread_off_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         375|379                  (-1.05541%) [-1.01067x]
  L1 Hits:                              540|539                  (+0.18553%) [+1.00186x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     557|557                  (No change)
  Estimated Cycles:                    1075|1109                 (-3.06583%) [-1.03163x]
mem_icount::memmove::backward small_spread_off_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         411|415                  (-0.96386%) [-1.00973x]
  L1 Hits:                              584|583                  (+0.17153%) [+1.00172x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     601|601                  (No change)
  Estimated Cycles:                    1119|1153                 (-2.94883%) [-1.03038x]
mem_icount::memmove::backward small_spread_off_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         483|487                  (-0.82136%) [-1.00828x]
  L1 Hits:                              672|671                  (+0.14903%) [+1.00149x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     689|689                  (No change)
  Estimated Cycles:                    1207|1241                 (-2.73973%) [-1.02817x]
mem_icount::memmove::backward small_spread_off_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1491|1495                 (-0.26756%) [-1.00268x]
  L1 Hits:                             1904|1903                 (+0.05255%) [+1.00053x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1921|1921                 (No change)
  Estimated Cycles:                    2439|2473                 (-1.37485%) [-1.01394x]
mem_icount::memmove::backward small_spread_off_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9555|9559                 (-0.04185%) [-1.00042x]
  L1 Hits:                            11760|11759                (+0.00850%) [+1.00009x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11777|11777                (No change)
  Estimated Cycles:                   12295|12329                (-0.27577%) [-1.00277x]
mem_icount::memmove::backward small_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359541|2359545              (-0.00017%) [-1.00000x]
  L1 Hits:                          2867530|2867529              (+0.00003%) [+1.00000x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883944|2883944              (No change)
  Estimated Cycles:                 2950140|2950174              (-0.00115%) [-1.00001x]
mem_icount::memmove::backward medium_spread_off_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 65, backward
- end of stdout/stderr
  Instructions:                         375|379                  (-1.05541%) [-1.01067x]
  L1 Hits:                              540|539                  (+0.18553%) [+1.00186x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     557|557                  (No change)
  Estimated Cycles:                    1075|1109                 (-3.06583%) [-1.03163x]
mem_icount::memmove::backward medium_spread_off_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 65, backward
- end of stdout/stderr
  Instructions:                         411|415                  (-0.96386%) [-1.00973x]
  L1 Hits:                              584|583                  (+0.17153%) [+1.00172x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     601|601                  (No change)
  Estimated Cycles:                    1119|1153                 (-2.94883%) [-1.03038x]
mem_icount::memmove::backward medium_spread_off_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 65, backward
- end of stdout/stderr
  Instructions:                         483|487                  (-0.82136%) [-1.00828x]
  L1 Hits:                              672|671                  (+0.14903%) [+1.00149x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     689|689                  (No change)
  Estimated Cycles:                    1207|1241                 (-2.73973%) [-1.02817x]
mem_icount::memmove::backward medium_spread_off_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1491|1495                 (-0.26756%) [-1.00268x]
  L1 Hits:                             1904|1903                 (+0.05255%) [+1.00053x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1921|1921                 (No change)
  Estimated Cycles:                    2439|2473                 (-1.37485%) [-1.01394x]
mem_icount::memmove::backward medium_spread_off_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9555|9559                 (-0.04185%) [-1.00042x]
  L1 Hits:                            11760|11759                (+0.00850%) [+1.00009x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11777|11777                (No change)
  Estimated Cycles:                   12295|12329                (-0.27577%) [-1.00277x]
mem_icount::memmove::backward medium_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359541|2359545              (-0.00017%) [-1.00000x]
  L1 Hits:                          2851145|2851144              (+0.00004%) [+1.00000x]
  L2 Hits:                            32781|32781                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883944|2883944              (No change)
  Estimated Cycles:                 3015680|3015714              (-0.00113%) [-1.00001x]
mem_icount::memmove::backward large_spread_off_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 65, backward
- end of stdout/stderr
  Instructions:                         377|378                  (-0.26455%) [-1.00265x]
  L1 Hits:                              544|541                  (+0.55453%) [+1.00555x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     561|559                  (+0.35778%) [+1.00358x]
  Estimated Cycles:                    1079|1111                 (-2.88029%) [-1.02966x]
mem_icount::memmove::backward large_spread_off_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 65, backward
- end of stdout/stderr
  Instructions:                         413|414                  (-0.24155%) [-1.00242x]
  L1 Hits:                              588|585                  (+0.51282%) [+1.00513x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     605|603                  (+0.33167%) [+1.00332x]
  Estimated Cycles:                    1123|1155                 (-2.77056%) [-1.02850x]
mem_icount::memmove::backward large_spread_off_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 65, backward
- end of stdout/stderr
  Instructions:                         485|486                  (-0.20576%) [-1.00206x]
  L1 Hits:                              676|673                  (+0.44577%) [+1.00446x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                     693|691                  (+0.28944%) [+1.00289x]
  Estimated Cycles:                    1211|1243                 (-2.57442%) [-1.02642x]
mem_icount::memmove::backward large_spread_off_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1493|1494                 (-0.06693%) [-1.00067x]
  L1 Hits:                             1908|1905                 (+0.15748%) [+1.00157x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                    1925|1923                 (+0.10400%) [+1.00104x]
  Estimated Cycles:                    2443|2475                 (-1.29293%) [-1.01310x]
mem_icount::memmove::backward large_spread_off_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9557|9558                 (-0.01046%) [-1.00010x]
  L1 Hits:                            11764|11761                (+0.02551%) [+1.00026x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|16                   (-6.25000%) [-1.06667x]
  Total read+write:                   11781|11779                (+0.01698%) [+1.00017x]
  Estimated Cycles:                   12299|12331                (-0.25951%) [-1.00260x]
mem_icount::memmove::backward large_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359543|2359544              (-0.00004%) [-1.00000x]
  L1 Hits:                          2851150|2851147              (+0.00011%) [+1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              18|19                   (-5.26316%) [-1.05556x]
  Total read+write:                 2883948|2883946              (+0.00007%) [+1.00000x]
  Estimated Cycles:                 3015680|3015712              (-0.00106%) [-1.00001x]
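Each comparison line in the report lists two counts as `a|b`, then a percentage and a signed factor. The printed figures are consistent with the percentage being `(a - b) / b` and the factor being `a / b` when `a >= b` and `-(b / a)` otherwise, so the right-hand count acts as the baseline. For example, `1060|1094` gives -3.10786% and -1.03208x, and `9|10` gives -10% and -1.11111x. The sketch below reproduces those columns. `pct_change` and `speed_factor` are illustrative names, not part of iai-callgrind's API.

```rust
/// Percentage change of `a` relative to the baseline `b`, as printed in
/// the report's parentheses (e.g. "(-3.10786%)").
fn pct_change(a: u64, b: u64) -> f64 {
    (a as f64 - b as f64) / b as f64 * 100.0
}

/// Signed ratio as printed in brackets (e.g. "[-1.03208x]"): the larger
/// count divided by the smaller one, negative when `a` is the smaller count.
fn speed_factor(a: u64, b: u64) -> f64 {
    if a >= b {
        a as f64 / b as f64
    } else {
        -(b as f64 / a as f64)
    }
}
```

With this reading, a negative value means the left-hand run used fewer instructions, cache hits, or estimated cycles than the right-hand run.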

tgross35 · 2025-03-22T11:29:53Z

For reference:

With this PR

compiler_builtins::mem::memcpy:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        .cfi_def_cfa_offset 36
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r10, -12
        .cfi_offset r9, -16
        .cfi_offset r8, -20
        .cfi_offset r7, -24
        .cfi_offset r6, -28
        .cfi_offset r5, -32
        .cfi_offset r4, -36
        .pad    #12
        sub sp, sp, #12
        .cfi_def_cfa_offset 48
        cmp r2, #16
        blo .LBB427_9
        rsb r3, r0, #0
        and r4, r3, #3
        add lr, r0, r4
        cmp r0, lr
        bhs .LBB427_4
        mov r3, r4
        mov r7, r0
        mov r6, r1
.LBB427_3:
        ldrb r5, [r6], #1
        subs r3, r3, #1
        strb r5, [r7], #1
        bne .LBB427_3
.LBB427_4:
        sub r12, r2, r4
        add r1, r1, r4
        bic r2, r12, #3
        ands r4, r1, #3
        add r3, lr, r2
        bne .LBB427_12
        cmp lr, r3
        bhs .LBB427_8
        mov r4, r1
.LBB427_7:
        ldr r5, [r4], #4
        str r5, [lr], #4
        cmp lr, r3
        blo .LBB427_7
.LBB427_8:
        add r1, r1, r2
        and r2, r12, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB427_10
        b .LBB427_11
.LBB427_9:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB427_11
.LBB427_10:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB427_10
.LBB427_11:
        add sp, sp, #12
        pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB427_12:
        add r5, sp, #8
        mov r7, #0
        orr r6, r5, r4
        add r8, r5, #4
        lsl r9, r4, #3
        cmp r6, r8
        str r7, [sp, #8]
        bhs .LBB427_16
        ldrb r7, [r1]
        mov r5, r6
        strb r7, [r5], #1
        cmp r5, r8
        beq .LBB427_15
        ldrb r5, [r1, #1]
        strb r5, [r6, #1]
        add r5, r6, #2
        cmp r5, r8
        ldrbne r6, [r1, #2]
        strbne r6, [r5]
.LBB427_15:
        ldr r7, [sp, #8]
.LBB427_16:
        add r5, lr, #4
        sub r6, r1, r4
        cmp r5, r3
        rsb r5, r9, #0
        str r5, [sp]
        bhs .LBB427_19
        and r11, r5, #24
.LBB427_18:
        ldr r8, [r6, #4]!
        add r10, lr, #4
        lsl r5, r8, r11
        orr r5, r5, r7, lsr r9
        str r5, [lr], #8
        cmp lr, r3
        mov lr, r10
        mov r7, r8
        blo .LBB427_18
        b .LBB427_20
.LBB427_19:
        mov r8, r7
        mov r10, lr
.LBB427_20:
        add r11, sp, #4
        ldr r5, [sp]
        orr r7, r11, r4
        mov lr, #0
        cmp r11, r7
        str lr, [sp, #4]
        bhs .LBB427_24
        add lr, r6, #4
        mov r7, #0
.LBB427_22:
        ldrb r6, [lr, r7]
        strb r6, [r11, r7]
        add r7, r7, #1
        cmp r4, r7
        bne .LBB427_22
        ldr lr, [sp, #4]
.LBB427_24:
        and r4, r5, #24
        lsl r7, lr, r4
        orr r7, r7, r8, lsr r9
        str r7, [r10]
        b .LBB427_8
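The inner loop at `.LBB427_18` above appears to implement the usual little-endian shift-and-merge copy for a misaligned source. Each destination word is built from the upper bytes of one aligned source word (`lsr` by `offset * 8`) and the lower bytes of the next aligned word (`lsl` by the complementary amount, `(-offset * 8) & 24`). Below is a minimal safe-Rust sketch of that idea. It assumes 32-bit little-endian words and treats byte 0 of `src` as 4-byte aligned. `word_at` and `shift_merge_copy` are hypothetical helpers, not the crate's actual functions.

```rust
/// Read the little-endian 32-bit word at word index `i` of `src`
/// (bytes 4*i..4*i+4). Byte 0 of `src` is treated as 4-byte aligned.
fn word_at(src: &[u8], i: usize) -> u32 {
    let b = &src[4 * i..4 * i + 4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Fill `dst` with the bytes `src[start..start + 4 * dst.len()]` using only
/// whole-word reads at aligned indices, merged with shifts (little-endian).
/// The first and last aligned reads also cover bytes just outside the copied
/// range, so `src` must extend to the surrounding 4-byte boundaries.
fn shift_merge_copy(dst: &mut [u32], src: &[u8], start: usize) {
    let offset = start % 4;
    assert!(offset != 0, "sketch only covers a misaligned source");
    let shift = (offset * 8) as u32; // 8, 16 or 24 bits
    let mut idx = start / 4;
    let mut prev = word_at(src, idx);
    for d in dst.iter_mut() {
        idx += 1;
        let next = word_at(src, idx);
        // Low bytes come from the tail of `prev`, high bytes from the head of `next`.
        *d = (prev >> shift) | (next << (32 - shift));
        prev = next;
    }
}
```

Each destination word costs one aligned load, two shifts, an OR, and a store. The ARM loop folds one of the shifts into `orr` as a shifted operand. Unlike this sketch, a real `memcpy` cannot read past the ends of the source buffer. The byte-wise loads around `.LBB427_12` and `.LBB427_20` appear to handle the partially covered first and last words for that reason.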

compiler_builtins::mem::memmove:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        .cfi_def_cfa_offset 36
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r10, -12
        .cfi_offset r9, -16
        .cfi_offset r8, -20
        .cfi_offset r7, -24
        .cfi_offset r6, -28
        .cfi_offset r5, -32
        .cfi_offset r4, -36
        .pad    #20
        sub sp, sp, #20
        .cfi_def_cfa_offset 56
        sub r3, r0, r1
        cmp r3, r2
        bhs .LBB227_13
        add r4, r1, r2
        add r10, r0, r2
        cmp r2, #16
        blo .LBB227_10
        and r12, r10, #3
        bic r3, r10, #3
        rsb r9, r12, #0
        cmp r3, r10
        bhs .LBB227_5
        add r7, r1, r2
        mov r5, r10
        sub r7, r7, #1
.LBB227_4:
        ldrb r6, [r7], #-1
        strb r6, [r5, #-1]!
        cmp r3, r5
        blo .LBB227_4
.LBB227_5:
        sub r12, r2, r12
        add r4, r4, r9
        bic r7, r12, #3
        ands r11, r4, #3
        sub r6, r3, r7
        rsb lr, r7, #0
        bne .LBB227_25
        cmp r6, r3
        bhs .LBB227_9
        add r1, r12, r1
        mov r2, r3
        sub r1, r1, #4
.LBB227_8:
        ldr r5, [r1], #-4
        str r5, [r2, #-4]!
        cmp r6, r2
        blo .LBB227_8
.LBB227_9:
        add r4, r4, lr
        add r10, r3, lr
        and r2, r12, #3
.LBB227_10:
        sub r1, r10, r2
        cmp r1, r10
        bhs .LBB227_24
        sub r2, r4, #1
.LBB227_12:
        ldrb r3, [r2], #-1
        strb r3, [r10, #-1]!
        cmp r1, r10
        blo .LBB227_12
        b .LBB227_24
.LBB227_13:
        cmp r2, #16
        blo .LBB227_22
        rsb r3, r0, #0
        and r12, r3, #3
        add r4, r0, r12
        cmp r0, r4
        bhs .LBB227_17
        mov r3, r12
        mov r6, r0
        mov r5, r1
.LBB227_16:
        ldrb r7, [r5], #1
        subs r3, r3, #1
        strb r7, [r6], #1
        bne .LBB227_16
.LBB227_17:
        sub r2, r2, r12
        add r12, r1, r12
        bic r5, r2, #3
        ands r7, r12, #3
        add r3, r4, r5
        bne .LBB227_33
        cmp r4, r3
        bhs .LBB227_21
        mov r1, r12
.LBB227_20:
        ldr r7, [r1], #4
        str r7, [r4], #4
        cmp r4, r3
        blo .LBB227_20
.LBB227_21:
        add r1, r12, r5
        and r2, r2, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB227_23
        b .LBB227_24
.LBB227_22:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB227_24
.LBB227_23:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB227_23
.LBB227_24:
        add sp, sp, #20
        pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB227_25:
        sub r7, r4, r11
        str r7, [sp, #8]
        add r7, sp, #16
        lsl r8, r11, #3
        orr r7, r7, r11
        str r8, [sp, #12]
        str r7, [sp, #4]
        mov r5, #0
        mov r8, r7
        add r7, sp, #16
        cmp r7, r8
        str r5, [sp, #16]
        bhs .LBB227_29
        ldr r5, [sp, #8]
        cmp r11, #1
        ldrb r5, [r5]
        strb r5, [sp, #16]
        beq .LBB227_28
        ldr r5, [sp, #8]
        cmp r11, #2
        ldrb r5, [r5, #1]
        strb r5, [sp, #17]
        ldrne r5, [sp, #8]
        ldrbne r5, [r5, #2]
        strbne r5, [sp, #18]
.LBB227_28:
        ldr r5, [sp, #16]
.LBB227_29:
        ldr r8, [sp, #12]
        add r6, r6, #4
        cmp r6, r3
        rsb r7, r8, #0
        bhs .LBB227_40
        sub r2, r2, r11
        str r7, [sp]
        add r8, r1, r2
        and r7, r7, #24
.LBB227_31:
        add r1, r8, r9
        ldr r2, [sp, #12]
        sub r8, r8, #4
        ldr r1, [r1, #-4]
        lsr r2, r1, r2
        orr r2, r2, r5, lsl r7
        add r5, r10, r9
        sub r10, r10, #4
        str r2, [r5, #-4]
        add r2, r10, r9
        cmp r6, r2
        mov r5, r1
        blo .LBB227_31
        add r5, r8, r9
        ldr r8, [sp, #12]
        ldr r7, [sp]
        str r5, [sp, #8]
        b .LBB227_41
.LBB227_33:
        add r6, sp, #16
        mov r8, #0
        orr r1, r6, r7
        add r10, r6, #4
        lsl r9, r7, #3
        cmp r1, r10
        str r8, [sp, #16]
        bhs .LBB227_37
        ldrb lr, [r12]
        mov r6, r1
        strb lr, [r6], #1
        cmp r6, r10
        beq .LBB227_36
        add lr, r1, #2
        ldrb r6, [r12, #1]
        cmp lr, r10
        strb r6, [r1, #1]
        ldrbne r6, [r12, #2]
        strbne r6, [lr]
.LBB227_36:
        ldr r8, [sp, #16]
.LBB227_37:
        str r1, [sp, #12]
        add r1, r4, #4
        cmp r1, r3
        rsb r1, r9, #0
        sub r6, r12, r7
        str r1, [sp, #8]
        bhs .LBB227_47
        and lr, r1, #24
.LBB227_39:
        ldr r10, [r6, #4]!
        add r11, r4, #4
        lsl r1, r10, lr
        orr r1, r1, r8, lsr r9
        str r1, [r4], #8
        cmp r4, r3
        mov r4, r11
        mov r8, r10
        blo .LBB227_39
        b .LBB227_48
.LBB227_40:
        mov r1, r5
        mov r2, r3
.LBB227_41:
        add r5, sp, #16
        mov r6, #0
        add r9, r5, #4
        ldr r5, [sp, #4]
        str r6, [sp, #16]
        cmp r5, r9
        bhs .LBB227_46
        ldr r6, [sp, #8]
        mov r10, r7
        add r6, r6, r11
        ldrb r7, [r6, #-4]!
        strb r7, [r5]
        add r7, r5, #1
        cmp r7, r9
        beq .LBB227_45
        ldrb r7, [r6, #1]
        strb r7, [r5, #1]
        add r7, r5, #2
        cmp r7, r9
        beq .LBB227_45
        ldrb r7, [r6, #2]
        strb r7, [r5, #2]
        add r7, r5, #3
        cmp r7, r9
        ldrbne r5, [r6, #3]
        strbne r5, [r7]
.LBB227_45:
        ldr r6, [sp, #16]
        mov r7, r10
.LBB227_46:
        lsr r5, r6, r8
        and r6, r7, #24
        orr r1, r5, r1, lsl r6
        str r1, [r2, #-4]
        b .LBB227_9
.LBB227_47:
        mov r10, r8
        mov r11, r4
.LBB227_48:
        ldr r1, [sp, #12]
        add lr, sp, #16
        mov r4, #0
        cmp lr, r1
        str r4, [sp, #16]
        bhs .LBB227_52
        add r6, r6, #4
.LBB227_50:
        ldrb r1, [r6, r4]
        strb r1, [lr, r4]
        add r4, r4, #1
        cmp r7, r4
        bne .LBB227_50
        ldr r4, [sp, #16]
.LBB227_52:
        ldr r1, [sp, #8]
        and r1, r1, #24
        lsl r1, r4, r1
        orr r1, r1, r10, lsr r9
        str r1, [r11]
        b .LBB227_21
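This memmove takes the backward path when `dst - src < n` (the `sub r3, r0, r1; cmp r3, r2; bhs` check), which means the destination starts inside the source range. Its misaligned loop `.LBB227_31` then walks downward. It loads the next lower aligned word, shifts it right by `offset * 8`, ORs in the previously loaded (higher) word shifted left by the complementary amount, and stores to `[r5, #-4]`. The sketch below shows the same traversal order and shift pairing in safe Rust. It makes the same assumptions as a forward shift-and-merge copy: 32-bit little-endian words, byte 0 of `src` treated as aligned, and hypothetical helper names. Because `dst` and `src` are separate slices here, it does not perform an in-place overlapping move.

```rust
/// Read the little-endian 32-bit word at word index `i` of `src`.
/// Byte 0 of `src` is treated as 4-byte aligned.
fn word_at(src: &[u8], i: usize) -> u32 {
    let b = &src[4 * i..4 * i + 4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Same result as a forward shift-and-merge copy of
/// `src[start..start + 4 * dst.len()]`, but visits words from the highest
/// address down, as a backward memmove does.
fn shift_merge_copy_backward(dst: &mut [u32], src: &[u8], start: usize) {
    let offset = start % 4;
    assert!(offset != 0, "sketch only covers a misaligned source");
    let shift = (offset * 8) as u32;
    // Highest aligned word that holds bytes of the copied range.
    let mut idx = start / 4 + dst.len();
    let mut next = word_at(src, idx);
    for d in dst.iter_mut().rev() {
        idx -= 1;
        let prev = word_at(src, idx); // lower-address word, loaded this step
        *d = (prev >> shift) | (next << (32 - shift));
        next = prev;
    }
}
```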

Current master with #799

compiler_builtins::mem::memcpy:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        .cfi_def_cfa_offset 36
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r10, -12
        .cfi_offset r9, -16
        .cfi_offset r8, -20
        .cfi_offset r7, -24
        .cfi_offset r6, -28
        .cfi_offset r5, -32
        .cfi_offset r4, -36
        .pad    #16
        sub sp, sp, #16
        .cfi_def_cfa_offset 52
        cmp r2, #16
        blo .LBB226_9
        rsb r3, r0, #0
        and r4, r3, #3
        add lr, r0, r4
        cmp r0, lr
        bhs .LBB226_4
        mov r3, r4
        mov r7, r0
        mov r6, r1
.LBB226_3:
        ldrb r5, [r6], #1
        subs r3, r3, #1
        strb r5, [r7], #1
        bne .LBB226_3
.LBB226_4:
        sub r12, r2, r4
        add r1, r1, r4
        bic r2, r12, #3
        ands r5, r1, #3
        add r3, lr, r2
        bne .LBB226_12
        cmp lr, r3
        bhs .LBB226_8
        mov r4, r1
.LBB226_7:
        ldr r5, [r4], #4
        str r5, [lr], #4
        cmp lr, r3
        blo .LBB226_7
.LBB226_8:
        add r1, r1, r2
        and r2, r12, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB226_10
        b .LBB226_11
.LBB226_9:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB226_11
.LBB226_10:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB226_10
.LBB226_11:
        add sp, sp, #16
        pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB226_12:
        add r4, sp, #12
        rsb r7, r5, #4
        mov r6, #0
        orr r4, r4, r5
        tst r7, #1
        str r6, [sp, #12]
        ldrbne r6, [r1]
        lsl r9, r5, #3
        strbne r6, [r4]
        movne r6, #1
        tst r7, #2
        addne r7, r1, r6
        addne r4, r4, r6
        sub r6, r1, r5
        ldrhne r7, [r7]
        strhne r7, [r4]
        add r4, lr, #4
        ldr r7, [sp, #12]
        cmp r4, r3
        rsb r4, r9, #0
        str r4, [sp]
        bhs .LBB226_15
        and r10, r4, #24
.LBB226_14:
        ldr r8, [r6, #4]!
        lsl r4, r8, r10
        orr r7, r4, r7, lsr r9
        add r4, lr, #4
        str r7, [lr], #8
        cmp lr, r3
        mov r7, r8
        mov lr, r4
        blo .LBB226_14
        b .LBB226_16
.LBB226_15:
        mov r8, r7
        mov r4, lr
.LBB226_16:
        mov lr, #0
        cmp r5, #1
        strb lr, [sp, #8]
        strb lr, [sp, #6]
        bne .LBB226_18
        add r11, sp, #8
        mov r5, #0
        mov r10, #0
        mov r7, #0
        b .LBB226_19
.LBB226_18:
        ldrb r7, [r6, #5]
        add r11, sp, #6
        ldrb r5, [r6, #4]
        strb r5, [sp, #8]
        lsl r10, r7, #8
        mov r7, #2
.LBB226_19:
        tst r1, #1
        beq .LBB226_21
        add r6, r6, #4
        ldrb r7, [r6, r7]
        strb r7, [r11]
        ldrb r7, [sp, #6]
        ldrb r5, [sp, #8]
        lsl lr, r7, #16
.LBB226_21:
        ldr r6, [sp]
        orr r7, r10, lr
        orr r7, r7, r5
        and r6, r6, #24
        lsl r7, r7, r6
        orr r7, r7, r8, lsr r9
        str r7, [r4]
        b .LBB226_8
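In the `.LBB226_12` path above, the compiler zeroes a 4-byte stack slot (`mov r6, #0` / `str r6, [sp, #12]`), points at that slot plus the source misalignment (`orr r4, r4, r5`), and then conditionally copies one byte (`ldrbne` / `strbne`) and one halfword (`ldrhne` / `strhne`). Only after that does it reload the slot as a whole word. The following is a minimal Rust sketch of that kind of partial load, assuming 4-byte words. `load_partial_u32` is a hypothetical name and signature, not the actual compiler-builtins function.

```rust
use core::ptr;

/// Hypothetical sketch: copy `load_sz` (< 4) bytes from `src` into a zeroed
/// 4-byte scratch word at byte position `start`, using at most one 1-byte and
/// one 2-byte copy, then return the scratch word.
///
/// Safety: `src..src + load_sz` must be readable.
pub unsafe fn load_partial_u32(src: *const u8, start: usize, load_sz: usize) -> u32 {
    assert!(load_sz < 4 && start + load_sz <= 4);
    // Zeroed scratch word, like the `str r6, [sp, #12]` store in the listing.
    let mut out = [0u8; 4];
    unsafe {
        let dst = out.as_mut_ptr().add(start);
        let mut done = 0;
        if load_sz & 1 != 0 {
            // Single-byte copy (`ldrbne` / `strbne`).
            ptr::copy_nonoverlapping(src, dst, 1);
            done = 1;
        }
        if load_sz & 2 != 0 {
            // Two-byte copy placed after the optional byte (`ldrhne` / `strhne`).
            ptr::copy_nonoverlapping(src.add(done), dst.add(done), 2);
        }
    }
    // Reload the scratch slot as one word (`ldr r7, [sp, #12]`).
    u32::from_ne_bytes(out)
}
```

The extra stack traffic in this pattern, compared with the single `ldr` per word in the pre-#799 code, is a plausible contributor to the instruction-count differences reported in this thread.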

compiler_builtins::mem::memmove:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        .cfi_def_cfa_offset 36
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r10, -12
        .cfi_offset r9, -16
        .cfi_offset r8, -20
        .cfi_offset r7, -24
        .cfi_offset r6, -28
        .cfi_offset r5, -32
        .cfi_offset r4, -36
        .pad    #40
        sub sp, sp, #40
        .cfi_def_cfa_offset 76
        sub r3, r0, r1
        cmp r3, r2
        bhs .LBB227_13
        add r4, r1, r2
        add r8, r0, r2
        cmp r2, #16
        blo .LBB227_10
        and r12, r8, #3
        bic r3, r8, #3
        rsb r11, r12, #0
        cmp r3, r8
        bhs .LBB227_5
        add r7, r1, r2
        mov r5, r8
        sub r7, r7, #1
.LBB227_4:
        ldrb r6, [r7], #-1
        strb r6, [r5, #-1]!
        cmp r3, r5
        blo .LBB227_4
.LBB227_5:
        sub r12, r2, r12
        add r4, r4, r11
        bic r7, r12, #3
        ands r9, r4, #3
        sub r6, r3, r7
        rsb lr, r7, #0
        bne .LBB227_25
        cmp r6, r3
        bhs .LBB227_9
        add r1, r12, r1
        mov r2, r3
        sub r1, r1, #4
.LBB227_8:
        ldr r5, [r1], #-4
        str r5, [r2, #-4]!
        cmp r6, r2
        blo .LBB227_8
.LBB227_9:
        add r4, r4, lr
        add r8, r3, lr
        and r2, r12, #3
.LBB227_10:
        sub r1, r8, r2
        cmp r1, r8
        bhs .LBB227_24
        sub r2, r4, #1
.LBB227_12:
        ldrb r3, [r2], #-1
        strb r3, [r8, #-1]!
        cmp r1, r8
        blo .LBB227_12
        b .LBB227_24
.LBB227_13:
        cmp r2, #16
        blo .LBB227_22
        rsb r3, r0, #0
        and r12, r3, #3
        add r7, r0, r12
        cmp r0, r7
        bhs .LBB227_17
        mov r3, r12
        mov r5, r0
        mov r4, r1
.LBB227_16:
        ldrb r6, [r4], #1
        subs r3, r3, #1
        strb r6, [r5], #1
        bne .LBB227_16
.LBB227_17:
        sub r2, r2, r12
        add r11, r1, r12
        bic r5, r2, #3
        ands r6, r11, #3
        add r3, r7, r5
        bne .LBB227_27
        cmp r7, r3
        bhs .LBB227_21
        mov r1, r11
.LBB227_20:
        ldr r6, [r1], #4
        str r6, [r7], #4
        cmp r7, r3
        blo .LBB227_20
.LBB227_21:
        add r1, r11, r5
        and r2, r2, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB227_23
        b .LBB227_24
.LBB227_22:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB227_24
.LBB227_23:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB227_23
.LBB227_24:
        add sp, sp, #40
        pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB227_25:
        sub r5, r4, r9
        mov r10, #0
        lsl r7, r9, #3
        cmp r9, #1
        strb r10, [sp, #32]
        strb r10, [sp, #30]
        str r7, [sp, #12]
        str r5, [sp, #8]
        bne .LBB227_30
        mov r5, #0
        add r7, sp, #32
        str r5, [sp, #4]
        b .LBB227_40
.LBB227_27:
        add r1, sp, #24
        rsb r4, r6, #4
        orr r12, r1, r6
        tst r4, #1
        mov r8, #0
        ldrbne r1, [r11]
        str r8, [sp, #24]
        movne r8, #1
        strbne r1, [r12]
        tst r4, #2
        addne r1, r11, r8
        addne r4, r12, r8
        lsl lr, r6, #3
        sub r12, r11, r6
        ldrhne r1, [r1]
        strhne r1, [r4]
        add r1, r7, #4
        ldr r4, [sp, #24]
        cmp r1, r3
        rsb r1, lr, #0
        str r1, [sp, #12]
        bhs .LBB227_32
        and r10, r1, #24
.LBB227_29:
        ldr r8, [r12, #4]!
        add r9, r7, #4
        lsl r1, r8, r10
        orr r1, r1, r4, lsr lr
        str r1, [r7], #8
        cmp r7, r3
        mov r7, r9
        mov r4, r8
        blo .LBB227_29
        b .LBB227_33
.LBB227_30:
        ldrb r10, [r5]
        tst r4, #1
        ldrb r5, [r5, #1]
        str r5, [sp, #4]
        strb r10, [sp, #32]
        bne .LBB227_39
        mov r7, #0
        b .LBB227_41
.LBB227_32:
        mov r8, r4
        mov r9, r7
.LBB227_33:
        mov r7, #0
        cmp r6, #1
        strb r7, [sp, #20]
        strb r7, [sp, #18]
        bne .LBB227_35
        add r1, sp, #20
        mov r6, #0
        mov r10, #0
        mov r4, #0
        b .LBB227_36
.LBB227_35:
        ldrb r1, [r12, #5]
        mov r4, #2
        ldrb r6, [r12, #4]
        strb r6, [sp, #20]
        lsl r10, r1, #8
        add r1, sp, #18
.LBB227_36:
        tst r11, #1
        beq .LBB227_38
        add r7, r12, #4
        ldrb r7, [r7, r4]
        strb r7, [r1]
        ldrb r1, [sp, #18]
        ldrb r6, [sp, #20]
        lsl r7, r1, #16
.LBB227_38:
        orr r1, r10, r7
        ldr r7, [sp, #12]
        orr r1, r1, r6
        and r7, r7, #24
        lsl r1, r1, r7
        orr r1, r1, r8, lsr lr
        str r1, [r9]
        b .LBB227_21
.LBB227_39:
        add r7, sp, #30
        mov r10, #2
.LBB227_40:
        ldr r5, [sp, #8]
        ldrb r5, [r5, r10]
        strb r5, [r7]
        ldrb r7, [sp, #30]
        ldrb r10, [sp, #32]
        lsl r7, r7, #16
.LBB227_41:
        ldr r5, [sp, #4]
        orr r7, r7, r5, lsl #8
        add r5, r6, #4
        str r5, [sp]
        cmp r5, r3
        ldr r5, [sp, #12]
        orr r10, r7, r10
        rsb r7, r5, #0
        str r7, [sp, #4]
        bhs .LBB227_45
        sub r2, r2, r9
        ldr r5, [sp]
        add r6, r1, r2
        and r1, r7, #24
        str r1, [sp, #8]
.LBB227_43:
        add r1, r6, r11
        ldr r2, [sp, #12]
        ldr r7, [sp, #8]
        sub r6, r6, #4
        ldr r1, [r1, #-4]
        lsr r2, r1, r2
        orr r2, r2, r10, lsl r7
        add r7, r8, r11
        sub r8, r8, #4
        mov r10, r1
        str r2, [r7, #-4]
        add r2, r8, r11
        cmp r5, r2
        blo .LBB227_43
        add r7, r6, r11
        b .LBB227_46
.LBB227_45:
        mov r1, r10
        mov r2, r3
        ldr r7, [sp, #8]
.LBB227_46:
        add r5, sp, #36
        mov r6, #0
        orr r8, r5, r9
        add r5, r7, r9
        sub r7, r5, #4
        rsb r5, r9, #4
        str r6, [sp, #36]
        tst r5, #1
        ldrbne r6, [r7]
        strbne r6, [r8]
        movne r6, #1
        tst r5, #2
        addne r5, r7, r6
        addne r6, r8, r6
        ldrhne r5, [r5]
        strhne r5, [r6]
        ldr r5, [sp, #36]
        ldr r6, [sp, #12]
        lsr r5, r5, r6
        ldr r6, [sp, #4]
        and r6, r6, #24
        orr r1, r5, r1, lsl r6
        str r1, [r2, #-4]
        b .LBB227_9

Before #799

compiler_builtins::mem::memcpy:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r11, lr}
        push {r4, r5, r6, r7, r8, r9, r11, lr}
        .cfi_def_cfa_offset 32
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r9, -12
        .cfi_offset r8, -16
        .cfi_offset r7, -20
        .cfi_offset r6, -24
        .cfi_offset r5, -28
        .cfi_offset r4, -32
        cmp r2, #16
        blo .LBB226_9
        rsb r3, r0, #0
        and r4, r3, #3
        add r12, r0, r4
        cmp r0, r12
        bhs .LBB226_4
        mov r3, r4
        mov r6, r0
        mov r5, r1
.LBB226_3:
        ldrb r7, [r5], #1
        subs r3, r3, #1
        strb r7, [r6], #1
        bne .LBB226_3
.LBB226_4:
        sub lr, r2, r4
        add r1, r1, r4
        bic r2, lr, #3
        tst r1, #3
        add r3, r12, r2
        bne .LBB226_12
        cmp r12, r3
        bhs .LBB226_8
        mov r4, r1
.LBB226_7:
        ldr r5, [r4], #4
        str r5, [r12], #4
        cmp r12, r3
        blo .LBB226_7
.LBB226_8:
        add r1, r1, r2
        and r2, lr, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB226_10
        b .LBB226_11
.LBB226_9:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB226_11
.LBB226_10:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB226_10
.LBB226_11:
        pop {r4, r5, r6, r7, r8, r9, r11, pc}
.LBB226_12:
        cmp r12, r3
        bhs .LBB226_8
        bic r7, r1, #3
        lsl r6, r1, #3
        add r5, r7, #4
        rsb r6, r6, #0
        ldr r7, [r7]
        mov r4, #24
        and r8, r4, r1, lsl #3
        and r9, r6, #24
.LBB226_14:
        ldr r4, [r5], #4
        lsl r6, r4, r9
        orr r6, r6, r7, lsr r8
        str r6, [r12], #4
        cmp r12, r3
        mov r7, r4
        blo .LBB226_14
        b .LBB226_8
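The hot loop `.LBB226_14` in this pre-#799 `memcpy` reads whole aligned source words and builds each destination word from two neighbors: `lsl` of the next word, combined by `orr` with `lsr` of the previous one. The sketch below shows the same arithmetic over a safe slice of aligned `u32` words, so no out-of-bounds read is involved. It assumes 4-byte words interpreted as little-endian, as on this ARM target. `combine_misaligned` is a hypothetical name.

```rust
/// Hypothetical sketch of the shift-and-merge step for misaligned copies.
/// `words` holds aligned source words (little-endian interpretation);
/// `offset` (1..=3) is how many bytes past the first aligned word the source
/// data starts. Returns `words.len() - 1` realigned destination words.
pub fn combine_misaligned(words: &[u32], offset: u32) -> Vec<u32> {
    assert!((1..4).contains(&offset));
    // Bits to drop from the previous word (`r8` in the listing).
    let shift = offset * 8;
    // Bits to keep from the next word (`r9` in the listing).
    let back = 32 - shift;
    words
        .windows(2)
        .map(|w| (w[0] >> shift) | (w[1] << back))
        .collect()
}
```

The assembly version reads the first aligned word at `src & !3`. That word may start before the source allocation, which is the out-of-bounds read this PR series is trying to avoid without giving up the speed.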

compiler_builtins::mem::memmove:
        .fnstart
        .cfi_startproc
        .save   {r4, r5, r6, r7, r8, r9, r10, lr}
        push {r4, r5, r6, r7, r8, r9, r10, lr}
        .cfi_def_cfa_offset 32
        .cfi_offset lr, -4
        .cfi_offset r10, -8
        .cfi_offset r9, -12
        .cfi_offset r8, -16
        .cfi_offset r7, -20
        .cfi_offset r6, -24
        .cfi_offset r5, -28
        .cfi_offset r4, -32
        sub r3, r0, r1
        cmp r3, r2
        bhs .LBB227_13
        add r4, r1, r2
        add r3, r0, r2
        cmp r2, #16
        blo .LBB227_10
        and r5, r3, #3
        bic r12, r3, #3
        rsb r8, r5, #0
        cmp r12, r3
        bhs .LBB227_5
        add r7, r1, r2
        sub r7, r7, #1
.LBB227_4:
        ldrb r6, [r7], #-1
        strb r6, [r3, #-1]!
        cmp r12, r3
        blo .LBB227_4
.LBB227_5:
        sub r2, r2, r5
        add r4, r4, r8
        bic r7, r2, #3
        tst r4, #3
        sub r3, r12, r7
        rsb lr, r7, #0
        bne .LBB227_25
        cmp r3, r12
        bhs .LBB227_9
        add r1, r2, r1
        mov r5, r12
        sub r1, r1, #4
.LBB227_8:
        ldr r6, [r1], #-4
        str r6, [r5, #-4]!
        cmp r3, r5
        blo .LBB227_8
.LBB227_9:
        add r4, r4, lr
        add r3, r12, lr
        and r2, r2, #3
.LBB227_10:
        sub r1, r3, r2
        cmp r1, r3
        bhs .LBB227_24
        sub r2, r4, #1
.LBB227_12:
        ldrb r7, [r2], #-1
        strb r7, [r3, #-1]!
        cmp r1, r3
        blo .LBB227_12
        b .LBB227_24
.LBB227_13:
        cmp r2, #16
        blo .LBB227_22
        rsb r3, r0, #0
        and r5, r3, #3
        add r12, r0, r5
        cmp r0, r12
        bhs .LBB227_17
        mov r3, r5
        mov r7, r0
        mov r6, r1
.LBB227_16:
        ldrb r4, [r6], #1
        subs r3, r3, #1
        strb r4, [r7], #1
        bne .LBB227_16
.LBB227_17:
        sub r2, r2, r5
        add r5, r1, r5
        bic r4, r2, #3
        tst r5, #3
        add r3, r12, r4
        bne .LBB227_28
        cmp r12, r3
        bhs .LBB227_21
        mov r1, r5
.LBB227_20:
        ldr r7, [r1], #4
        str r7, [r12], #4
        cmp r12, r3
        blo .LBB227_20
.LBB227_21:
        add r1, r5, r4
        and r2, r2, #3
        add r7, r3, r2
        cmp r3, r7
        blo .LBB227_23
        b .LBB227_24
.LBB227_22:
        mov r3, r0
        add r7, r3, r2
        cmp r3, r7
        bhs .LBB227_24
.LBB227_23:
        ldrb r7, [r1], #1
        subs r2, r2, #1
        strb r7, [r3], #1
        bne .LBB227_23
.LBB227_24:
        pop {r4, r5, r6, r7, r8, r9, r10, pc}
.LBB227_25:
        cmp r3, r12
        bhs .LBB227_9
        mov r1, #24
        bic r6, r4, #3
        and r8, r1, r4, lsl #3
        lsl r1, r4, #3
        rsb r7, r1, #0
        ldr r1, [r6]
        and r9, r7, #24
        sub r5, r6, #4
        mov r7, r12
.LBB227_27:
        ldr r10, [r5], #-4
        lsr r6, r10, r8
        orr r1, r6, r1, lsl r9
        str r1, [r7, #-4]!
        mov r1, r10
        cmp r3, r7
        blo .LBB227_27
        b .LBB227_9
.LBB227_28:
        cmp r12, r3
        bhs .LBB227_21
        mov r1, #24
        bic r7, r5, #3
        and lr, r1, r5, lsl #3
        lsl r1, r5, #3
        add r6, r7, #4
        rsb r1, r1, #0
        ldr r7, [r7]
        and r8, r1, #24
.LBB227_30:
        ldr r9, [r6], #4
        lsl r1, r9, r8
        orr r1, r1, r7, lsr lr
        str r1, [r12], #4
        cmp r12, r3
        mov r7, r9
        blo .LBB227_30
        b .LBB227_21
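Both `memmove` listings begin with `sub r3, r0, r1; cmp r3, r2; bhs .LBB227_13`. They compute `dest - src` with wrapping arithmetic and take the forward-copy path when that distance is at least `n`. Otherwise they copy backwards from the end. The following is a minimal byte-granular sketch of only that direction check. It omits the word-sized fast paths, and `memmove_sketch` is a hypothetical name.

```rust
/// Hypothetical byte-at-a-time memmove showing only the direction decision.
/// Safety: `src` must be readable and `dest` writable for `n` bytes.
pub unsafe fn memmove_sketch(dest: *mut u8, src: *const u8, n: usize) {
    // Same test as `sub r3, r0, r1; cmp r3, r2; bhs`: unless `dest` lies in
    // `[src, src + n)`, a forward copy never overwrites unread source bytes.
    if (dest as usize).wrapping_sub(src as usize) >= n {
        for i in 0..n {
            unsafe { *dest.add(i) = *src.add(i) };
        }
    } else {
        // `dest` overlaps the tail of the source: copy from the end backwards.
        for i in (0..n).rev() {
            unsafe { *dest.add(i) = *src.add(i) };
        }
    }
}
```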

Edit: IR for two of the versions is at https://gist.github.com/tgross35/95d2e6821db82a7a4b160a560c3281a5

RalfJung · 2025-03-22T17:38:26Z

Hm, bummer.

Any other ideas for what we could do to make this faster without causing UB?

RalfJung · 2025-03-22T18:24:54Z

I have one idea: use inline assembly to do the partial loads, at least on a few targets we primarily care about.

However, that's beyond my inline asm skill level. ;) Maybe @beetrees wants to give it a shot?

Also, kind of the entire point of this codepath was to be portable, so it's a bit of a bummer. But at least most of the logic would still be portable...

tgross35 · 2025-03-22T19:54:29Z

I feel like there has to be a way to hint LLVM better, or else this is a missed optimization. Asked at https://rust-lang.zulipchat.com/#narrow/channel/187780-t-compiler.2Fwg-llvm/topic/Hinting.20to.20LLVM.20that.20it.20can.20read.20uninitialized.20data/with/507486212

beetrees · 2025-03-22T22:09:08Z

Maybe @beetrees wants to give it a shot?

This should cover all tier 1 targets:

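/// `addr` must be aligned to `align_of::<usize>()`.
pub unsafe fn load(addr: *mut usize) -> usize {
    // Deferred initialization: exactly one of the asm blocks below writes `out`.
    let out: usize;
    // Doing the load in opaque inline asm is what lets it read bytes of the
    // aligned word that lie outside the Rust allocation without Rust-level UB;
    // an aligned word never straddles a page, so the load itself cannot fault.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    unsafe {
        core::arch::asm!("mov {out}, [{addr}]", addr = in(reg) addr, out = lateout(reg) out, options(nostack, readonly, preserves_flags));
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64", target_arch = "arm64ec"))]
    unsafe {
        core::arch::asm!("ldr {out}, [{addr}]", addr = in(reg) addr, out = lateout(reg) out, options(nostack, readonly, preserves_flags));
    }
    out
}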
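The sketch below illustrates how such an opaque aligned load could feed a partial read. The caller loads the whole aligned word that contains the wanted bytes and then discards the unwanted low-addressed bytes with a shift. The load has the same shape as the snippet above (renamed `load_word`, taking `*const`). `load_tail_bytes` is a hypothetical helper, and the shift direction assumes a little-endian target. The `ptr::read` branch for other architectures is an assumption added only so the example compiles elsewhere. It does not carry the asm version's guarantee and is only sound when the whole word is in bounds.

```rust
/// Opaque aligned word load (same shape as the snippet above).
/// Safety: `addr` must be aligned to `align_of::<usize>()`.
pub unsafe fn load_word(addr: *const usize) -> usize {
    let out: usize;
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    unsafe {
        core::arch::asm!("mov {out}, [{addr}]", addr = in(reg) addr, out = lateout(reg) out,
            options(nostack, readonly, preserves_flags));
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64", target_arch = "arm64ec"))]
    unsafe {
        core::arch::asm!("ldr {out}, [{addr}]", addr = in(reg) addr, out = lateout(reg) out,
            options(nostack, readonly, preserves_flags));
    }
    // Assumption for this sketch only: a plain read elsewhere, which is only
    // sound when the whole word is inside the allocation.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "arm",
        target_arch = "aarch64", target_arch = "arm64ec")))]
    {
        out = unsafe { core::ptr::read(addr) };
    }
    out
}

/// Hypothetical helper: the bytes of the aligned word at `word` starting at
/// byte `offset`, moved down to the low end (little-endian layout assumed).
pub unsafe fn load_tail_bytes(word: *const usize, offset: usize) -> usize {
    assert!(offset < core::mem::size_of::<usize>());
    let w = unsafe { load_word(word) };
    // On little-endian, the `offset` low-addressed bytes are the least
    // significant ones, so a right shift drops them.
    w >> (offset * 8)
}
```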

RalfJung · 2025-03-23T10:01:59Z

Thanks! I've applied the ARM branch in this PR. x86 uses the mem-unaligned feature so it is not running any of this code anyway.

I've kept the simplified fallback implementation using copy_forward_bytes. I suppose we'll add inline assembly for any architecture we actually care about.
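On architectures without such an inline-asm load, the simplified fallback presumably handles the misaligned case with a plain byte-at-a-time forward copy. The following minimal sketch shows a copy in that spirit. It is not the actual `copy_forward_bytes` source, and `copy_forward_bytes_sketch` is a hypothetical name.

```rust
/// Hypothetical byte-at-a-time forward copy.
/// Safety: `src` must be readable and `dest` writable for `n` bytes, and
/// `dest` must not lie inside `(src, src + n)` (this only copies forward).
pub unsafe fn copy_forward_bytes_sketch(mut dest: *mut u8, mut src: *const u8, n: usize) {
    let dest_end = dest.wrapping_add(n);
    while dest < dest_end {
        unsafe {
            *dest = *src;
        }
        dest = dest.wrapping_add(1);
        src = src.wrapping_add(1);
    }
}
```

This is simple and portable. The trade-off is one load and one store per byte in the misaligned middle section, instead of one per word.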

@tgross35 should be ready for the next round of benchmarks. :)


tgross35 · 2025-03-24T02:16:57Z

We are just not having any luck; this didn't recover any of the memcpy performance and it further regressed memmove. The two tests I mentioned in #799 (comment), comparing 4df7a8d to 1fd36a5 (this PR's diff):

mem_icount::memcpy::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     2359617|2359657              (-0.00170%) [-1.00002x]
  L1 Hits:                          2851259|2851307              (-0.00168%) [-1.00002x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                              11|13                   (-15.3846%) [-1.18182x]
  Total read+write:                 2884052|2884102              (-0.00173%) [-1.00002x]
  Estimated Cycles:                 3015554|3015672              (-0.00391%) [-1.00004x]

mem_icount::memmove::forward large_spread_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4194512|3932410              (+6.66517%) [+1.06665x]
  L1 Hits:                          5210648|5210698              (-0.00096%) [-1.00001x]
  L2 Hits:                            32523|32523                (No change)
  RAM Hits:                              16|19                   (-15.7895%) [-1.18750x]
  Total read+write:                 5243187|5243240              (-0.00101%) [-1.00001x]
  Estimated Cycles:                 5373823|5373978              (-0.00288%) [-1.00003x]
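Each row in these reports reads `new|old (relative change) [factor]`. Judging from the numbers, the percentage is `(new - old) / old`. The bracketed factor is `new / old` when the value grew and `-(old / new)` when it shrank. The small sketch below reproduces the memmove `Instructions` row (`4194512|3932410`, `+6.66517%`, `[+1.06665x]`), which is the regression mentioned above, and the memcpy `Instructions` row (`2359617|2359657`, `-0.00170%`). The function names are hypothetical.

```rust
/// Relative change in percent, as printed in parentheses.
pub fn pct_change(new: f64, old: f64) -> f64 {
    (new - old) / old * 100.0
}

/// Signed factor as printed in brackets: `new/old` when the value grew,
/// `-(old/new)` when it shrank.
pub fn factor(new: f64, old: f64) -> f64 {
    if new >= old { new / old } else { -(old / new) }
}
```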

Full run

     Running benches/mem_icount.rs (target/release/deps/mem_icount-f7ca6dbcc87f37e2)
mem_icount::memcpy::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         517|519                  (-0.38536%) [-1.00387x]
  L1 Hits:                              775|779                  (-0.51348%) [-1.00516x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     780|784                  (-0.51020%) [-1.00513x]
  Estimated Cycles:                     950|954                  (-0.41929%) [-1.00421x]
mem_icount::memcpy::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         533|535                  (-0.37383%) [-1.00375x]
  L1 Hits:                              799|803                  (-0.49813%) [-1.00501x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     804|808                  (-0.49505%) [-1.00498x]
  Estimated Cycles:                     974|978                  (-0.40900%) [-1.00411x]
mem_icount::memcpy::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         565|567                  (-0.35273%) [-1.00354x]
  L1 Hits:                              847|851                  (-0.47004%) [-1.00472x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     852|856                  (-0.46729%) [-1.00469x]
  Estimated Cycles:                    1022|1026                 (-0.38986%) [-1.00391x]
mem_icount::memcpy::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        1013|1015                 (-0.19704%) [-1.00197x]
  L1 Hits:                             1519|1523                 (-0.26264%) [-1.00263x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1524|1528                 (-0.26178%) [-1.00262x]
  Estimated Cycles:                    1694|1698                 (-0.23557%) [-1.00236x]
mem_icount::memcpy::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        4597|4599                 (-0.04349%) [-1.00044x]
  L1 Hits:                             6895|6899                 (-0.05798%) [-1.00058x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6900|6904                 (-0.05794%) [-1.00058x]
  Estimated Cycles:                    7070|7074                 (-0.05655%) [-1.00057x]
mem_icount::memcpy::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                     1048884|1048886              (-0.00019%) [-1.00000x]
  L1 Hits:                          1540523|1540527              (-0.00026%) [-1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573312|1573316              (-0.00025%) [-1.00000x]
  Estimated Cycles:                 1704738|1704742              (-0.00023%) [-1.00000x]
mem_icount::memcpy::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         531|533                  (-0.37523%) [-1.00377x]
  L1 Hits:                              795|799                  (-0.50063%) [-1.00503x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     800|804                  (-0.49751%) [-1.00500x]
  Estimated Cycles:                     970|974                  (-0.41068%) [-1.00412x]
mem_icount::memcpy::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         547|549                  (-0.36430%) [-1.00366x]
  L1 Hits:                              819|823                  (-0.48603%) [-1.00488x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     824|828                  (-0.48309%) [-1.00485x]
  Estimated Cycles:                     994|998                  (-0.40080%) [-1.00402x]
mem_icount::memcpy::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         579|581                  (-0.34423%) [-1.00345x]
  L1 Hits:                              867|871                  (-0.45924%) [-1.00461x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     872|876                  (-0.45662%) [-1.00459x]
  Estimated Cycles:                    1042|1046                 (-0.38241%) [-1.00384x]
mem_icount::memcpy::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        1027|1029                 (-0.19436%) [-1.00195x]
  L1 Hits:                             1539|1543                 (-0.25924%) [-1.00260x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1544|1548                 (-0.25840%) [-1.00259x]
  Estimated Cycles:                    1714|1718                 (-0.23283%) [-1.00233x]
mem_icount::memcpy::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        4611|4613                 (-0.04336%) [-1.00043x]
  L1 Hits:                             6915|6919                 (-0.05781%) [-1.00058x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6920|6924                 (-0.05777%) [-1.00058x]
  Estimated Cycles:                    7090|7094                 (-0.05639%) [-1.00056x]
mem_icount::memcpy::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                     1048898|1048900              (-0.00019%) [-1.00000x]
  L1 Hits:                          1540541|1540545              (-0.00026%) [-1.00000x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573332|1573336              (-0.00025%) [-1.00000x]
  Estimated Cycles:                 1704766|1704770              (-0.00023%) [-1.00000x]
mem_icount::memcpy::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         550|590                  (-6.77966%) [-1.07273x]
  L1 Hits:                              813|861                  (-5.57491%) [-1.05904x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                     820|870                  (-5.74713%) [-1.06098x]
  Estimated Cycles:                    1058|1176                 (-10.0340%) [-1.11153x]
mem_icount::memcpy::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         586|626                  (-6.38978%) [-1.06826x]
  L1 Hits:                              857|905                  (-5.30387%) [-1.05601x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                     864|914                  (-5.47046%) [-1.05787x]
  Estimated Cycles:                    1102|1220                 (-9.67213%) [-1.10708x]
mem_icount::memcpy::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         658|698                  (-5.73066%) [-1.06079x]
  L1 Hits:                              945|993                  (-4.83384%) [-1.05079x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                     952|1002                 (-4.99002%) [-1.05252x]
  Estimated Cycles:                    1190|1308                 (-9.02141%) [-1.09916x]
mem_icount::memcpy::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        1666|1706                 (-2.34467%) [-1.02401x]
  L1 Hits:                             2177|2225                 (-2.15730%) [-1.02205x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                    2184|2234                 (-2.23814%) [-1.02289x]
  Estimated Cycles:                    2422|2540                 (-4.64567%) [-1.04872x]
mem_icount::memcpy::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        9730|9770                 (-0.40942%) [-1.00411x]
  L1 Hits:                            12033|12081                (-0.39732%) [-1.00399x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                   12040|12090                (-0.41356%) [-1.00415x]
  Estimated Cycles:                   12278|12396                (-0.95192%) [-1.00961x]
mem_icount::memcpy::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     2359617|2359657              (-0.00170%) [-1.00002x]
  L1 Hits:                          2851259|2851307              (-0.00168%) [-1.00002x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                              11|13                   (-15.3846%) [-1.18182x]
  Total read+write:                 2884052|2884102              (-0.00173%) [-1.00002x]
  Estimated Cycles:                 3015554|3015672              (-0.00391%) [-1.00004x]
mem_icount::memset::bench aligned_0:setup(Cfg { len : 16, offset : 0 })
bytes: 16, offset: 0
- end of stdout/stderr
  Instructions:                         288|288                  (No change)
  L1 Hits:                              418|418                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     427|427                  (No change)
  Estimated Cycles:                     673|673                  (No change)
mem_icount::memset::bench aligned_1:setup(Cfg { len : 32, offset : 0 })
bytes: 32, offset: 0
- end of stdout/stderr
  Instructions:                         300|300                  (No change)
  L1 Hits:                              434|434                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     443|443                  (No change)
  Estimated Cycles:                     689|689                  (No change)
mem_icount::memset::bench aligned_2:setup(Cfg { len : 64, offset : 0 })
bytes: 64, offset: 0
- end of stdout/stderr
  Instructions:                         324|324                  (No change)
  L1 Hits:                              466|466                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     475|475                  (No change)
  Estimated Cycles:                     721|721                  (No change)
mem_icount::memset::bench aligned_3:setup(Cfg { len : 512, offset : 0 })
bytes: 512, offset: 0
- end of stdout/stderr
  Instructions:                         660|660                  (No change)
  L1 Hits:                              914|914                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     923|923                  (No change)
  Estimated Cycles:                    1169|1169                 (No change)
mem_icount::memset::bench aligned_4:setup(Cfg { len : 4096, offset : 0 })
bytes: 4096, offset: 0
- end of stdout/stderr
  Instructions:                        3348|3348                 (No change)
  L1 Hits:                             4498|4498                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4507|4507                 (No change)
  Estimated Cycles:                    4753|4753                 (No change)
mem_icount::memset::bench aligned_5:setup(Cfg { len : MEG1, offset : 0 })
bytes: 1048576, offset: 0
- end of stdout/stderr
  Instructions:                      786614|786614               (No change)
  L1 Hits:                          1032429|1032429              (No change)
  L2 Hits:                            16395|16395                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048834|1048834              (No change)
  Estimated Cycles:                 1114754|1114754              (No change)
mem_icount::memset::bench offset_0:setup(Cfg { len : 16, offset : 65 })
bytes: 16, offset: 65
- end of stdout/stderr
  Instructions:                         300|300                  (No change)
  L1 Hits:                              433|433                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     442|442                  (No change)
  Estimated Cycles:                     688|688                  (No change)
mem_icount::memset::bench offset_1:setup(Cfg { len : 32, offset : 65 })
bytes: 32, offset: 65
- end of stdout/stderr
  Instructions:                         312|312                  (No change)
  L1 Hits:                              449|449                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     458|458                  (No change)
  Estimated Cycles:                     704|704                  (No change)
mem_icount::memset::bench offset_2:setup(Cfg { len : 64, offset : 65 })
bytes: 64, offset: 65
- end of stdout/stderr
  Instructions:                         336|336                  (No change)
  L1 Hits:                              481|481                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     490|490                  (No change)
  Estimated Cycles:                     736|736                  (No change)
mem_icount::memset::bench offset_3:setup(Cfg { len : 512, offset : 65 })
bytes: 512, offset: 65
- end of stdout/stderr
  Instructions:                         672|672                  (No change)
  L1 Hits:                              929|929                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     938|938                  (No change)
  Estimated Cycles:                    1184|1184                 (No change)
mem_icount::memset::bench offset_4:setup(Cfg { len : 4096, offset : 65 })
bytes: 4096, offset: 65
- end of stdout/stderr
  Instructions:                        3360|3360                 (No change)
  L1 Hits:                             4513|4513                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4522|4522                 (No change)
  Estimated Cycles:                    4768|4768                 (No change)
mem_icount::memset::bench offset_5:setup(Cfg { len : MEG1, offset : 65 })
bytes: 1048576, offset: 65
- end of stdout/stderr
  Instructions:                      786626|786626               (No change)
  L1 Hits:                          1032443|1032443              (No change)
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048849|1048849              (No change)
  Estimated Cycles:                 1114773|1114773              (No change)
mem_icount::memcmp::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              850|850                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|4                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                     990|990                  (No change)
mem_icount::memcmp::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                     6291746|6291752              (-0.00010%) [-1.00000x]
  L1 Hits:                          8356237|8356245              (-0.00010%) [-1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389034              (-0.00010%) [-1.00000x]
  Estimated Cycles:                 8520452|8520460              (-0.00009%) [-1.00000x]
mem_icount::memcmp::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              849|849                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                    1024|1024                 (No change)
mem_icount::memcmp::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356235|8356235              (No change)
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520460|8520460              (No change)
mem_icount::memcmp::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              849|849                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                    1024|1024                 (No change)
mem_icount::memcmp::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356235|8356235              (No change)
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520460|8520460              (No change)
mem_icount::memmove::forward aligned_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 0, forward
- end of stdout/stderr
  Instructions:                        4391|4391                 (No change)
  L1 Hits:                             6580|6579                 (+0.01520%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
  Total read+write:                    6590|6590                 (No change)
  Estimated Cycles:                    6870|6904                 (-0.49247%) [-1.00495x]
mem_icount::memmove::forward aligned_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 0, forward
- end of stdout/stderr
  Instructions:                     1048777|1048777              (No change)
  L1 Hits:                          1557244|1557243              (+0.00006%) [+1.00000x]
  L2 Hits:                            15902|15902                (No change)
  RAM Hits:                              11|12                   (-8.33333%) [-1.09091x]
  Total read+write:                 1573157|1573157              (No change)
  Estimated Cycles:                 1637139|1637173              (-0.00208%) [-1.00002x]
mem_icount::memmove::forward small_spread_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         366|406                  (-9.85222%) [-1.10929x]
  L1 Hits:                              525|578                  (-9.16955%) [-1.10095x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     540|598                  (-9.69900%) [-1.10741x]
  Estimated Cycles:                     990|1218                 (-18.7192%) [-1.23030x]
mem_icount::memmove::forward small_spread_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         430|466                  (-7.72532%) [-1.08372x]
  L1 Hits:                              605|658                  (-8.05471%) [-1.08760x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     620|678                  (-8.55457%) [-1.09355x]
  Estimated Cycles:                    1070|1298                 (-17.5655%) [-1.21308x]
mem_icount::memmove::forward small_spread_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         558|586                  (-4.77816%) [-1.05018x]
  L1 Hits:                              765|818                  (-6.47922%) [-1.06928x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     780|838                  (-6.92124%) [-1.07436x]
  Estimated Cycles:                    1230|1458                 (-15.6379%) [-1.18537x]
mem_icount::memmove::forward small_spread_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2350|2266                 (+3.70697%) [+1.03707x]
  L1 Hits:                             3005|3058                 (-1.73316%) [-1.01764x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                    3020|3078                 (-1.88434%) [-1.01921x]
  Estimated Cycles:                    3470|3698                 (-6.16549%) [-1.06571x]
mem_icount::memmove::forward small_spread_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                       16686|15706                (+6.23965%) [+1.06240x]
  L1 Hits:                            20925|20978                (-0.25265%) [-1.00253x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                   20940|20998                (-0.27622%) [-1.00277x]
  Estimated Cycles:                   21390|21618                (-1.05468%) [-1.01066x]
mem_icount::memmove::forward small_spread_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4194512|3932412              (+6.66512%) [+1.06665x]
  L1 Hits:                          5227284|5227337              (-0.00101%) [-1.00001x]
  L2 Hits:                            15887|15887                (No change)
  RAM Hits:                              16|21                   (-23.8095%) [-1.31250x]
  Total read+write:                 5243187|5243245              (-0.00111%) [-1.00001x]
  Estimated Cycles:                 5307279|5307507              (-0.00430%) [-1.00004x]
mem_icount::memmove::forward medium_spread_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 0, forward
- end of stdout/stderr
  Instructions:                         366|406                  (-9.85222%) [-1.10929x]
  L1 Hits:                              525|578                  (-9.16955%) [-1.10095x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     540|598                  (-9.69900%) [-1.10741x]
  Estimated Cycles:                     990|1218                 (-18.7192%) [-1.23030x]
mem_icount::memmove::forward medium_spread_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 0, forward
- end of stdout/stderr
  Instructions:                         430|466                  (-7.72532%) [-1.08372x]
  L1 Hits:                              605|658                  (-8.05471%) [-1.08760x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     620|678                  (-8.55457%) [-1.09355x]
  Estimated Cycles:                    1070|1298                 (-17.5655%) [-1.21308x]
mem_icount::memmove::forward medium_spread_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 0, forward
- end of stdout/stderr
  Instructions:                         558|586                  (-4.77816%) [-1.05018x]
  L1 Hits:                              765|818                  (-6.47922%) [-1.06928x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     780|838                  (-6.92124%) [-1.07436x]
  Estimated Cycles:                    1230|1458                 (-15.6379%) [-1.18537x]
mem_icount::memmove::forward medium_spread_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2350|2266                 (+3.70697%) [+1.03707x]
  L1 Hits:                             3005|3058                 (-1.73316%) [-1.01764x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                    3020|3078                 (-1.88434%) [-1.01921x]
  Estimated Cycles:                    3470|3698                 (-6.16549%) [-1.06571x]
mem_icount::memmove::forward medium_spread_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 0, forward
- end of stdout/stderr
  Instructions:                       16686|15706                (+6.23965%) [+1.06240x]
  L1 Hits:                            20925|20978                (-0.25265%) [-1.00253x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                   20940|20998                (-0.27622%) [-1.00277x]
  Estimated Cycles:                   21390|21618                (-1.05468%) [-1.01066x]
mem_icount::memmove::forward medium_spread_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4194512|3932412              (+6.66512%) [+1.06665x]
  L1 Hits:                          5210648|5210701              (-0.00102%) [-1.00001x]
  L2 Hits:                            32523|32523                (No change)
  RAM Hits:                              16|21                   (-23.8095%) [-1.31250x]
  Total read+write:                 5243187|5243245              (-0.00111%) [-1.00001x]
  Estimated Cycles:                 5373823|5374051              (-0.00424%) [-1.00004x]
mem_icount::memmove::forward large_spread_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 0, forward
- end of stdout/stderr
  Instructions:                         366|404                  (-9.40594%) [-1.10383x]
  L1 Hits:                              525|575                  (-8.69565%) [-1.09524x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     540|593                  (-8.93761%) [-1.09815x]
  Estimated Cycles:                     990|1145                 (-13.5371%) [-1.15657x]
mem_icount::memmove::forward large_spread_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 0, forward
- end of stdout/stderr
  Instructions:                         430|464                  (-7.32759%) [-1.07907x]
  L1 Hits:                              605|655                  (-7.63359%) [-1.08264x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     620|673                  (-7.87519%) [-1.08548x]
  Estimated Cycles:                    1070|1225                 (-12.6531%) [-1.14486x]
mem_icount::memmove::forward large_spread_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 0, forward
- end of stdout/stderr
  Instructions:                         558|584                  (-4.45205%) [-1.04659x]
  L1 Hits:                              765|815                  (-6.13497%) [-1.06536x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     780|833                  (-6.36255%) [-1.06795x]
  Estimated Cycles:                    1230|1385                 (-11.1913%) [-1.12602x]
mem_icount::memmove::forward large_spread_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2350|2264                 (+3.79859%) [+1.03799x]
  L1 Hits:                             3005|3055                 (-1.63666%) [-1.01664x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                    3020|3073                 (-1.72470%) [-1.01755x]
  Estimated Cycles:                    3470|3625                 (-4.27586%) [-1.04467x]
mem_icount::memmove::forward large_spread_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 0, forward
- end of stdout/stderr
  Instructions:                       16686|15704                (+6.25318%) [+1.06253x]
  L1 Hits:                            20925|20975                (-0.23838%) [-1.00239x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                   20940|20993                (-0.25247%) [-1.00253x]
  Estimated Cycles:                   21390|21545                (-0.71942%) [-1.00725x]
mem_icount::memmove::forward large_spread_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 0, forward
- end of stdout/stderr
  Instructions:                     4194512|3932410              (+6.66517%) [+1.06665x]
  L1 Hits:                          5210648|5210698              (-0.00096%) [-1.00001x]
  L2 Hits:                            32523|32523                (No change)
  RAM Hits:                              16|19                   (-15.7895%) [-1.18750x]
  Total read+write:                 5243187|5243240              (-0.00101%) [-1.00001x]
  Estimated Cycles:                 5373823|5373978              (-0.00288%) [-1.00003x]
mem_icount::memmove::forward aligned_off_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                        4407|4407                 (No change)
  L1 Hits:                             6601|6600                 (+0.01515%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                    6612|6612                 (No change)
  Estimated Cycles:                    6926|6960                 (-0.48851%) [-1.00491x]
mem_icount::memmove::forward aligned_off_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                     1048793|1048793              (No change)
  L1 Hits:                          1557264|1557263              (+0.00006%) [+1.00000x]
  L2 Hits:                            15903|15903                (No change)
  RAM Hits:                              12|13                   (-7.69231%) [-1.08333x]
  Total read+write:                 1573179|1573179              (No change)
  Estimated Cycles:                 1637199|1637233              (-0.00208%) [-1.00002x]
mem_icount::memmove::forward small_spread_off_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         366|406                  (-9.85222%) [-1.10929x]
  L1 Hits:                              525|578                  (-9.16955%) [-1.10095x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     540|598                  (-9.69900%) [-1.10741x]
  Estimated Cycles:                     990|1218                 (-18.7192%) [-1.23030x]
mem_icount::memmove::forward small_spread_off_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         430|466                  (-7.72532%) [-1.08372x]
  L1 Hits:                              605|658                  (-8.05471%) [-1.08760x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     620|678                  (-8.55457%) [-1.09355x]
  Estimated Cycles:                    1070|1298                 (-17.5655%) [-1.21308x]
mem_icount::memmove::forward small_spread_off_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         558|586                  (-4.77816%) [-1.05018x]
  L1 Hits:                              765|818                  (-6.47922%) [-1.06928x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     780|838                  (-6.92124%) [-1.07436x]
  Estimated Cycles:                    1230|1458                 (-15.6379%) [-1.18537x]
mem_icount::memmove::forward small_spread_off_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2350|2266                 (+3.70697%) [+1.03707x]
  L1 Hits:                             3005|3058                 (-1.73316%) [-1.01764x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                    3020|3078                 (-1.88434%) [-1.01921x]
  Estimated Cycles:                    3470|3698                 (-6.16549%) [-1.06571x]
mem_icount::memmove::forward small_spread_off_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                       16686|15706                (+6.23965%) [+1.06240x]
  L1 Hits:                            20925|20978                (-0.25265%) [-1.00253x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                   20940|20998                (-0.27622%) [-1.00277x]
  Estimated Cycles:                   21390|21618                (-1.05468%) [-1.01066x]
mem_icount::memmove::forward small_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4194512|3932412              (+6.66512%) [+1.06665x]
  L1 Hits:                          5227285|5227338              (-0.00101%) [-1.00001x]
  L2 Hits:                            15886|15886                (No change)
  RAM Hits:                              16|21                   (-23.8095%) [-1.31250x]
  Total read+write:                 5243187|5243245              (-0.00111%) [-1.00001x]
  Estimated Cycles:                 5307275|5307503              (-0.00430%) [-1.00004x]
mem_icount::memmove::forward medium_spread_off_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 65, forward
- end of stdout/stderr
  Instructions:                         366|406                  (-9.85222%) [-1.10929x]
  L1 Hits:                              525|578                  (-9.16955%) [-1.10095x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     540|598                  (-9.69900%) [-1.10741x]
  Estimated Cycles:                     990|1218                 (-18.7192%) [-1.23030x]
mem_icount::memmove::forward medium_spread_off_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 65, forward
- end of stdout/stderr
  Instructions:                         430|466                  (-7.72532%) [-1.08372x]
  L1 Hits:                              605|658                  (-8.05471%) [-1.08760x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     620|678                  (-8.55457%) [-1.09355x]
  Estimated Cycles:                    1070|1298                 (-17.5655%) [-1.21308x]
mem_icount::memmove::forward medium_spread_off_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 65, forward
- end of stdout/stderr
  Instructions:                         558|586                  (-4.77816%) [-1.05018x]
  L1 Hits:                              765|818                  (-6.47922%) [-1.06928x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                     780|838                  (-6.92124%) [-1.07436x]
  Estimated Cycles:                    1230|1458                 (-15.6379%) [-1.18537x]
mem_icount::memmove::forward medium_spread_off_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2350|2266                 (+3.70697%) [+1.03707x]
  L1 Hits:                             3005|3058                 (-1.73316%) [-1.01764x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                    3020|3078                 (-1.88434%) [-1.01921x]
  Estimated Cycles:                    3470|3698                 (-6.16549%) [-1.06571x]
mem_icount::memmove::forward medium_spread_off_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 65, forward
- end of stdout/stderr
  Instructions:                       16686|15706                (+6.23965%) [+1.06240x]
  L1 Hits:                            20925|20978                (-0.25265%) [-1.00253x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|18                   (-27.7778%) [-1.38462x]
  Total read+write:                   20940|20998                (-0.27622%) [-1.00277x]
  Estimated Cycles:                   21390|21618                (-1.05468%) [-1.01066x]
mem_icount::memmove::forward medium_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4194512|3932412              (+6.66512%) [+1.06665x]
  L1 Hits:                          5210647|5210700              (-0.00102%) [-1.00001x]
  L2 Hits:                            32524|32524                (No change)
  RAM Hits:                              16|21                   (-23.8095%) [-1.31250x]
  Total read+write:                 5243187|5243245              (-0.00111%) [-1.00001x]
  Estimated Cycles:                 5373827|5374055              (-0.00424%) [-1.00004x]
mem_icount::memmove::forward large_spread_off_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 65, forward
- end of stdout/stderr
  Instructions:                         362|399                  (-9.27318%) [-1.10221x]
  L1 Hits:                              518|568                  (-8.80282%) [-1.09653x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|15                   (-20.0000%) [-1.25000x]
  Total read+write:                     532|585                  (-9.05983%) [-1.09962x]
  Estimated Cycles:                     948|1103                 (-14.0526%) [-1.16350x]
mem_icount::memmove::forward large_spread_off_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 65, forward
- end of stdout/stderr
  Instructions:                         426|459                  (-7.18954%) [-1.07746x]
  L1 Hits:                              598|648                  (-7.71605%) [-1.08361x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|15                   (-20.0000%) [-1.25000x]
  Total read+write:                     612|665                  (-7.96992%) [-1.08660x]
  Estimated Cycles:                    1028|1183                 (-13.1023%) [-1.15078x]
mem_icount::memmove::forward large_spread_off_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 65, forward
- end of stdout/stderr
  Instructions:                         554|579                  (-4.31779%) [-1.04513x]
  L1 Hits:                              758|808                  (-6.18812%) [-1.06596x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|15                   (-20.0000%) [-1.25000x]
  Total read+write:                     772|825                  (-6.42424%) [-1.06865x]
  Estimated Cycles:                    1188|1343                 (-11.5413%) [-1.13047x]
mem_icount::memmove::forward large_spread_off_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2346|2259                 (+3.85126%) [+1.03851x]
  L1 Hits:                             2998|3048                 (-1.64042%) [-1.01668x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|15                   (-20.0000%) [-1.25000x]
  Total read+write:                    3012|3065                 (-1.72920%) [-1.01760x]
  Estimated Cycles:                    3428|3583                 (-4.32598%) [-1.04522x]
mem_icount::memmove::forward large_spread_off_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 65, forward
- end of stdout/stderr
  Instructions:                       16682|15699                (+6.26155%) [+1.06262x]
  L1 Hits:                            20918|20968                (-0.23846%) [-1.00239x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|15                   (-20.0000%) [-1.25000x]
  Total read+write:                   20932|20985                (-0.25256%) [-1.00253x]
  Estimated Cycles:                   21348|21503                (-0.72083%) [-1.00726x]
mem_icount::memmove::forward large_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 65, forward
- end of stdout/stderr
  Instructions:                     4194508|3932405              (+6.66521%) [+1.06665x]
  L1 Hits:                          5210642|5210692              (-0.00096%) [-1.00001x]
  L2 Hits:                            32522|32522                (No change)
  RAM Hits:                              15|18                   (-16.6667%) [-1.20000x]
  Total read+write:                 5243179|5243232              (-0.00101%) [-1.00001x]
  Estimated Cycles:                 5373777|5373932              (-0.00288%) [-1.00003x]
mem_icount::memmove::backward aligned_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                        4388|4388                 (No change)
  L1 Hits:                             6580|6579                 (+0.01520%) [+1.00015x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|10                   (-10.0000%) [-1.11111x]
  Total read+write:                    6591|6591                 (No change)
  Estimated Cycles:                    6905|6939                 (-0.48998%) [-1.00492x]
mem_icount::memmove::backward aligned_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                     1048774|1048774              (No change)
  L1 Hits:                          1556743|1556742              (+0.00006%) [+1.00000x]
  L2 Hits:                            16403|16403                (No change)
  RAM Hits:                              12|13                   (-7.69231%) [-1.08333x]
  Total read+write:                 1573158|1573158              (No change)
  Estimated Cycles:                 1639178|1639212              (-0.00207%) [-1.00002x]
mem_icount::memmove::backward small_spread_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         332|370                  (-10.2703%) [-1.11446x]
  L1 Hits:                              482|524                  (-8.01527%) [-1.08714x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     496|542                  (-8.48708%) [-1.09274x]
  Estimated Cycles:                     912|1094                 (-16.6362%) [-1.19956x]
mem_icount::memmove::backward small_spread_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         368|406                  (-9.35961%) [-1.10326x]
  L1 Hits:                              526|568                  (-7.39437%) [-1.07985x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     540|586                  (-7.84983%) [-1.08519x]
  Estimated Cycles:                     956|1138                 (-15.9930%) [-1.19038x]
mem_icount::memmove::backward small_spread_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         440|478                  (-7.94979%) [-1.08636x]
  L1 Hits:                              614|656                  (-6.40244%) [-1.06840x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     628|674                  (-6.82493%) [-1.07325x]
  Estimated Cycles:                    1044|1226                 (-14.8450%) [-1.17433x]
mem_icount::memmove::backward small_spread_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1448|1486                 (-2.55720%) [-1.02624x]
  L1 Hits:                             1846|1888                 (-2.22458%) [-1.02275x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                    1860|1906                 (-2.41343%) [-1.02473x]
  Estimated Cycles:                    2276|2458                 (-7.40439%) [-1.07996x]
mem_icount::memmove::backward small_spread_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9512|9550                 (-0.39791%) [-1.00399x]
  L1 Hits:                            11702|11744                (-0.35763%) [-1.00359x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                   11716|11762                (-0.39109%) [-1.00393x]
  Estimated Cycles:                   12132|12314                (-1.47799%) [-1.01500x]
mem_icount::memmove::backward small_spread_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359498|2359536              (-0.00161%) [-1.00002x]
  L1 Hits:                          2867472|2867514              (-0.00146%) [-1.00001x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              15|19                   (-21.0526%) [-1.26667x]
  Total read+write:                 2883883|2883929              (-0.00160%) [-1.00002x]
  Estimated Cycles:                 2949977|2950159              (-0.00617%) [-1.00006x]
mem_icount::memmove::backward medium_spread_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 0, backward
- end of stdout/stderr
  Instructions:                         332|370                  (-10.2703%) [-1.11446x]
  L1 Hits:                              482|524                  (-8.01527%) [-1.08714x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     496|542                  (-8.48708%) [-1.09274x]
  Estimated Cycles:                     912|1094                 (-16.6362%) [-1.19956x]
mem_icount::memmove::backward medium_spread_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 0, backward
- end of stdout/stderr
  Instructions:                         368|406                  (-9.35961%) [-1.10326x]
  L1 Hits:                              526|568                  (-7.39437%) [-1.07985x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     540|586                  (-7.84983%) [-1.08519x]
  Estimated Cycles:                     956|1138                 (-15.9930%) [-1.19038x]
mem_icount::memmove::backward medium_spread_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 0, backward
- end of stdout/stderr
  Instructions:                         440|478                  (-7.94979%) [-1.08636x]
  L1 Hits:                              614|656                  (-6.40244%) [-1.06840x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     628|674                  (-6.82493%) [-1.07325x]
  Estimated Cycles:                    1044|1226                 (-14.8450%) [-1.17433x]
mem_icount::memmove::backward medium_spread_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1448|1486                 (-2.55720%) [-1.02624x]
  L1 Hits:                             1846|1888                 (-2.22458%) [-1.02275x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                    1860|1906                 (-2.41343%) [-1.02473x]
  Estimated Cycles:                    2276|2458                 (-7.40439%) [-1.07996x]
mem_icount::memmove::backward medium_spread_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9512|9550                 (-0.39791%) [-1.00399x]
  L1 Hits:                            11702|11744                (-0.35763%) [-1.00359x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                   11716|11762                (-0.39109%) [-1.00393x]
  Estimated Cycles:                   12132|12314                (-1.47799%) [-1.01500x]
mem_icount::memmove::backward medium_spread_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359498|2359536              (-0.00161%) [-1.00002x]
  L1 Hits:                          2851088|2851130              (-0.00147%) [-1.00001x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              15|19                   (-21.0526%) [-1.26667x]
  Total read+write:                 2883883|2883929              (-0.00160%) [-1.00002x]
  Estimated Cycles:                 3015513|3015695              (-0.00604%) [-1.00006x]
mem_icount::memmove::backward large_spread_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 0, backward
- end of stdout/stderr
  Instructions:                         332|369                  (-10.0271%) [-1.11145x]
  L1 Hits:                              482|526                  (-8.36502%) [-1.09129x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     496|544                  (-8.82353%) [-1.09677x]
  Estimated Cycles:                     912|1096                 (-16.7883%) [-1.20175x]
mem_icount::memmove::backward large_spread_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 0, backward
- end of stdout/stderr
  Instructions:                         368|405                  (-9.13580%) [-1.10054x]
  L1 Hits:                              526|570                  (-7.71930%) [-1.08365x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     540|588                  (-8.16327%) [-1.08889x]
  Estimated Cycles:                     956|1140                 (-16.1404%) [-1.19247x]
mem_icount::memmove::backward large_spread_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 0, backward
- end of stdout/stderr
  Instructions:                         440|477                  (-7.75681%) [-1.08409x]
  L1 Hits:                              614|658                  (-6.68693%) [-1.07166x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                     628|676                  (-7.10059%) [-1.07643x]
  Estimated Cycles:                    1044|1228                 (-14.9837%) [-1.17625x]
mem_icount::memmove::backward large_spread_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1448|1485                 (-2.49158%) [-1.02555x]
  L1 Hits:                             1846|1890                 (-2.32804%) [-1.02384x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                    1860|1908                 (-2.51572%) [-1.02581x]
  Estimated Cycles:                    2276|2460                 (-7.47967%) [-1.08084x]
mem_icount::memmove::backward large_spread_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9512|9549                 (-0.38748%) [-1.00389x]
  L1 Hits:                            11702|11746                (-0.37460%) [-1.00376x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              12|16                   (-25.0000%) [-1.33333x]
  Total read+write:                   11716|11764                (-0.40802%) [-1.00410x]
  Estimated Cycles:                   12132|12316                (-1.49399%) [-1.01517x]
mem_icount::memmove::backward large_spread_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359498|2359535              (-0.00157%) [-1.00002x]
  L1 Hits:                          2851088|2851132              (-0.00154%) [-1.00002x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              15|19                   (-21.0526%) [-1.26667x]
  Total read+write:                 2883883|2883931              (-0.00166%) [-1.00002x]
  Estimated Cycles:                 3015513|3015697              (-0.00610%) [-1.00006x]
mem_icount::memmove::backward aligned_off_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                        4402|4402                 (No change)
  L1 Hits:                             6599|6599                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                    6611|6611                 (No change)
  Estimated Cycles:                    6959|6959                 (No change)
mem_icount::memmove::backward aligned_off_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                     1048788|1048788              (No change)
  L1 Hits:                          1556761|1556761              (No change)
  L2 Hits:                            16404|16404                (No change)
  RAM Hits:                              13|13                   (No change)
  Total read+write:                 1573178|1573178              (No change)
  Estimated Cycles:                 1639236|1639236              (No change)
mem_icount::memmove::backward small_spread_off_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         341|379                  (-10.0264%) [-1.11144x]
  L1 Hits:                              496|539                  (-7.97774%) [-1.08669x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     511|557                  (-8.25853%) [-1.09002x]
  Estimated Cycles:                     961|1109                 (-13.3454%) [-1.15401x]
mem_icount::memmove::backward small_spread_off_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         377|415                  (-9.15663%) [-1.10080x]
  L1 Hits:                              540|583                  (-7.37564%) [-1.07963x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     555|601                  (-7.65391%) [-1.08288x]
  Estimated Cycles:                    1005|1153                 (-12.8361%) [-1.14726x]
mem_icount::memmove::backward small_spread_off_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         449|487                  (-7.80287%) [-1.08463x]
  L1 Hits:                              628|671                  (-6.40835%) [-1.06847x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     643|689                  (-6.67634%) [-1.07154x]
  Estimated Cycles:                    1093|1241                 (-11.9259%) [-1.13541x]
mem_icount::memmove::backward small_spread_off_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1457|1495                 (-2.54181%) [-1.02608x]
  L1 Hits:                             1860|1903                 (-2.25959%) [-1.02312x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                    1875|1921                 (-2.39459%) [-1.02453x]
  Estimated Cycles:                    2325|2473                 (-5.98463%) [-1.06366x]
mem_icount::memmove::backward small_spread_off_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9521|9559                 (-0.39753%) [-1.00399x]
  L1 Hits:                            11716|11759                (-0.36568%) [-1.00367x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                   11731|11777                (-0.39059%) [-1.00392x]
  Estimated Cycles:                   12181|12329                (-1.20042%) [-1.01215x]
mem_icount::memmove::backward small_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359507|2359545              (-0.00161%) [-1.00002x]
  L1 Hits:                          2867486|2867529              (-0.00150%) [-1.00001x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              16|19                   (-15.7895%) [-1.18750x]
  Total read+write:                 2883898|2883944              (-0.00160%) [-1.00002x]
  Estimated Cycles:                 2950026|2950174              (-0.00502%) [-1.00005x]
mem_icount::memmove::backward medium_spread_off_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 65, backward
- end of stdout/stderr
  Instructions:                         341|379                  (-10.0264%) [-1.11144x]
  L1 Hits:                              496|539                  (-7.97774%) [-1.08669x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     511|557                  (-8.25853%) [-1.09002x]
  Estimated Cycles:                     961|1109                 (-13.3454%) [-1.15401x]
mem_icount::memmove::backward medium_spread_off_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 65, backward
- end of stdout/stderr
  Instructions:                         377|415                  (-9.15663%) [-1.10080x]
  L1 Hits:                              540|583                  (-7.37564%) [-1.07963x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     555|601                  (-7.65391%) [-1.08288x]
  Estimated Cycles:                    1005|1153                 (-12.8361%) [-1.14726x]
mem_icount::memmove::backward medium_spread_off_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 65, backward
- end of stdout/stderr
  Instructions:                         449|487                  (-7.80287%) [-1.08463x]
  L1 Hits:                              628|671                  (-6.40835%) [-1.06847x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     643|689                  (-6.67634%) [-1.07154x]
  Estimated Cycles:                    1093|1241                 (-11.9259%) [-1.13541x]
mem_icount::memmove::backward medium_spread_off_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1457|1495                 (-2.54181%) [-1.02608x]
  L1 Hits:                             1860|1903                 (-2.25959%) [-1.02312x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                    1875|1921                 (-2.39459%) [-1.02453x]
  Estimated Cycles:                    2325|2473                 (-5.98463%) [-1.06366x]
mem_icount::memmove::backward medium_spread_off_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9521|9559                 (-0.39753%) [-1.00399x]
  L1 Hits:                            11716|11759                (-0.36568%) [-1.00367x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                   11731|11777                (-0.39059%) [-1.00392x]
  Estimated Cycles:                   12181|12329                (-1.20042%) [-1.01215x]
mem_icount::memmove::backward medium_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359507|2359545              (-0.00161%) [-1.00002x]
  L1 Hits:                          2851101|2851144              (-0.00151%) [-1.00002x]
  L2 Hits:                            32781|32781                (No change)
  RAM Hits:                              16|19                   (-15.7895%) [-1.18750x]
  Total read+write:                 2883898|2883944              (-0.00160%) [-1.00002x]
  Estimated Cycles:                 3015566|3015714              (-0.00491%) [-1.00005x]
mem_icount::memmove::backward large_spread_off_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 65, backward
- end of stdout/stderr
  Instructions:                         341|378                  (-9.78836%) [-1.10850x]
  L1 Hits:                              496|541                  (-8.31793%) [-1.09073x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     511|559                  (-8.58676%) [-1.09393x]
  Estimated Cycles:                     961|1111                 (-13.5014%) [-1.15609x]
mem_icount::memmove::backward large_spread_off_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 65, backward
- end of stdout/stderr
  Instructions:                         377|414                  (-8.93720%) [-1.09814x]
  L1 Hits:                              540|585                  (-7.69231%) [-1.08333x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     555|603                  (-7.96020%) [-1.08649x]
  Estimated Cycles:                    1005|1155                 (-12.9870%) [-1.14925x]
mem_icount::memmove::backward large_spread_off_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 65, backward
- end of stdout/stderr
  Instructions:                         449|486                  (-7.61317%) [-1.08241x]
  L1 Hits:                              628|673                  (-6.68648%) [-1.07166x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                     643|691                  (-6.94645%) [-1.07465x]
  Estimated Cycles:                    1093|1243                 (-12.0676%) [-1.13724x]
mem_icount::memmove::backward large_spread_off_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1457|1494                 (-2.47657%) [-1.02539x]
  L1 Hits:                             1860|1905                 (-2.36220%) [-1.02419x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                    1875|1923                 (-2.49610%) [-1.02560x]
  Estimated Cycles:                    2325|2475                 (-6.06061%) [-1.06452x]
mem_icount::memmove::backward large_spread_off_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9521|9558                 (-0.38711%) [-1.00389x]
  L1 Hits:                            11716|11761                (-0.38262%) [-1.00384x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              13|16                   (-18.7500%) [-1.23077x]
  Total read+write:                   11731|11779                (-0.40750%) [-1.00409x]
  Estimated Cycles:                   12181|12331                (-1.21645%) [-1.01231x]
mem_icount::memmove::backward large_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359507|2359544              (-0.00157%) [-1.00002x]
  L1 Hits:                          2851102|2851147              (-0.00158%) [-1.00002x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              16|19                   (-15.7895%) [-1.18750x]
  Total read+write:                 2883898|2883946              (-0.00166%) [-1.00002x]
  Estimated Cycles:                 3015562|3015712              (-0.00497%) [-1.00005x]
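A note for reading the report: each metric line shows `new|old`, then the relative change, then a factor in brackets. The numbers in this excerpt are consistent with Total read+write = L1 + L2 + RAM hits and Estimated Cycles = L1 + 5·L2 + 35·RAM hits. For example, `medium_spread_off_0` gives 496 + 2 + 13 = 511 and 496 + 5·2 + 35·13 = 961. This appears to be the default weighting that iai-callgrind uses for its cycle estimate. Below is a minimal Rust sketch that recomputes those columns. The function names are hypothetical. The sign convention for the factor on regressions is an assumption, because this excerpt only contains improvements.

```rust
/// Total memory accesses: L1 hits + L2 (last-level) hits + RAM hits.
fn total_rw(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + l2 + ram
}

/// Cycle estimate with the weights the report's numbers are consistent with:
/// 1 per L1 hit, 5 per L2 hit, 35 per RAM hit.
fn estimated_cycles(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + 5 * l2 + 35 * ram
}

/// Relative change in percent, as shown in parentheses: (new - old) / old * 100.
fn pct_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Factor shown in brackets. For improvements the report prints -(old / new);
/// using +(new / old) for regressions is an assumption, not shown in this excerpt.
fn factor(new: u64, old: u64) -> f64 {
    if new <= old {
        -(old as f64 / new as f64)
    } else {
        new as f64 / old as f64
    }
}
```

For instance, `pct_change(341, 379)` and `factor(341, 379)` reproduce the `(-10.0264%) [-1.11144x]` on the first Instructions line of `medium_spread_off_0`. The fixed RAM-hit weight of 35 explains why the drop from 16 to 13 RAM hits dominates the cycle estimate for the small sizes.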

tgross35 · 2025-03-24T02:18:11Z

Maybe one day somebody will port https://github.com/ARM-software/optimized-routines/blob/850309be878e7d15d064ea7d5589bb0266499288/string/arm/memcpy.S to a #[naked] function :)

RalfJung · 2025-03-24T06:59:43Z

Hm, the assembly should be quite close to what it was before my changes now, shouldn't it? Except for the last loop iteration being unrolled. But for larger sizes, only the loop should matter.

I played around with the loop conditions a bit to make the loop more like how it was before. Does that make a difference?
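The following is a minimal safe-Rust sketch that makes the loop structure under discussion concrete. It copies misaligned words with the last iteration peeled off. It is not the compiler-builtins code, which works on raw pointers and, with this PR, uses inline asm on ARM. The functions `copy_misaligned_words` and `load_partial` here are hypothetical helpers. The sketch assumes little-endian words, an `offset` in 1..8, and that the first `offset` bytes of `src` are padding that may be read.

Each destination word is stitched together from two adjacent aligned source words. Presumably only the final iteration needs special handling, so that it does not load a whole word past the end of the source. That is why it is written out separately. For large copies, the main loop dominates the cost.

```rust
const W: usize = 8; // word size in bytes (assumption: 64-bit words)

/// Hypothetical helper: load a little-endian u64 from up to W bytes,
/// zero-filling any bytes that are not provided.
fn load_partial(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; W];
    let n = bytes.len().min(W);
    buf[..n].copy_from_slice(&bytes[..n]);
    u64::from_le_bytes(buf)
}

/// Fills `dst` with the words whose bytes start `offset` bytes into `src`.
/// Assumption: `src[..offset]` is padding (the part of the first aligned
/// word that precedes the data) and `src.len() >= offset + dst.len() * W`.
fn copy_misaligned_words(dst: &mut [u64], src: &[u8], offset: usize) {
    assert!(offset > 0 && offset < W);
    let n = dst.len();
    if n == 0 {
        return;
    }
    assert!(src.len() >= offset + n * W);
    let shift = (offset * 8) as u32;

    // First aligned source word; its low `offset` bytes are padding.
    let mut prev = load_partial(&src[0..W]);

    // Main loop: every iteration loads one full aligned source word and
    // combines the high bytes of `prev` with the low bytes of `cur`.
    for i in 0..n - 1 {
        let cur = load_partial(&src[(i + 1) * W..(i + 2) * W]);
        dst[i] = (prev >> shift) | (cur << (64 - shift));
        prev = cur;
    }

    // Peeled last iteration: only the `offset` bytes that still belong to
    // the source are loaded, so nothing past the end of `src` is read.
    let last = n * W;
    let cur = load_partial(&src[last..last + offset]);
    dst[n - 1] = (prev >> shift) | (cur << (64 - shift));
}
```

Apart from the partial load, the peeled iteration does the same shift-and-or as the loop body. This is consistent with the observation that, for larger sizes, only the loop should matter.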

RalfJung force-pushed the copy_misaligned_words-size branch from 02c5049 to 86a3fe3 March 23, 2025 10:00

RalfJung changed the title ~~copy_misaligned_words: reduce codesize~~ copy_misaligned_words: use inline asm on ARM, simplify fallback implementation Mar 23, 2025

RalfJung force-pushed the copy_misaligned_words-size branch from 86a3fe3 to ba18877 March 23, 2025 10:02

copy_misaligned_words: use inline asm on ARM, simplify fallback implementation

1fd36a5

RalfJung force-pushed the copy_misaligned_words-size branch from ba18877 to 1fd36a5 March 23, 2025 10:04

play around with the loop conditions a bit

b91ec3a
