copy_misaligned_words: avoid out-of-bounds accesses #799

Merged (5 commits) on Mar 22, 2025

RalfJung (Member) commented Mar 18, 2025

Fixes #559 for memmove/memcpy: copy_*_misaligned_words now loads the underaligned prefix and suffix with up to 3 separate aligned loads (a 1-byte load, a 2-byte load and, on 64-bit targets, a 4-byte load), and performs only the loads that are actually in bounds. The hope is that the performance loss compared to a single aligned pointer-sized load is negligible.
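Here is a minimal sketch of the prefix idea. It assumes a usize-aligned word pointer and a word size of 4 or 8 bytes. The function name load_end_partial_sketch is hypothetical, and this is not the crate's actual code. It reads only the last load_sz bytes of the word, using at most one 1-byte, one 2-byte and one 4-byte load, each naturally aligned. The bytes that would be out of bounds are never touched and come back as 0.

```rust
/// Target word size: 4 bytes on 32-bit targets, 8 on 64-bit targets.
const WORD_SIZE: usize = core::mem::size_of::<usize>();

/// Hypothetical sketch (not the crate's implementation): read only the *last*
/// `load_sz` bytes of the `usize`-aligned word at `src`. Those bytes keep their
/// in-memory offsets in the result; the leading bytes, which may lie before the
/// start of the source range, are never read and come back as 0.
///
/// # Safety
/// `src` must be `usize`-aligned, the last `load_sz` bytes of `*src` must be
/// readable, and `load_sz` must be strictly less than `WORD_SIZE`.
unsafe fn load_end_partial_sketch(src: *const usize, load_sz: usize) -> usize {
    debug_assert!(load_sz < WORD_SIZE && WORD_SIZE <= 8);
    let mut out: usize = 0;
    let start = WORD_SIZE - load_sz; // offset of the first in-bounds byte
    let src_b = src.cast::<u8>();
    let out_b = (&mut out as *mut usize).cast::<u8>();
    let mut i = 0;
    unsafe {
        // Smallest chunk first: after each step `start + i` is aligned to the
        // next chunk size, so every load below is naturally aligned.
        if load_sz & 1 != 0 {
            *out_b.add(start + i) = *src_b.add(start + i);
            i += 1;
        }
        if load_sz & 2 != 0 {
            *out_b.add(start + i).cast::<u16>() = *src_b.add(start + i).cast::<u16>();
            i += 2;
        }
        if load_sz & 4 != 0 {
            // Only reachable on 64-bit targets, since `load_sz < WORD_SIZE`.
            *out_b.add(start + i).cast::<u32>() = *src_b.add(start + i).cast::<u32>();
            i += 4;
        }
    }
    debug_assert_eq!(i, load_sz);
    out
}
```

The order matters. Doing the small loads first means that, for example, with load_sz = 5 on a 64-bit target, the 1-byte load sits at offset 3 and the 4-byte load at offset 4, which is 4-aligned.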

I confirmed that this now passes Miri (the second of these already worked before this PR):

# target without mem-unaligned
MIRIFLAGS=-Zmiri-tree-borrows cargo miri test --features no-asm --target armv7-unknown-linux-gnueabihf -- align
# target with mem-unaligned
MIRIFLAGS=-Zmiri-tree-borrows cargo miri test --features no-asm --target x86_64-unknown-linux-gnu -- align

I added a new test because the existing test had some slack space around the memory being copied, which made all accesses accidentally in bounds (but Miri was still helpful to confirm that everything is aligned). This test found a bug in my code, fixed in the second commit. :D

This also adds the above commands to CI, so hopefully this crate stays green under Miri. :)

tgross35 (Contributor) commented:

> Is there some good place in the CI config to add this Miri check? Note that I am only running some of the tests (those with align in their name), as otherwise this would take ~forever; some tests have large iteration counts. We need Tree Borrows since the test suite uses the as_ptr+as_mut_ptr pattern, which is not compatible with Stacked Borrows.
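For readers unfamiliar with the as_ptr+as_mut_ptr pattern, here is a minimal illustration. overlapping_copy_demo is a hypothetical name, not code from the test suite. As we understand the two models, calling as_mut_ptr reborrows the whole buffer mutably. Under Stacked Borrows, that reborrow invalidates the src pointer obtained earlier from as_ptr, so Miri reports the read inside ptr::copy. Tree Borrows treats a fresh mutable reborrow as only a read until it is first written through, so src stays readable and the code is accepted.

```rust
/// Hypothetical illustration of the `as_ptr` + `as_mut_ptr` pattern on one buffer.
fn overlapping_copy_demo() -> [u8; 8] {
    let mut buf = [1u8, 2, 3, 4, 5, 6, 7, 8];
    // Raw pointer derived from a shared reborrow of `buf`.
    let src = buf.as_ptr();
    // Raw pointer derived from a *mutable* reborrow of `buf`; under Stacked
    // Borrows this reborrow invalidates `src`.
    let dst = buf.as_mut_ptr();
    // memmove-style overlapping copy: bytes 0..4 are copied to 1..5.
    unsafe { core::ptr::copy(src, dst.add(1), 4) };
    buf
}
```

Run natively, this simply performs the overlapping copy. The difference between the models only shows up when running under Miri.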

I've been meaning to ask about this; it sounds like a great idea to me. You can add a new job to main.yml, probably with these steps:

- uses: actions/checkout@v4
  with:
    submodules: true
- name: Install Rust (rustup)
  run: rustup update ${{ matrix.rust }} --no-self-update && rustup default ${{ matrix.rust }}
  shell: bash
- run: rustup target add ${{ matrix.target }}
- run: rustup component add llvm-tools-preview
- uses: Swatinem/rust-cache@v2

Then put the rest in a script.

The float tests can probably be skipped, since that module has no unsafe code (we might even be able to forbid it) and it is probably quite slow to run under Miri.
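Forbidding unsafe code in a module takes one inner attribute. In the minimal sketch below, the module float_sketch and its function are hypothetical stand-ins for the real float module. With forbid(unsafe_code), any unsafe block inside the module is a compile error. Unlike deny, forbid cannot be overridden by a nested allow.

```rust
/// Hypothetical stand-in for a module that contains no unsafe code.
mod float_sketch {
    // Any `unsafe` block in this module now fails to compile.
    #![forbid(unsafe_code)]

    /// Ordinary safe code compiles as usual.
    pub fn mul_add_naive(a: f64, b: f64, c: f64) -> f64 {
        a * b + c
    }
}
```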

RalfJung (Member, Author) commented:

Even mem has very slow tests like this one. That's why I only ran the align tests. Though maybe it would be worth reducing those iteration counts under Miri so that more tests can run. I don't want to go over the entire test suite, though; that sounds like a lot of work. ;)
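Here is one way to reduce those counts, assuming the slow tests are driven by a loop-count constant. Miri sets cfg(miri) when it runs code, so cfg!(miri) can select a smaller workload at compile time. ITERS and run_scaled_test are hypothetical names; the real tests would need their own constants adjusted.

```rust
/// Hypothetical iteration count: much smaller when the code is interpreted by Miri.
const ITERS: usize = if cfg!(miri) { 100 } else { 100_000 };

/// Hypothetical test body whose workload scales with `ITERS`.
fn run_scaled_test() -> usize {
    let mut checked = 0;
    for i in 0..ITERS {
        // Stand-in for one test case; a real test would exercise the mem routines here.
        let v = [i as u8; 16];
        assert!(v.iter().all(|&b| b == i as u8));
        checked += 1;
    }
    checked
}
```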

RalfJung force-pushed the memmove-inbounds branch 6 times, most recently from e4ef842 to 842fa94 (March 18, 2025 21:51)
src/mem/impls.rs (outdated)
dest_usize = dest_usize.wrapping_add(1);
}

// There's one more element left to go, and we can't use the loop for that as on the `src` side,
// it is partially out-of-bounds.
RalfJung (Member, Author) commented:

The previous code seemed unaware that out-of-bounds (OOB) accesses can also occur at the end of the range, but of course that is fundamentally the same problem as at the beginning.
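The end-of-range case mirrors the start. The last source word is only partially in bounds, so only its *first* bytes may be read. The sketch below assumes a usize-aligned src and load_sz < WORD_SIZE. The name load_partial_sketch is hypothetical, and this is not necessarily how the crate's load_aligned_partial is written. It loads those bytes largest chunk first, so every load stays naturally aligned.

```rust
/// Target word size: 4 bytes on 32-bit targets, 8 on 64-bit targets.
const WORD_SIZE: usize = core::mem::size_of::<usize>();

/// Hypothetical sketch: read only the *first* `load_sz` bytes of the
/// `usize`-aligned word at `src`; the trailing bytes, which may lie past the end
/// of the source range, are never read and come back as 0.
///
/// # Safety
/// `src` must be `usize`-aligned, the first `load_sz` bytes of `*src` must be
/// readable, and `load_sz` must be strictly less than `WORD_SIZE`.
unsafe fn load_partial_sketch(src: *const usize, load_sz: usize) -> usize {
    debug_assert!(load_sz < WORD_SIZE && WORD_SIZE <= 8);
    let mut out: usize = 0;
    let src_b = src.cast::<u8>();
    let out_b = (&mut out as *mut usize).cast::<u8>();
    let mut i = 0;
    unsafe {
        // Largest chunk first: offset 0 is word-aligned, and after each step `i`
        // is a multiple of the next (smaller) chunk size.
        if load_sz & 4 != 0 {
            // Only reachable on 64-bit targets, since `load_sz < WORD_SIZE`.
            *out_b.add(i).cast::<u32>() = *src_b.add(i).cast::<u32>();
            i += 4;
        }
        if load_sz & 2 != 0 {
            *out_b.add(i).cast::<u16>() = *src_b.add(i).cast::<u16>();
            i += 2;
        }
        if load_sz & 1 != 0 {
            *out_b.add(i) = *src_b.add(i);
            i += 1;
        }
    }
    debug_assert_eq!(i, load_sz);
    out
}
```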

RalfJung force-pushed the memmove-inbounds branch 2 times, most recently from 605b6ca to d8cf8a2 (March 18, 2025 22:17)
RalfJung (Member, Author) commented:

Miri is looking good on CI :)

RalfJung force-pushed the memmove-inbounds branch 2 times, most recently from cd0efd6 to 01c90a6 (March 18, 2025 22:44)
tgross35 (Contributor) left a comment:

Some surface-level notes; I'll take a closer look at perf soonish.

RalfJung force-pushed the memmove-inbounds branch 5 times, most recently from b3464f6 to 9330e7c (March 19, 2025 08:41)
tgross35 (Contributor) commented Mar 19, 2025

Unfortunately, it looks like this comes close to doubling the total line and label counts in the generated assembly for this routine: https://godbolt.org/z/7WYa6e83n. I agree that the UB is worth fixing even at a performance cost, but I have to imagine this could be improved with some massaging.

@nbdd0121 I know it has been a long time since you worked on #405, but do you have any ideas on how to improve the codegen here without OOB accesses?

(I haven't actually benchmarked this, so eyeballing the asm may not accurately reflect runtime, but the end blocks are definitely larger.)

RalfJung (Member, Author) commented:

Yeah, there's prefix and postfix handling now, which of course adds some extra code and labels. The original code neglected to treat the last loop iteration differently, which makes this not an entirely fair comparison. At the very least, we should compare with a version that uses an atomic/volatile load for the last round, since that load can also be OOB.

The code size could be reduced by using copy_forward_bytes instead of load_aligned_partial/load_aligned_end_partial. But I would expect that to be worse for performance...

This compares the "original but with the final loop iteration unrolled" with the copy_forward_bytes variant: https://godbolt.org/z/768YPGqGG. Still an increase, but "only" by 60%.

nbdd0121 (Contributor) commented:

Technically, the last loop iteration doesn't need special handling if you use an unordered atomic load for every loop iteration. The codegen shouldn't be massively different.

Using a byte copy doesn't necessarily mean worse performance: it performs at most 3 additional byte copies on each end, and it also removes data-dependent branches, which are hard to predict. It could also be merged with the outermost byte-copy computation. I guess benchmarking would be necessary.
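The "at most 3 additional byte copies" bound can be made concrete with a hypothetical helper that is not from the crate. It computes how a range starting at address addr with length n splits into three parts: a bytewise head up to the next word boundary, a run of whole words, and a bytewise tail. Head and tail are each at most word - 1 bytes, which is 3 on a 32-bit target and 7 on a 64-bit one. The helper only does the arithmetic and leaves out the copying itself.

```rust
/// Hypothetical helper: split `[addr, addr + n)` into
/// (head bytes, whole `word`-sized chunks, tail bytes).
/// `word` must be a power of two; head and tail are each at most `word - 1`.
fn split_for_byte_copy(addr: usize, n: usize, word: usize) -> (usize, usize, usize) {
    debug_assert!(word.is_power_of_two());
    // Bytes needed to reach the next multiple of `word` (0 if already aligned).
    let head = addr.wrapping_neg() & (word - 1);
    if head >= n {
        // The whole range ends before the first word boundary: copy it bytewise.
        return (n, 0, 0);
    }
    let rest = n - head;
    (head, rest / word, rest % word)
}
```

For example, 16 bytes starting at 0x1001 with 4-byte words split into 3 head bytes, 3 whole words and 1 tail byte.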

RalfJung (Member, Author) commented:

In my view, this is still a win: it is almost certainly still faster than the code before #405, and that PR achieved its performance by relying on UB.

But I'd also be curious what the numbers actually look like. If someone has access to an ARM-32 system and could benchmark this, that would be great. :)

tgross35 (Contributor) commented:

Just as an update: I'm trying pretty hard to get some form of benchmarks so we have a reference, not that I think perf is worth blocking on as long as it isn't somehow awful. There are now instruction-count benchmarks: you need iai-callgrind-runner (installed via Cargo) and Valgrind, and then you can run cargo bench -p testcrate --bench mem_icount --features icount -- --nocapture. I'm reasonably close to either getting the icount benchmarks to run in QEMU for armv7 or giving up on that.

Mind rebasing at some point to pick that up? (Sorry, I moved some things around, hence the conflict.)

RalfJung (Member, Author) commented:

Sure, the rebase went through without any manual work.

tgross35 (Contributor) commented:

Gave up on QEMU, but I had a 32-bit Raspberry Pi lying around :)

memcpy has about a 25% slowdown for the alignment-mismatch tests. memmove has about the same slowdown, but it slows down more significantly for larger copies. Worst-case results:

mem_icount::memcpy::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     2359657|1835335              (+28.5682%) [+1.28568x]
  L1 Hits:                          2851307|2326978              (+22.5326%) [+1.22533x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                              13|10                   (+30.0000%) [+1.30000x]
  Total read+write:                 2884102|2359770              (+22.2196%) [+1.22220x]
  Estimated Cycles:                 3015672|2491238              (+21.0511%) [+1.21051x]
mem_icount::memmove::forward large_spread_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 0, forward
- end of stdout/stderr
  Instructions:                     3932410|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5210698|2327074              (+123.916%) [+2.23916x]
  L2 Hits:                            32523|32524                (-0.00307%) [-1.00003x]
  RAM Hits:                              19|13                   (+46.1538%) [+1.46154x]
  Total read+write:                 5243240|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5373978|2490149              (+115.809%) [+2.15809x]
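For reference, the headline percentages follow directly from the raw counts in the worst-case results above. In each pair, the left value is with this PR and the right value is the baseline:

$$
\frac{2359657 - 1835335}{1835335} = \frac{524322}{1835335} \approx 0.286 \qquad \text{(memcpy misaligned\_5, instructions: +28.6\%)}
$$

$$
\frac{3015672 - 2491238}{2491238} = \frac{524434}{2491238} \approx 0.211 \qquad \text{(memcpy misaligned\_5, estimated cycles: +21.1\%)}
$$

$$
\frac{3932410 - 1835226}{1835226} = \frac{2097184}{1835226} \approx 1.143 \qquad \text{(memmove large\_spread\_5, instructions: +114.3\%)}
$$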

I don't expect that to make or break anyone's day, so let's get this UB fixed 🎉.

Full benchmark log
     Running benches/mem_icount.rs (target/release/deps/mem_icount-f7ca6dbcc87f37e2)
mem_icount::memcpy::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         519|517                  (+0.38685%) [+1.00387x]
  L1 Hits:                              779|775                  (+0.51613%) [+1.00516x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     784|780                  (+0.51282%) [+1.00513x]
  Estimated Cycles:                     954|950                  (+0.42105%) [+1.00421x]
mem_icount::memcpy::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         535|533                  (+0.37523%) [+1.00375x]
  L1 Hits:                              803|799                  (+0.50063%) [+1.00501x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     808|804                  (+0.49751%) [+1.00498x]
  Estimated Cycles:                     978|974                  (+0.41068%) [+1.00411x]
mem_icount::memcpy::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         567|565                  (+0.35398%) [+1.00354x]
  L1 Hits:                              851|847                  (+0.47226%) [+1.00472x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     856|852                  (+0.46948%) [+1.00469x]
  Estimated Cycles:                    1026|1022                 (+0.39139%) [+1.00391x]
mem_icount::memcpy::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        1015|1013                 (+0.19743%) [+1.00197x]
  L1 Hits:                             1523|1519                 (+0.26333%) [+1.00263x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1528|1524                 (+0.26247%) [+1.00262x]
  Estimated Cycles:                    1698|1694                 (+0.23613%) [+1.00236x]
mem_icount::memcpy::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        4599|4597                 (+0.04351%) [+1.00044x]
  L1 Hits:                             6899|6895                 (+0.05801%) [+1.00058x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6904|6900                 (+0.05797%) [+1.00058x]
  Estimated Cycles:                    7074|7070                 (+0.05658%) [+1.00057x]
mem_icount::memcpy::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576 bytes, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                     1048886|1048884              (+0.00019%) [+1.00000x]
  L1 Hits:                          1540527|1540523              (+0.00026%) [+1.00000x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573316|1573312              (+0.00025%) [+1.00000x]
  Estimated Cycles:                 1704742|1704738              (+0.00023%) [+1.00000x]
mem_icount::memcpy::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         533|531                  (+0.37665%) [+1.00377x]
  L1 Hits:                              799|795                  (+0.50314%) [+1.00503x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     804|800                  (+0.50000%) [+1.00500x]
  Estimated Cycles:                     974|970                  (+0.41237%) [+1.00412x]
mem_icount::memcpy::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         549|547                  (+0.36563%) [+1.00366x]
  L1 Hits:                              823|819                  (+0.48840%) [+1.00488x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     828|824                  (+0.48544%) [+1.00485x]
  Estimated Cycles:                     998|994                  (+0.40241%) [+1.00402x]
mem_icount::memcpy::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         581|579                  (+0.34542%) [+1.00345x]
  L1 Hits:                              871|867                  (+0.46136%) [+1.00461x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     876|872                  (+0.45872%) [+1.00459x]
  Estimated Cycles:                    1046|1042                 (+0.38388%) [+1.00384x]
mem_icount::memcpy::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        1029|1027                 (+0.19474%) [+1.00195x]
  L1 Hits:                             1543|1539                 (+0.25991%) [+1.00260x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1548|1544                 (+0.25907%) [+1.00259x]
  Estimated Cycles:                    1718|1714                 (+0.23337%) [+1.00233x]
mem_icount::memcpy::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        4613|4611                 (+0.04337%) [+1.00043x]
  L1 Hits:                             6919|6915                 (+0.05785%) [+1.00058x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    6924|6920                 (+0.05780%) [+1.00058x]
  Estimated Cycles:                    7094|7090                 (+0.05642%) [+1.00056x]
mem_icount::memcpy::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576 bytes, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                     1048900|1048898              (+0.00019%) [+1.00000x]
  L1 Hits:                          1540545|1540541              (+0.00026%) [+1.00000x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 1573336|1573332              (+0.00025%) [+1.00000x]
  Estimated Cycles:                 1704770|1704766              (+0.00023%) [+1.00000x]
mem_icount::memcpy::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         590|548                  (+7.66423%) [+1.07664x]
  L1 Hits:                              861|812                  (+6.03448%) [+1.06034x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|6                    (+50.0000%) [+1.50000x]
  Total read+write:                     870|818                  (+6.35697%) [+1.06357x]
  Estimated Cycles:                    1176|1022                 (+15.0685%) [+1.15068x]
mem_icount::memcpy::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         626|576                  (+8.68056%) [+1.08681x]
  L1 Hits:                              905|848                  (+6.72170%) [+1.06722x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|6                    (+50.0000%) [+1.50000x]
  Total read+write:                     914|854                  (+7.02576%) [+1.07026x]
  Estimated Cycles:                    1220|1058                 (+15.3119%) [+1.15312x]
mem_icount::memcpy::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         698|632                  (+10.4430%) [+1.10443x]
  L1 Hits:                              993|920                  (+7.93478%) [+1.07935x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|6                    (+50.0000%) [+1.50000x]
  Total read+write:                    1002|926                  (+8.20734%) [+1.08207x]
  Estimated Cycles:                    1308|1130                 (+15.7522%) [+1.15752x]
mem_icount::memcpy::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        1706|1416                 (+20.4802%) [+1.20480x]
  L1 Hits:                             2225|1928                 (+15.4046%) [+1.15405x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|6                    (+50.0000%) [+1.50000x]
  Total read+write:                    2234|1934                 (+15.5119%) [+1.15512x]
  Estimated Cycles:                    2540|2138                 (+18.8026%) [+1.18803x]
mem_icount::memcpy::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        9770|7688                 (+27.0812%) [+1.27081x]
  L1 Hits:                            12081|9992                 (+20.9067%) [+1.20907x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               9|6                    (+50.0000%) [+1.50000x]
  Total read+write:                   12090|9998                 (+20.9242%) [+1.20924x]
  Estimated Cycles:                   12396|10202                (+21.5056%) [+1.21506x]
mem_icount::memcpy::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576 bytes, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     2359657|1835335              (+28.5682%) [+1.28568x]
  L1 Hits:                          2851307|2326978              (+22.5326%) [+1.22533x]
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                              13|10                   (+30.0000%) [+1.30000x]
  Total read+write:                 2884102|2359770              (+22.2196%) [+1.22220x]
  Estimated Cycles:                 3015672|2491238              (+21.0511%) [+1.21051x]
mem_icount::memset::bench aligned_0:setup(Cfg { len : 16, offset : 0 })
bytes: 16, offset: 0
- end of stdout/stderr
  Instructions:                         288|288                  (No change)
  L1 Hits:                              418|418                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     427|427                  (No change)
  Estimated Cycles:                     673|673                  (No change)
mem_icount::memset::bench aligned_1:setup(Cfg { len : 32, offset : 0 })
bytes: 32, offset: 0
- end of stdout/stderr
  Instructions:                         300|300                  (No change)
  L1 Hits:                              434|434                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     443|443                  (No change)
  Estimated Cycles:                     689|689                  (No change)
mem_icount::memset::bench aligned_2:setup(Cfg { len : 64, offset : 0 })
bytes: 64, offset: 0
- end of stdout/stderr
  Instructions:                         324|324                  (No change)
  L1 Hits:                              466|466                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     475|475                  (No change)
  Estimated Cycles:                     721|721                  (No change)
mem_icount::memset::bench aligned_3:setup(Cfg { len : 512, offset : 0 })
bytes: 512, offset: 0
- end of stdout/stderr
  Instructions:                         660|660                  (No change)
  L1 Hits:                              914|914                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     923|923                  (No change)
  Estimated Cycles:                    1169|1169                 (No change)
mem_icount::memset::bench aligned_4:setup(Cfg { len : 4096, offset : 0 })
bytes: 4096, offset: 0
- end of stdout/stderr
  Instructions:                        3348|3348                 (No change)
  L1 Hits:                             4498|4498                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4507|4507                 (No change)
  Estimated Cycles:                    4753|4753                 (No change)
mem_icount::memset::bench aligned_5:setup(Cfg { len : MEG1, offset : 0 })
bytes: 1048576, offset: 0
- end of stdout/stderr
  Instructions:                      786614|786614               (No change)
  L1 Hits:                          1032429|1032429              (No change)
  L2 Hits:                            16395|16395                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048834|1048834              (No change)
  Estimated Cycles:                 1114754|1114754              (No change)
mem_icount::memset::bench offset_0:setup(Cfg { len : 16, offset : 65 })
bytes: 16, offset: 65
- end of stdout/stderr
  Instructions:                         300|300                  (No change)
  L1 Hits:                              433|433                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     442|442                  (No change)
  Estimated Cycles:                     688|688                  (No change)
mem_icount::memset::bench offset_1:setup(Cfg { len : 32, offset : 65 })
bytes: 32, offset: 65
- end of stdout/stderr
  Instructions:                         312|312                  (No change)
  L1 Hits:                              449|449                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     458|458                  (No change)
  Estimated Cycles:                     704|704                  (No change)
mem_icount::memset::bench offset_2:setup(Cfg { len : 64, offset : 65 })
bytes: 64, offset: 65
- end of stdout/stderr
  Instructions:                         336|336                  (No change)
  L1 Hits:                              481|481                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     490|490                  (No change)
  Estimated Cycles:                     736|736                  (No change)
mem_icount::memset::bench offset_3:setup(Cfg { len : 512, offset : 65 })
bytes: 512, offset: 65
- end of stdout/stderr
  Instructions:                         672|672                  (No change)
  L1 Hits:                              929|929                  (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                     938|938                  (No change)
  Estimated Cycles:                    1184|1184                 (No change)
mem_icount::memset::bench offset_4:setup(Cfg { len : 4096, offset : 65 })
bytes: 4096, offset: 65
- end of stdout/stderr
  Instructions:                        3360|3360                 (No change)
  L1 Hits:                             4513|4513                 (No change)
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               7|7                    (No change)
  Total read+write:                    4522|4522                 (No change)
  Estimated Cycles:                    4768|4768                 (No change)
mem_icount::memset::bench offset_5:setup(Cfg { len : MEG1, offset : 65 })
bytes: 1048576, offset: 65
- end of stdout/stderr
  Instructions:                      786626|786626               (No change)
  L1 Hits:                          1032443|1032443              (No change)
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              10|10                   (No change)
  Total read+write:                 1048849|1048849              (No change)
  Estimated Cycles:                 1114773|1114773              (No change)
mem_icount::memcmp::bench aligned_0:setup(Cfg { len : 16, s_off : 0, d_off : 0 })
bytes: 16, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              850|850                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               4|4                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                     990|990                  (No change)
mem_icount::memcmp::bench aligned_1:setup(Cfg { len : 32, s_off : 0, d_off : 0 })
bytes: 32, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench aligned_2:setup(Cfg { len : 64, s_off : 0, d_off : 0 })
bytes: 64, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench aligned_3:setup(Cfg { len : 512, s_off : 0, d_off : 0 })
bytes: 512, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench aligned_4:setup(Cfg { len : 4096, s_off : 0, d_off : 0 })
bytes: 4096, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench aligned_5:setup(Cfg { len : MEG1, s_off : 0, d_off : 0 })
bytes: 1048576, src offset: 0, dst offset: 0
- end of stdout/stderr
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356237|8356237              (No change)
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520452|8520452              (No change)
mem_icount::memcmp::bench offset_0:setup(Cfg { len : 16, s_off : 65, d_off : 65 })
bytes: 16, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              849|849                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                    1024|1024                 (No change)
mem_icount::memcmp::bench offset_1:setup(Cfg { len : 32, s_off : 65, d_off : 65 })
bytes: 32, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench offset_2:setup(Cfg { len : 64, s_off : 65, d_off : 65 })
bytes: 64, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench offset_3:setup(Cfg { len : 512, s_off : 65, d_off : 65 })
bytes: 512, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench offset_4:setup(Cfg { len : 4096, s_off : 65, d_off : 65 })
bytes: 4096, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench offset_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 65 })
bytes: 1048576, src offset: 65, dst offset: 65
- end of stdout/stderr
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356235|8356235              (No change)
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520460|8520460              (No change)
mem_icount::memcmp::bench misaligned_0:setup(Cfg { len : 16, s_off : 65, d_off : 66 })
bytes: 16, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         579|579                  (No change)
  L1 Hits:                              849|849                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     854|854                  (No change)
  Estimated Cycles:                    1024|1024                 (No change)
mem_icount::memcmp::bench misaligned_1:setup(Cfg { len : 32, s_off : 65, d_off : 66 })
bytes: 32, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         675|675                  (No change)
  L1 Hits:                              977|977                  (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                     982|982                  (No change)
  Estimated Cycles:                    1152|1152                 (No change)
mem_icount::memcmp::bench misaligned_2:setup(Cfg { len : 64, s_off : 65, d_off : 66 })
bytes: 64, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                         867|867                  (No change)
  L1 Hits:                             1233|1233                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    1238|1238                 (No change)
  Estimated Cycles:                    1408|1408                 (No change)
mem_icount::memcmp::bench misaligned_3:setup(Cfg { len : 512, s_off : 65, d_off : 66 })
bytes: 512, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                        3555|3555                 (No change)
  L1 Hits:                             4817|4817                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    4822|4822                 (No change)
  Estimated Cycles:                    4992|4992                 (No change)
mem_icount::memcmp::bench misaligned_4:setup(Cfg { len : 4096, s_off : 65, d_off : 66 })
bytes: 4096, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                       25059|25059                (No change)
  L1 Hits:                            33489|33489                (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   33494|33494                (No change)
  Estimated Cycles:                   33664|33664                (No change)
mem_icount::memcmp::bench misaligned_5:setup(Cfg { len : MEG1, s_off : 65, d_off : 66 })
bytes: 1048576, src offset: 65, dst offset: 66
- end of stdout/stderr
  Instructions:                     6291746|6291746              (No change)
  L1 Hits:                          8356235|8356235              (No change)
  L2 Hits:                            32782|32782                (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                 8389026|8389026              (No change)
  Estimated Cycles:                 8520460|8520460              (No change)
mem_icount::memmove::forward aligned_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 0, forward
- end of stdout/stderr
  Instructions:                        4391|4388                 (+0.06837%) [+1.00068x]
  L1 Hits:                             6579|6574                 (+0.07606%) [+1.00076x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                               9|9                    (No change)
  Total read+write:                    6590|6585                 (+0.07593%) [+1.00076x]
  Estimated Cycles:                    6904|6899                 (+0.07247%) [+1.00072x]
mem_icount::memmove::forward aligned_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 0, forward
- end of stdout/stderr
  Instructions:                     1048777|1048774              (+0.00029%) [+1.00000x]
  L1 Hits:                          1557243|1557238              (+0.00032%) [+1.00000x]
  L2 Hits:                            15902|15902                (No change)
  RAM Hits:                              12|12                   (No change)
  Total read+write:                 1573157|1573152              (+0.00032%) [+1.00000x]
  Estimated Cycles:                 1637173|1637168              (+0.00031%) [+1.00000x]
mem_icount::memmove::forward small_spread_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         406|340                  (+19.4118%) [+1.19412x]
  L1 Hits:                              578|492                  (+17.4797%) [+1.17480x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     598|504                  (+18.6508%) [+1.18651x]
  Estimated Cycles:                    1218|852                  (+42.9577%) [+1.42958x]
mem_icount::memmove::forward small_spread_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         466|368                  (+26.6304%) [+1.26630x]
  L1 Hits:                              658|528                  (+24.6212%) [+1.24621x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     678|540                  (+25.5556%) [+1.25556x]
  Estimated Cycles:                    1298|888                  (+46.1712%) [+1.46171x]
mem_icount::memmove::forward small_spread_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                         586|424                  (+38.2075%) [+1.38208x]
  L1 Hits:                              818|600                  (+36.3333%) [+1.36333x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     838|612                  (+36.9281%) [+1.36928x]
  Estimated Cycles:                    1458|960                  (+51.8750%) [+1.51875x]
mem_icount::memmove::forward small_spread_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2266|1208                 (+87.5828%) [+1.87583x]
  L1 Hits:                             3058|1608                 (+90.1741%) [+1.90174x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                    3078|1620                 (+90.0000%) [+1.90000x]
  Estimated Cycles:                    3698|1968                 (+87.9065%) [+1.87907x]
mem_icount::memmove::forward small_spread_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                       15706|7480                 (+109.973%) [+2.09973x]
  L1 Hits:                            20978|9672                 (+116.894%) [+2.16894x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                   20998|9684                 (+116.832%) [+2.16832x]
  Estimated Cycles:                   21618|10032                (+115.490%) [+2.15490x]
mem_icount::memmove::forward small_spread_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 0, forward
- end of stdout/stderr
  Instructions:                     3932412|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5227337|2343710              (+123.037%) [+2.23037x]
  L2 Hits:                            15887|15888                (-0.00629%) [-1.00006x]
  RAM Hits:                              21|13                   (+61.5385%) [+1.61538x]
  Total read+write:                 5243245|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5307507|2423605              (+118.992%) [+2.18992x]
mem_icount::memmove::forward medium_spread_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 0, forward
- end of stdout/stderr
  Instructions:                         406|340                  (+19.4118%) [+1.19412x]
  L1 Hits:                              578|492                  (+17.4797%) [+1.17480x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     598|504                  (+18.6508%) [+1.18651x]
  Estimated Cycles:                    1218|852                  (+42.9577%) [+1.42958x]
mem_icount::memmove::forward medium_spread_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 0, forward
- end of stdout/stderr
  Instructions:                         466|368                  (+26.6304%) [+1.26630x]
  L1 Hits:                              658|528                  (+24.6212%) [+1.24621x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     678|540                  (+25.5556%) [+1.25556x]
  Estimated Cycles:                    1298|888                  (+46.1712%) [+1.46171x]
mem_icount::memmove::forward medium_spread_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 0, forward
- end of stdout/stderr
  Instructions:                         586|424                  (+38.2075%) [+1.38208x]
  L1 Hits:                              818|600                  (+36.3333%) [+1.36333x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     838|612                  (+36.9281%) [+1.36928x]
  Estimated Cycles:                    1458|960                  (+51.8750%) [+1.51875x]
mem_icount::memmove::forward medium_spread_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2266|1208                 (+87.5828%) [+1.87583x]
  L1 Hits:                             3058|1608                 (+90.1741%) [+1.90174x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                    3078|1620                 (+90.0000%) [+1.90000x]
  Estimated Cycles:                    3698|1968                 (+87.9065%) [+1.87907x]
mem_icount::memmove::forward medium_spread_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 0, forward
- end of stdout/stderr
  Instructions:                       15706|7480                 (+109.973%) [+2.09973x]
  L1 Hits:                            20978|9672                 (+116.894%) [+2.16894x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                   20998|9684                 (+116.832%) [+2.16832x]
  Estimated Cycles:                   21618|10032                (+115.490%) [+2.15490x]
mem_icount::memmove::forward medium_spread_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 0, forward
- end of stdout/stderr
  Instructions:                     3932412|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5210701|2327074              (+123.916%) [+2.23916x]
  L2 Hits:                            32523|32524                (-0.00307%) [-1.00003x]
  RAM Hits:                              21|13                   (+61.5385%) [+1.61538x]
  Total read+write:                 5243245|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5374051|2490149              (+115.812%) [+2.15812x]
mem_icount::memmove::forward large_spread_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 0, forward
- end of stdout/stderr
  Instructions:                         404|340                  (+18.8235%) [+1.18824x]
  L1 Hits:                              575|492                  (+16.8699%) [+1.16870x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|10                   (+60.0000%) [+1.60000x]
  Total read+write:                     593|504                  (+17.6587%) [+1.17659x]
  Estimated Cycles:                    1145|852                  (+34.3897%) [+1.34390x]
mem_icount::memmove::forward large_spread_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 0, forward
- end of stdout/stderr
  Instructions:                         464|368                  (+26.0870%) [+1.26087x]
  L1 Hits:                              655|528                  (+24.0530%) [+1.24053x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|10                   (+60.0000%) [+1.60000x]
  Total read+write:                     673|540                  (+24.6296%) [+1.24630x]
  Estimated Cycles:                    1225|888                  (+37.9505%) [+1.37950x]
mem_icount::memmove::forward large_spread_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 0, forward
- end of stdout/stderr
  Instructions:                         584|424                  (+37.7358%) [+1.37736x]
  L1 Hits:                              815|600                  (+35.8333%) [+1.35833x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|10                   (+60.0000%) [+1.60000x]
  Total read+write:                     833|612                  (+36.1111%) [+1.36111x]
  Estimated Cycles:                    1385|960                  (+44.2708%) [+1.44271x]
mem_icount::memmove::forward large_spread_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 0, forward
- end of stdout/stderr
  Instructions:                        2264|1208                 (+87.4172%) [+1.87417x]
  L1 Hits:                             3055|1608                 (+89.9876%) [+1.89988x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|10                   (+60.0000%) [+1.60000x]
  Total read+write:                    3073|1620                 (+89.6914%) [+1.89691x]
  Estimated Cycles:                    3625|1968                 (+84.1972%) [+1.84197x]
mem_icount::memmove::forward large_spread_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 0, forward
- end of stdout/stderr
  Instructions:                       15704|7480                 (+109.947%) [+2.09947x]
  L1 Hits:                            20975|9672                 (+116.863%) [+2.16863x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|10                   (+60.0000%) [+1.60000x]
  Total read+write:                   20993|9684                 (+116.780%) [+2.16780x]
  Estimated Cycles:                   21545|10032                (+114.763%) [+2.14763x]
mem_icount::memmove::forward large_spread_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 0, forward
- end of stdout/stderr
  Instructions:                     3932410|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5210698|2327074              (+123.916%) [+2.23916x]
  L2 Hits:                            32523|32524                (-0.00307%) [-1.00003x]
  RAM Hits:                              19|13                   (+46.1538%) [+1.46154x]
  Total read+write:                 5243240|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5373978|2490149              (+115.809%) [+2.15809x]
mem_icount::memmove::forward aligned_off_0:setup_forward(Cfg { len : 4096, spread : Aligned, ...
bytes: 4096, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                        4407|4404                 (+0.06812%) [+1.00068x]
  L1 Hits:                             6600|6596                 (+0.06064%) [+1.00061x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              10|9                    (+11.1111%) [+1.11111x]
  Total read+write:                    6612|6607                 (+0.07568%) [+1.00076x]
  Estimated Cycles:                    6960|6921                 (+0.56350%) [+1.00564x]
mem_icount::memmove::forward aligned_off_1:setup_forward(Cfg { len : MEG1, spread : Aligned, ...
bytes: 1048576, spread: 512, offset: 65, forward
- end of stdout/stderr
  Instructions:                     1048793|1048790              (+0.00029%) [+1.00000x]
  L1 Hits:                          1557263|1557259              (+0.00026%) [+1.00000x]
  L2 Hits:                            15903|15903                (No change)
  RAM Hits:                              13|12                   (+8.33333%) [+1.08333x]
  Total read+write:                 1573179|1573174              (+0.00032%) [+1.00000x]
  Estimated Cycles:                 1637233|1637194              (+0.00238%) [+1.00002x]
mem_icount::memmove::forward small_spread_off_0:setup_forward(Cfg { len : 16, spread : Small, off ...
bytes: 16, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         406|340                  (+19.4118%) [+1.19412x]
  L1 Hits:                              578|492                  (+17.4797%) [+1.17480x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     598|504                  (+18.6508%) [+1.18651x]
  Estimated Cycles:                    1218|852                  (+42.9577%) [+1.42958x]
mem_icount::memmove::forward small_spread_off_1:setup_forward(Cfg { len : 32, spread : Small, off ...
bytes: 32, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         466|368                  (+26.6304%) [+1.26630x]
  L1 Hits:                              658|528                  (+24.6212%) [+1.24621x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     678|540                  (+25.5556%) [+1.25556x]
  Estimated Cycles:                    1298|888                  (+46.1712%) [+1.46171x]
mem_icount::memmove::forward small_spread_off_2:setup_forward(Cfg { len : 64, spread : Small, off ...
bytes: 64, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                         586|424                  (+38.2075%) [+1.38208x]
  L1 Hits:                              818|600                  (+36.3333%) [+1.36333x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     838|612                  (+36.9281%) [+1.36928x]
  Estimated Cycles:                    1458|960                  (+51.8750%) [+1.51875x]
mem_icount::memmove::forward small_spread_off_3:setup_forward(Cfg { len : 512, spread : Small, off...
bytes: 512, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2266|1208                 (+87.5828%) [+1.87583x]
  L1 Hits:                             3058|1608                 (+90.1741%) [+1.90174x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                    3078|1620                 (+90.0000%) [+1.90000x]
  Estimated Cycles:                    3698|1968                 (+87.9065%) [+1.87907x]
mem_icount::memmove::forward small_spread_off_4:setup_forward(Cfg { len : 4096, spread : Small, of...
bytes: 4096, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                       15706|7480                 (+109.973%) [+2.09973x]
  L1 Hits:                            20978|9672                 (+116.894%) [+2.16894x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                   20998|9684                 (+116.832%) [+2.16832x]
  Estimated Cycles:                   21618|10032                (+115.490%) [+2.15490x]
mem_icount::memmove::forward small_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Small, of...
bytes: 1048576, spread: 1, offset: 65, forward
- end of stdout/stderr
  Instructions:                     3932412|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5227338|2343711              (+123.037%) [+2.23037x]
  L2 Hits:                            15886|15887                (-0.00629%) [-1.00006x]
  RAM Hits:                              21|13                   (+61.5385%) [+1.61538x]
  Total read+write:                 5243245|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5307503|2423601              (+118.992%) [+2.18992x]
mem_icount::memmove::forward medium_spread_off_0:setup_forward(Cfg { len : 16, spread : Medium, off...
bytes: 16, spread: 9, offset: 65, forward
- end of stdout/stderr
  Instructions:                         406|340                  (+19.4118%) [+1.19412x]
  L1 Hits:                              578|492                  (+17.4797%) [+1.17480x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     598|504                  (+18.6508%) [+1.18651x]
  Estimated Cycles:                    1218|852                  (+42.9577%) [+1.42958x]
mem_icount::memmove::forward medium_spread_off_1:setup_forward(Cfg { len : 32, spread : Medium, off...
bytes: 32, spread: 17, offset: 65, forward
- end of stdout/stderr
  Instructions:                         466|368                  (+26.6304%) [+1.26630x]
  L1 Hits:                              658|528                  (+24.6212%) [+1.24621x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     678|540                  (+25.5556%) [+1.25556x]
  Estimated Cycles:                    1298|888                  (+46.1712%) [+1.46171x]
mem_icount::memmove::forward medium_spread_off_2:setup_forward(Cfg { len : 64, spread : Medium, off...
bytes: 64, spread: 33, offset: 65, forward
- end of stdout/stderr
  Instructions:                         586|424                  (+38.2075%) [+1.38208x]
  L1 Hits:                              818|600                  (+36.3333%) [+1.36333x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                     838|612                  (+36.9281%) [+1.36928x]
  Estimated Cycles:                    1458|960                  (+51.8750%) [+1.51875x]
mem_icount::memmove::forward medium_spread_off_3:setup_forward(Cfg { len : 512, spread : Medium, of...
bytes: 512, spread: 257, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2266|1208                 (+87.5828%) [+1.87583x]
  L1 Hits:                             3058|1608                 (+90.1741%) [+1.90174x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                    3078|1620                 (+90.0000%) [+1.90000x]
  Estimated Cycles:                    3698|1968                 (+87.9065%) [+1.87907x]
mem_icount::memmove::forward medium_spread_off_4:setup_forward(Cfg { len : 4096, spread : Medium, o...
bytes: 4096, spread: 2049, offset: 65, forward
- end of stdout/stderr
  Instructions:                       15706|7480                 (+109.973%) [+2.09973x]
  L1 Hits:                            20978|9672                 (+116.894%) [+2.16894x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              18|10                   (+80.0000%) [+1.80000x]
  Total read+write:                   20998|9684                 (+116.832%) [+2.16832x]
  Estimated Cycles:                   21618|10032                (+115.490%) [+2.15490x]
mem_icount::memmove::forward medium_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Medium, o...
bytes: 1048576, spread: 524289, offset: 65, forward
- end of stdout/stderr
  Instructions:                     3932412|1835226              (+114.274%) [+2.14274x]
  L1 Hits:                          5210700|2327073              (+123.916%) [+2.23916x]
  L2 Hits:                            32524|32525                (-0.00307%) [-1.00003x]
  RAM Hits:                              21|13                   (+61.5385%) [+1.61538x]
  Total read+write:                 5243245|2359611              (+122.208%) [+2.22208x]
  Estimated Cycles:                 5374055|2490153              (+115.812%) [+2.15812x]
mem_icount::memmove::forward large_spread_off_0:setup_forward(Cfg { len : 16, spread : Large, off ...
bytes: 16, spread: 15, offset: 65, forward
- end of stdout/stderr
  Instructions:                         399|327                  (+22.0183%) [+1.22018x]
  L1 Hits:                              568|473                  (+20.0846%) [+1.20085x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|10                   (+50.0000%) [+1.50000x]
  Total read+write:                     585|485                  (+20.6186%) [+1.20619x]
  Estimated Cycles:                    1103|833                  (+32.4130%) [+1.32413x]
mem_icount::memmove::forward large_spread_off_1:setup_forward(Cfg { len : 32, spread : Large, off ...
bytes: 32, spread: 31, offset: 65, forward
- end of stdout/stderr
  Instructions:                         459|355                  (+29.2958%) [+1.29296x]
  L1 Hits:                              648|509                  (+27.3084%) [+1.27308x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|10                   (+50.0000%) [+1.50000x]
  Total read+write:                     665|521                  (+27.6392%) [+1.27639x]
  Estimated Cycles:                    1183|869                  (+36.1335%) [+1.36133x]
mem_icount::memmove::forward large_spread_off_2:setup_forward(Cfg { len : 64, spread : Large, off ...
bytes: 64, spread: 63, offset: 65, forward
- end of stdout/stderr
  Instructions:                         579|411                  (+40.8759%) [+1.40876x]
  L1 Hits:                              808|581                  (+39.0706%) [+1.39071x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|10                   (+50.0000%) [+1.50000x]
  Total read+write:                     825|593                  (+39.1231%) [+1.39123x]
  Estimated Cycles:                    1343|941                  (+42.7205%) [+1.42721x]
mem_icount::memmove::forward large_spread_off_3:setup_forward(Cfg { len : 512, spread : Large, off...
bytes: 512, spread: 511, offset: 65, forward
- end of stdout/stderr
  Instructions:                        2259|1195                 (+89.0377%) [+1.89038x]
  L1 Hits:                             3048|1589                 (+91.8188%) [+1.91819x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|10                   (+50.0000%) [+1.50000x]
  Total read+write:                    3065|1601                 (+91.4428%) [+1.91443x]
  Estimated Cycles:                    3583|1949                 (+83.8379%) [+1.83838x]
mem_icount::memmove::forward large_spread_off_4:setup_forward(Cfg { len : 4096, spread : Large, of...
bytes: 4096, spread: 4095, offset: 65, forward
- end of stdout/stderr
  Instructions:                       15699|7467                 (+110.245%) [+2.10245x]
  L1 Hits:                            20968|9653                 (+117.217%) [+2.17217x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              15|10                   (+50.0000%) [+1.50000x]
  Total read+write:                   20985|9665                 (+117.124%) [+2.17124x]
  Estimated Cycles:                   21503|10013                (+114.751%) [+2.14751x]
mem_icount::memmove::forward large_spread_off_5:setup_forward(Cfg { len : MEG1, spread : Large, of...
bytes: 1048576, spread: 1048575, offset: 65, forward
- end of stdout/stderr
  Instructions:                     3932405|1835213              (+114.275%) [+2.14275x]
  L1 Hits:                          5210692|2327056              (+123.918%) [+2.23918x]
  L2 Hits:                            32522|32523                (-0.00307%) [-1.00003x]
  RAM Hits:                              18|13                   (+38.4615%) [+1.38462x]
  Total read+write:                 5243232|2359592              (+122.209%) [+2.22209x]
  Estimated Cycles:                 5373932|2490126              (+115.810%) [+2.15810x]
mem_icount::memmove::backward aligned_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                        4388|4386                 (+0.04560%) [+1.00046x]
  L1 Hits:                             6579|6576                 (+0.04562%) [+1.00046x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              10|9                    (+11.1111%) [+1.11111x]
  Total read+write:                    6591|6587                 (+0.06073%) [+1.00061x]
  Estimated Cycles:                    6939|6901                 (+0.55064%) [+1.00551x]
mem_icount::memmove::backward aligned_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 0, backward
- end of stdout/stderr
  Instructions:                     1048774|1048772              (+0.00019%) [+1.00000x]
  L1 Hits:                          1556742|1556739              (+0.00019%) [+1.00000x]
  L2 Hits:                            16403|16403                (No change)
  RAM Hits:                              13|12                   (+8.33333%) [+1.08333x]
  Total read+write:                 1573158|1573154              (+0.00025%) [+1.00000x]
  Estimated Cycles:                 1639212|1639174              (+0.00232%) [+1.00002x]
mem_icount::memmove::backward small_spread_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         370|326                  (+13.4969%) [+1.13497x]
  L1 Hits:                              524|475                  (+10.3158%) [+1.10316x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     542|488                  (+11.0656%) [+1.11066x]
  Estimated Cycles:                    1094|870                  (+25.7471%) [+1.25747x]
mem_icount::memmove::backward small_spread_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         406|354                  (+14.6893%) [+1.14689x]
  L1 Hits:                              568|511                  (+11.1546%) [+1.11155x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     586|524                  (+11.8321%) [+1.11832x]
  Estimated Cycles:                    1138|906                  (+25.6071%) [+1.25607x]
mem_icount::memmove::backward small_spread_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                         478|410                  (+16.5854%) [+1.16585x]
  L1 Hits:                              656|583                  (+12.5214%) [+1.12521x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     674|596                  (+13.0872%) [+1.13087x]
  Estimated Cycles:                    1226|978                  (+25.3579%) [+1.25358x]
mem_icount::memmove::backward small_spread_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1486|1194                 (+24.4556%) [+1.24456x]
  L1 Hits:                             1888|1591                 (+18.6675%) [+1.18668x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1906|1604                 (+18.8279%) [+1.18828x]
  Estimated Cycles:                    2458|1986                 (+23.7664%) [+1.23766x]
mem_icount::memmove::backward small_spread_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9550|7466                 (+27.9132%) [+1.27913x]
  L1 Hits:                            11744|9655                 (+21.6365%) [+1.21636x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11762|9668                 (+21.6591%) [+1.21659x]
  Estimated Cycles:                   12314|10050                (+22.5274%) [+1.22527x]
mem_icount::memmove::backward small_spread_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359536|1835212              (+28.5702%) [+1.28570x]
  L1 Hits:                          2867514|2343185              (+22.3768%) [+1.22377x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883929|2359595              (+22.2214%) [+1.22221x]
  Estimated Cycles:                 2950159|2425655              (+21.6232%) [+1.21623x]
mem_icount::memmove::backward medium_spread_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 0, backward
- end of stdout/stderr
  Instructions:                         370|326                  (+13.4969%) [+1.13497x]
  L1 Hits:                              524|475                  (+10.3158%) [+1.10316x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     542|488                  (+11.0656%) [+1.11066x]
  Estimated Cycles:                    1094|870                  (+25.7471%) [+1.25747x]
mem_icount::memmove::backward medium_spread_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 0, backward
- end of stdout/stderr
  Instructions:                         406|354                  (+14.6893%) [+1.14689x]
  L1 Hits:                              568|511                  (+11.1546%) [+1.11155x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     586|524                  (+11.8321%) [+1.11832x]
  Estimated Cycles:                    1138|906                  (+25.6071%) [+1.25607x]
mem_icount::memmove::backward medium_spread_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 0, backward
- end of stdout/stderr
  Instructions:                         478|410                  (+16.5854%) [+1.16585x]
  L1 Hits:                              656|583                  (+12.5214%) [+1.12521x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     674|596                  (+13.0872%) [+1.13087x]
  Estimated Cycles:                    1226|978                  (+25.3579%) [+1.25358x]
mem_icount::memmove::backward medium_spread_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1486|1194                 (+24.4556%) [+1.24456x]
  L1 Hits:                             1888|1591                 (+18.6675%) [+1.18668x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1906|1604                 (+18.8279%) [+1.18828x]
  Estimated Cycles:                    2458|1986                 (+23.7664%) [+1.23766x]
mem_icount::memmove::backward medium_spread_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9550|7466                 (+27.9132%) [+1.27913x]
  L1 Hits:                            11744|9655                 (+21.6365%) [+1.21636x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11762|9668                 (+21.6591%) [+1.21659x]
  Estimated Cycles:                   12314|10050                (+22.5274%) [+1.22527x]
mem_icount::memmove::backward medium_spread_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359536|1835212              (+28.5702%) [+1.28570x]
  L1 Hits:                          2851130|2326801              (+22.5343%) [+1.22534x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883929|2359595              (+22.2214%) [+1.22221x]
  Estimated Cycles:                 3015695|2491191              (+21.0543%) [+1.21054x]
mem_icount::memmove::backward large_spread_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 0, backward
- end of stdout/stderr
  Instructions:                         369|326                  (+13.1902%) [+1.13190x]
  L1 Hits:                              526|475                  (+10.7368%) [+1.10737x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     544|488                  (+11.4754%) [+1.11475x]
  Estimated Cycles:                    1096|870                  (+25.9770%) [+1.25977x]
mem_icount::memmove::backward large_spread_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 0, backward
- end of stdout/stderr
  Instructions:                         405|354                  (+14.4068%) [+1.14407x]
  L1 Hits:                              570|511                  (+11.5460%) [+1.11546x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     588|524                  (+12.2137%) [+1.12214x]
  Estimated Cycles:                    1140|906                  (+25.8278%) [+1.25828x]
mem_icount::memmove::backward large_spread_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 0, backward
- end of stdout/stderr
  Instructions:                         477|410                  (+16.3415%) [+1.16341x]
  L1 Hits:                              658|583                  (+12.8645%) [+1.12864x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     676|596                  (+13.4228%) [+1.13423x]
  Estimated Cycles:                    1228|978                  (+25.5624%) [+1.25562x]
mem_icount::memmove::backward large_spread_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 0, backward
- end of stdout/stderr
  Instructions:                        1485|1194                 (+24.3719%) [+1.24372x]
  L1 Hits:                             1890|1591                 (+18.7932%) [+1.18793x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1908|1604                 (+18.9526%) [+1.18953x]
  Estimated Cycles:                    2460|1986                 (+23.8671%) [+1.23867x]
mem_icount::memmove::backward large_spread_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 0, backward
- end of stdout/stderr
  Instructions:                        9549|7466                 (+27.8998%) [+1.27900x]
  L1 Hits:                            11746|9655                 (+21.6572%) [+1.21657x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11764|9668                 (+21.6798%) [+1.21680x]
  Estimated Cycles:                   12316|10050                (+22.5473%) [+1.22547x]
mem_icount::memmove::backward large_spread_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 0, backward
- end of stdout/stderr
  Instructions:                     2359535|1835212              (+28.5702%) [+1.28570x]
  L1 Hits:                          2851132|2326801              (+22.5344%) [+1.22534x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883931|2359595              (+22.2214%) [+1.22221x]
  Estimated Cycles:                 3015697|2491191              (+21.0544%) [+1.21054x]
mem_icount::memmove::backward aligned_off_0:setup_backward(Cfg { len : 4096, spread : Aligned,...
bytes: 4096, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                        4402|4400                 (+0.04545%) [+1.00045x]
  L1 Hits:                             6599|6596                 (+0.04548%) [+1.00045x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              10|9                    (+11.1111%) [+1.11111x]
  Total read+write:                    6611|6607                 (+0.06054%) [+1.00061x]
  Estimated Cycles:                    6959|6921                 (+0.54905%) [+1.00549x]
mem_icount::memmove::backward aligned_off_1:setup_backward(Cfg { len : MEG1, spread : Aligned,...
bytes: 1048576, spread: 512, offset: 65, backward
- end of stdout/stderr
  Instructions:                     1048788|1048786              (+0.00019%) [+1.00000x]
  L1 Hits:                          1556761|1556758              (+0.00019%) [+1.00000x]
  L2 Hits:                            16404|16404                (No change)
  RAM Hits:                              13|12                   (+8.33333%) [+1.08333x]
  Total read+write:                 1573178|1573174              (+0.00025%) [+1.00000x]
  Estimated Cycles:                 1639236|1639198              (+0.00232%) [+1.00002x]
mem_icount::memmove::backward small_spread_off_0:setup_backward(Cfg { len : 16, spread : Small, off...
bytes: 16, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         379|337                  (+12.4629%) [+1.12463x]
  L1 Hits:                              539|492                  (+9.55285%) [+1.09553x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     557|505                  (+10.2970%) [+1.10297x]
  Estimated Cycles:                    1109|887                  (+25.0282%) [+1.25028x]
mem_icount::memmove::backward small_spread_off_1:setup_backward(Cfg { len : 32, spread : Small, off...
bytes: 32, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         415|365                  (+13.6986%) [+1.13699x]
  L1 Hits:                              583|528                  (+10.4167%) [+1.10417x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     601|541                  (+11.0906%) [+1.11091x]
  Estimated Cycles:                    1153|923                  (+24.9187%) [+1.24919x]
mem_icount::memmove::backward small_spread_off_2:setup_backward(Cfg { len : 64, spread : Small, off...
bytes: 64, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                         487|421                  (+15.6770%) [+1.15677x]
  L1 Hits:                              671|600                  (+11.8333%) [+1.11833x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     689|613                  (+12.3980%) [+1.12398x]
  Estimated Cycles:                    1241|995                  (+24.7236%) [+1.24724x]
mem_icount::memmove::backward small_spread_off_3:setup_backward(Cfg { len : 512, spread : Small, of...
bytes: 512, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1495|1205                 (+24.0664%) [+1.24066x]
  L1 Hits:                             1903|1608                 (+18.3458%) [+1.18346x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1921|1621                 (+18.5071%) [+1.18507x]
  Estimated Cycles:                    2473|2003                 (+23.4648%) [+1.23465x]
mem_icount::memmove::backward small_spread_off_4:setup_backward(Cfg { len : 4096, spread : Small, o...
bytes: 4096, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9559|7477                 (+27.8454%) [+1.27845x]
  L1 Hits:                            11759|9672                 (+21.5778%) [+1.21578x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11777|9685                 (+21.6004%) [+1.21600x]
  Estimated Cycles:                   12329|10067                (+22.4695%) [+1.22469x]
mem_icount::memmove::backward small_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Small, o...
bytes: 1048576, spread: 1, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359545|1835223              (+28.5699%) [+1.28570x]
  L1 Hits:                          2867529|2343202              (+22.3765%) [+1.22377x]
  L2 Hits:                            16396|16396                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883944|2359612              (+22.2211%) [+1.22221x]
  Estimated Cycles:                 2950174|2425672              (+21.6230%) [+1.21623x]
mem_icount::memmove::backward medium_spread_off_0:setup_backward(Cfg { len : 16, spread : Medium, of...
bytes: 16, spread: 9, offset: 65, backward
- end of stdout/stderr
  Instructions:                         379|337                  (+12.4629%) [+1.12463x]
  L1 Hits:                              539|492                  (+9.55285%) [+1.09553x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     557|505                  (+10.2970%) [+1.10297x]
  Estimated Cycles:                    1109|887                  (+25.0282%) [+1.25028x]
mem_icount::memmove::backward medium_spread_off_1:setup_backward(Cfg { len : 32, spread : Medium, of...
bytes: 32, spread: 17, offset: 65, backward
- end of stdout/stderr
  Instructions:                         415|365                  (+13.6986%) [+1.13699x]
  L1 Hits:                              583|528                  (+10.4167%) [+1.10417x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     601|541                  (+11.0906%) [+1.11091x]
  Estimated Cycles:                    1153|923                  (+24.9187%) [+1.24919x]
mem_icount::memmove::backward medium_spread_off_2:setup_backward(Cfg { len : 64, spread : Medium, of...
bytes: 64, spread: 33, offset: 65, backward
- end of stdout/stderr
  Instructions:                         487|421                  (+15.6770%) [+1.15677x]
  L1 Hits:                              671|600                  (+11.8333%) [+1.11833x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     689|613                  (+12.3980%) [+1.12398x]
  Estimated Cycles:                    1241|995                  (+24.7236%) [+1.24724x]
mem_icount::memmove::backward medium_spread_off_3:setup_backward(Cfg { len : 512, spread : Medium, o...
bytes: 512, spread: 257, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1495|1205                 (+24.0664%) [+1.24066x]
  L1 Hits:                             1903|1608                 (+18.3458%) [+1.18346x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1921|1621                 (+18.5071%) [+1.18507x]
  Estimated Cycles:                    2473|2003                 (+23.4648%) [+1.23465x]
mem_icount::memmove::backward medium_spread_off_4:setup_backward(Cfg { len : 4096, spread : Medium, ...
bytes: 4096, spread: 2049, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9559|7477                 (+27.8454%) [+1.27845x]
  L1 Hits:                            11759|9672                 (+21.5778%) [+1.21578x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11777|9685                 (+21.6004%) [+1.21600x]
  Estimated Cycles:                   12329|10067                (+22.4695%) [+1.22469x]
mem_icount::memmove::backward medium_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Medium, ...
bytes: 1048576, spread: 524289, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359545|1835223              (+28.5699%) [+1.28570x]
  L1 Hits:                          2851144|2326817              (+22.5341%) [+1.22534x]
  L2 Hits:                            32781|32781                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883944|2359612              (+22.2211%) [+1.22221x]
  Estimated Cycles:                 3015714|2491212              (+21.0541%) [+1.21054x]
mem_icount::memmove::backward large_spread_off_0:setup_backward(Cfg { len : 16, spread : Large, off...
bytes: 16, spread: 15, offset: 65, backward
- end of stdout/stderr
  Instructions:                         378|337                  (+12.1662%) [+1.12166x]
  L1 Hits:                              541|492                  (+9.95935%) [+1.09959x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     559|505                  (+10.6931%) [+1.10693x]
  Estimated Cycles:                    1111|887                  (+25.2537%) [+1.25254x]
mem_icount::memmove::backward large_spread_off_1:setup_backward(Cfg { len : 32, spread : Large, off...
bytes: 32, spread: 31, offset: 65, backward
- end of stdout/stderr
  Instructions:                         414|365                  (+13.4247%) [+1.13425x]
  L1 Hits:                              585|528                  (+10.7955%) [+1.10795x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     603|541                  (+11.4603%) [+1.11460x]
  Estimated Cycles:                    1155|923                  (+25.1354%) [+1.25135x]
mem_icount::memmove::backward large_spread_off_2:setup_backward(Cfg { len : 64, spread : Large, off...
bytes: 64, spread: 63, offset: 65, backward
- end of stdout/stderr
  Instructions:                         486|421                  (+15.4394%) [+1.15439x]
  L1 Hits:                              673|600                  (+12.1667%) [+1.12167x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                     691|613                  (+12.7243%) [+1.12724x]
  Estimated Cycles:                    1243|995                  (+24.9246%) [+1.24925x]
mem_icount::memmove::backward large_spread_off_3:setup_backward(Cfg { len : 512, spread : Large, of...
bytes: 512, spread: 511, offset: 65, backward
- end of stdout/stderr
  Instructions:                        1494|1205                 (+23.9834%) [+1.23983x]
  L1 Hits:                             1905|1608                 (+18.4701%) [+1.18470x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                    1923|1621                 (+18.6305%) [+1.18630x]
  Estimated Cycles:                    2475|2003                 (+23.5647%) [+1.23565x]
mem_icount::memmove::backward large_spread_off_4:setup_backward(Cfg { len : 4096, spread : Large, o...
bytes: 4096, spread: 4095, offset: 65, backward
- end of stdout/stderr
  Instructions:                        9558|7477                 (+27.8320%) [+1.27832x]
  L1 Hits:                            11761|9672                 (+21.5984%) [+1.21598x]
  L2 Hits:                                2|2                    (No change)
  RAM Hits:                              16|11                   (+45.4545%) [+1.45455x]
  Total read+write:                   11779|9685                 (+21.6211%) [+1.21621x]
  Estimated Cycles:                   12331|10067                (+22.4893%) [+1.22489x]
mem_icount::memmove::backward large_spread_off_5:setup_backward(Cfg { len : MEG1, spread : Large, o...
bytes: 1048576, spread: 1048575, offset: 65, backward
- end of stdout/stderr
  Instructions:                     2359544|1835223              (+28.5699%) [+1.28570x]
  L1 Hits:                          2851147|2326818              (+22.5342%) [+1.22534x]
  L2 Hits:                            32780|32780                (No change)
  RAM Hits:                              19|14                   (+35.7143%) [+1.35714x]
  Total read+write:                 2883946|2359612              (+22.2212%) [+1.22221x]
  Estimated Cycles:                 3015712|2491208              (+21.0542%) [+1.21054x]

tgross35 enabled auto-merge (squash) March 22, 2025 05:32
tgross35 merged commit 4df7a8d into rust-lang:master Mar 22, 2025
27 checks passed
RalfJung (Member, Author) commented Mar 22, 2025

memcpy has about a 25% slowdown for the alignment mismatch tests. memmove has about the same slowdown but it slows down more significantly for larger copies. Worst-case results:

That is strange: for larger tests the slowdown should trail off to 0, no? The core "hot" loop is exactly the same as before.
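One way to check the "should trail off" expectation against the report above: if the new code only added fixed work for the prefix and suffix, the difference between the two Instructions columns would stay roughly constant as `len` grows. The sketch below copies the memmove `large_spread_off_*` Instructions rows (offset 65, listed as new|old) and fits a straight line through the per-size differences. The `fit` helper and the straight-line model are assumptions for illustration; this only describes how the counts scale, not where the extra instructions come from.

```rust
// Instructions rows from the memmove `large_spread_off_*` benchmarks above,
// stored as (len, new, old) in the order the report prints `new|old`.
const FORWARD: [(u64, u64, u64); 6] = [
    (16, 399, 327),
    (32, 459, 355),
    (64, 579, 411),
    (512, 2259, 1195),
    (4096, 15699, 7467),
    (1_048_576, 3_932_405, 1_835_213),
];

const BACKWARD: [(u64, u64, u64); 6] = [
    (16, 378, 337),
    (32, 414, 365),
    (64, 486, 421),
    (512, 1494, 1205),
    (4096, 9558, 7477),
    (1_048_576, 2_359_544, 1_835_223),
];

/// Hypothetical helper: fits `delta(len) = slope * len + intercept` through the
/// smallest and largest sizes, then checks that every row lies on that line.
fn fit(rows: &[(u64, u64, u64)]) -> (f64, f64, bool) {
    // Extra instructions the new code executes at one size.
    let delta = |&(_, new, old): &(u64, u64, u64)| (new - old) as f64;
    let first = rows[0];
    let last = rows[rows.len() - 1];
    let slope = (delta(&last) - delta(&first)) / (last.0 - first.0) as f64;
    let intercept = delta(&first) - slope * first.0 as f64;
    let exact = rows
        .iter()
        .all(|r| delta(r) == slope * r.0 as f64 + intercept);
    (slope, intercept, exact)
}

fn main() {
    for (name, rows) in [("forward", &FORWARD), ("backward", &BACKWARD)] {
        let (slope, intercept, exact) = fit(rows);
        // forward:  delta = 2 * len + 40 (fits all rows: true)
        // backward: delta = 0.5 * len + 33 (fits all rows: true)
        println!("{name}: delta = {slope} * len + {intercept} (fits all rows: {exact})");
    }
}
```

With these rows every size lands exactly on its line: about 2 extra instructions per byte forward and 0.5 per byte backward, so in this benchmark the extra cost grows with the copy length rather than staying fixed.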

Should I submit a PR that copies the prefix/suffix bytewise instead of via a 1-byte load and a 2-byte load, just so that we can get those numbers for comparison?
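To make the two options concrete: the leading bytes of a misaligned source region are the tail of an aligned word whose first bytes lie outside the allocation. The sketch below models that word as a `[u8; 8]` (assuming a 64-bit target, and using safe slices instead of raw pointers) and compares reading its in-bounds tail with up to three naturally aligned 1-, 2- and 4-byte loads, as in the split approach this PR describes, against reading it byte by byte. `load_tail_split` and `load_tail_bytewise` are hypothetical names, not functions from the crate.

```rust
/// Hypothetical helper (not the crate's code): reads only `word[start..]` of
/// one aligned 8-byte word using naturally aligned 1-, 2- and 4-byte chunks,
/// so no byte before `start` is ever touched. Returns the assembled word
/// (skipped bytes left as 0) and the number of loads issued.
fn load_tail_split(word: &[u8; 8], start: usize) -> ([u8; 8], u32) {
    assert!(start < 8);
    if start == 0 {
        // Already aligned: a single full-word load is in bounds.
        return (*word, 1);
    }
    let mut out = [0u8; 8];
    let mut loads = 0;
    let mut p = start;
    for size in [1usize, 2, 4] {
        // Earlier steps cleared the lower bits of `p`, so when bit `size` is
        // set, `p` is a multiple of `size` and `p + size` is a multiple of
        // `2 * size`: the chunk is aligned and stays inside the word.
        if p & size != 0 {
            out[p..p + size].copy_from_slice(&word[p..p + size]);
            p += size;
            loads += 1;
        }
    }
    debug_assert_eq!(p, 8);
    (out, loads)
}

/// Hypothetical bytewise alternative: one 1-byte load per in-bounds byte.
fn load_tail_bytewise(word: &[u8; 8], start: usize) -> ([u8; 8], u32) {
    assert!(start < 8);
    let mut out = [0u8; 8];
    for i in start..8 {
        out[i] = word[i];
    }
    (out, (8 - start) as u32)
}

fn main() {
    let word = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88];
    for start in 1..8 {
        let (split, split_loads) = load_tail_split(&word, start);
        let (bytes, byte_loads) = load_tail_bytewise(&word, start);
        // Both strategies must read exactly the same in-bounds bytes.
        assert_eq!(split, bytes);
        println!("start {start}: {split_loads} aligned loads vs {byte_loads} byte loads");
    }
}
```

The aligned version needs at most three loads for any start offset, while the bytewise version needs up to seven; how much that difference costs in practice is what the proposed comparison would measure.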

tgross35 (Contributor)

Maybe they were tapering at different rates, so the delta appears larger at larger copy lengths? Though 4K and 1M are rather large sizes to still be seeing that behavior.
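The tapering question can be phrased with a simple cost model (an assumption for illustration, using only the Instructions rows above). Write $T_{\text{old}}(n) \approx \alpha n + \beta$ and $T_{\text{new}}(n)$ for the instruction counts at copy length $n$. A fixed extra cost $c$ makes the relative slowdown vanish for large copies, while an extra cost that grows as $a n + b$ makes it level off at $a/\alpha$:

$$
\frac{T_{\text{new}}(n) - T_{\text{old}}(n)}{T_{\text{old}}(n)} = \frac{c}{\alpha n + \beta} \xrightarrow{\;n \to \infty\;} 0
\qquad\text{versus}\qquad
\frac{a n + b}{\alpha n + \beta} \xrightarrow{\;n \to \infty\;} \frac{a}{\alpha}
$$

For the memmove `large_spread_off_*` rows, the differences fit $a n + b$ exactly, with $a = 2, b = 40$ forward and $a = 1/2, b = 33$ backward, and $T_{\text{old}}(1048576) \approx 1.75 \cdot 1048576$ in both directions. The model then predicts plateaus near $2/1.75 \approx 114.3\%$ and $0.5/1.75 \approx 28.6\%$, matching the reported +114.275% and +28.57% for the MEG1 rows. This only describes how the counts scale; it does not identify which code the extra instructions come from.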

I don't think we really need to do anything here but if you have something in mind, I'm happy to test it.

Successfully merging this pull request may close these issues.

Several functions perform out-of-bounds memory accesses (which is UB)