
copy_misaligned_words: avoid out-of-bounds accesses #799

Open
RalfJung wants to merge 5 commits into master from memmove-inbounds

Conversation

RalfJung (Member) commented Mar 18, 2025

Fixes #559 for memmove/memcpy: load the underaligned prefix and suffix in copy_*_misaligned_words in up to 3 separate aligned loads (a 1-byte load, a 2-byte load, and for 64bit targets a 4-byte load), while only doing those loads that are actually inbounds. The hope is that the performance loss compared to a single aligned ptr-sized load is negligible.
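
For illustration, here is a minimal sketch of what such a partial load can look like (a reconstruction, not the exact PR code; the PR names this helper `load_aligned_partial`, per the discussion below):

```rust
use core::mem::size_of;

const WORD_SIZE: usize = size_of::<usize>();

/// Sketch: read the first `load_sz < WORD_SIZE` in-bounds bytes of the
/// word at the word-aligned pointer `src`, using at most a 4-byte, a
/// 2-byte, and a 1-byte aligned load (the 4-byte branch is dead on
/// 32-bit targets). The remaining bytes of the result are unspecified.
unsafe fn load_aligned_partial(src: *const usize, load_sz: usize) -> usize {
    debug_assert!(load_sz < WORD_SIZE);
    let mut out = 0usize;
    let mut i = 0;
    macro_rules! load_chunk {
        ($ty:ty) => {
            if load_sz & size_of::<$ty>() != 0 {
                // In-bounds and aligned: `src` is word-aligned and `i`
                // stays a multiple of every remaining chunk size.
                let chunk = *src.cast::<u8>().add(i).cast::<$ty>();
                // Store the chunk at the same byte offset within `out`.
                *(&mut out as *mut usize).cast::<u8>().add(i).cast::<$ty>() = chunk;
                i += size_of::<$ty>();
            }
        };
    }
    // Largest chunk first, so the smaller loads that follow stay aligned.
    load_chunk!(u32);
    load_chunk!(u16);
    load_chunk!(u8);
    debug_assert!(i == load_sz);
    out
}
```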

I confirmed that this now passes Miri (the second of these already worked before this PR):

```sh
# target without mem-unaligned
MIRIFLAGS=-Zmiri-tree-borrows cargo miri test --features no-asm --target armv7-unknown-linux-gnueabihf -- align
# target with mem-unaligned
MIRIFLAGS=-Zmiri-tree-borrows cargo miri test --features no-asm --target x86_64-unknown-linux-gnu -- align
```

I added a new test since the existing test had some slack space around the memory being copied, making all accesses accidentally inbounds (but Miri was still helpful to confirm everything is aligned). This test found a bug in my code, fixed in the second commit. :D
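
The shape of such a test might look like this (a hedged sketch, not the actual test added in the PR; it assumes the crate's `memcpy` is callable from the test suite):

```rust
// Copy between exactly-sized allocations, so there is no slack: any
// word-sized access that strays past either end of either buffer is a
// genuine out-of-bounds access that Miri will flag.
#[test]
fn copy_no_slack() {
    const WORD_SIZE: usize = core::mem::size_of::<usize>();
    for n in 0..4 * WORD_SIZE {
        for offset in 0..WORD_SIZE {
            let src: Vec<u8> = (0..(offset + n) as u8).collect();
            let mut dst = vec![0u8; n];
            unsafe {
                compiler_builtins::mem::memcpy(dst.as_mut_ptr(), src.as_ptr().add(offset), n);
            }
            assert_eq!(&dst[..], &src[offset..offset + n]);
        }
    }
}
```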

This also adds the above commands to CI, so hopefully this crate will stay green under Miri. :)

tgross35 (Contributor)

> Is there some good place in the CI config to add this Miri check? Note that I am only running some of the tests (those with align in their name), as otherwise this will take ~forever; some tests have large iteration counts. We need Tree Borrows since the test suite has the as_ptr+as_mut_ptr pattern that is not compatible with Stacked Borrows.

I've been meaning to ask about this; it sounds like a great idea to me. You can just add a new main.yml CI job, probably with these bits:

```yaml
- uses: actions/checkout@v4
  with:
    submodules: true
- name: Install Rust (rustup)
  run: rustup update ${{ matrix.rust }} --no-self-update && rustup default ${{ matrix.rust }}
  shell: bash
- run: rustup target add ${{ matrix.target }}
- run: rustup component add llvm-tools-preview
- uses: Swatinem/rust-cache@v2
```
then put the rest in a script.

The float tests can probably be skipped since that module has no unsafe (we might even be able to forbid it) and it's probably quite slow to run.
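
(For reference, forbidding unsafe in a module is a one-line lint attribute, e.g. as an inner attribute at the top of that module:

```rust
#![forbid(unsafe_code)] // compile error if any unsafe code sneaks in
```
)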

RalfJung (Member, Author)

Even mem has very slow tests like this one. That's why I only ran the align tests. Though maybe it'd be worth reducing those constants in Miri so more tests can run. I don't want to go over the entire test suite though, that sounds like a lot of work. ;)
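
The usual pattern for shrinking those constants is gating on `cfg!(miri)`, which is set when tests run under the interpreter, e.g.:

```rust
// Illustrative only: expensive tests can scale themselves down when
// running under Miri, where every step is interpreted.
const ITERATIONS: usize = if cfg!(miri) { 16 } else { 1_000_000 };
```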

RalfJung force-pushed the memmove-inbounds branch 6 times, most recently from e4ef842 to 842fa94 (March 18, 2025, 21:51)
```rust
        dest_usize = dest_usize.wrapping_add(1);
    }

    // There's one more element left to go, and we can't use the loop for that as on the `src` side,
    // it is partially out-of-bounds.
```
RalfJung (Member, Author) commented on this diff:

The code previously seemed unaware that there can also be OOB accesses at the end of the range -- but of course that's fundamentally the same problem as at the beginning.
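
A sketch of the suffix counterpart (`load_aligned_end_partial` in the PR; again a reconstruction, not the exact code):

```rust
use core::mem::size_of;

const WORD_SIZE: usize = size_of::<usize>();

/// Sketch: act as if we loaded the word at the word-aligned `src`,
/// except that only the *last* `load_sz < WORD_SIZE` bytes are actually
/// read; the leading (out-of-bounds) bytes of the result are unspecified.
unsafe fn load_aligned_end_partial(src: *const usize, load_sz: usize) -> usize {
    debug_assert!(load_sz < WORD_SIZE);
    let mut out = 0usize;
    // Byte offset of the first in-bounds byte within the word.
    let mut i = WORD_SIZE - load_sz;
    macro_rules! load_chunk {
        ($ty:ty) => {
            if load_sz & size_of::<$ty>() != 0 {
                let chunk = *src.cast::<u8>().add(i).cast::<$ty>();
                *(&mut out as *mut usize).cast::<u8>().add(i).cast::<$ty>() = chunk;
                i += size_of::<$ty>();
            }
        };
    }
    // Smallest chunk first: each small load fixes up the alignment of
    // `i` for the next, larger one.
    load_chunk!(u8);
    load_chunk!(u16);
    load_chunk!(u32);
    debug_assert!(i == WORD_SIZE);
    out
}
```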

RalfJung force-pushed the memmove-inbounds branch 2 times, most recently from 605b6ca to d8cf8a2 (March 18, 2025, 22:17)
RalfJung (Member, Author)

Miri is looking good on CI :)

tgross35 (Contributor) left a review:

Some surface-level notes; I'll take a closer look at perf soonish.

```yaml
        run: rustup update nightly --no-self-update && rustup default nightly
        shell: bash
      - run: rustup component add miri
      - run: cargo miri setup
```
tgross35 (Contributor) commented:

Does Miri benefit much from caching?

Suggested change:

```diff
 - run: cargo miri setup
+- uses: Swatinem/rust-cache@v2
```

Also could you add Miri to the success job at the bottom, so branch protection blocks on it?

RalfJung (Member, Author) replied:

What exactly gets cached there?

tgross35 (Contributor) replied:

It saves the Cargo cache and the target directory. Probably doesn't hurt.

RalfJung (Member, Author) replied:

Caching the target dir seems pointless since we're using the latest nightly so it will be invalidated every night, but sure 🤷

RalfJung force-pushed the memmove-inbounds branch 4 times, most recently from 130c8a0 to 6ee9f22 (March 19, 2025, 07:29)
tgross35 (Contributor) commented Mar 19, 2025

Unfortunately, it looks like this comes close to doubling the total line and label counts of this routine: https://godbolt.org/z/7WYa6e83n. I agree that the UB is worth fixing even at a performance hit, but I have to imagine this could be improved with some massaging.

@nbdd0121 I know it has been a long time since you worked on #405 but do you have any ideas on how to improve the codegen here without OOB access?

(I haven't actually tested so it is possible visual asm heuristics don't accurately reflect runtime, but the end blocks are definitely larger)

RalfJung (Member, Author)

Yeah, there's prefix and postfix handling now, which of course adds some extra code and labels. The original code neglected to treat the last loop iteration differently, which makes this not a fully fair comparison (at the very least, we should compare against a version that uses an atomic/volatile load for the last round, as that one can also be OOB).

The code size could be reduced by using copy_forward_bytes instead of load_aligned_partial/load_aligned_end_partial. But I would expect that to be worse for performance...

This compares the "original but with the final loop iteration unrolled" with the copy_forward_bytes variant: https://godbolt.org/z/768YPGqGG. Still an increase, but "only" by 60%.
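
For reference, the byte-wise alternative is essentially a plain byte loop; a sketch of what `copy_forward_bytes` presumably looks like (not necessarily the crate's exact code):

```rust
/// Forward byte-by-byte copy. No word loads means no alignment tricks
/// and no OOB risk, at the cost of a few extra single-byte copies at
/// each end compared to the partial word loads.
unsafe fn copy_forward_bytes(mut dest: *mut u8, mut src: *const u8, n: usize) {
    let dest_end = dest.add(n);
    while dest < dest_end {
        *dest = *src;
        dest = dest.add(1);
        src = src.add(1);
    }
}
```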

nbdd0121 (Contributor)

Technically, the last iteration of the loop doesn't need special handling if you use an unordered atomic load for each loop iteration. The codegen shouldn't be massively different.

Using byte copies doesn't necessarily mean worse performance, as it performs at most 3 additional byte copies on each end. It also removes data-dependent branches, which are hard to predict. This could also be merged with the outermost byte-copy computation. I guess benchmarking would be necessary.
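
For context: rustc only exposes LLVM's `unordered` ordering through unstable intrinsics, so a stable sketch of such a per-word load would use `Relaxed` as the nearest stand-in:

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

// Sketch: load one word per loop iteration via an atomic load, so no
// iteration needs special-casing for tearing. Note this concerns the
// load's atomicity, not the out-of-bounds question this PR fixes.
unsafe fn load_word(src: *const usize) -> usize {
    (*src.cast::<AtomicUsize>()).load(Ordering::Relaxed)
}
```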

RalfJung (Member, Author)

In my view, since this is almost certainly still faster than the code before #405, and that PR achieved its performance by having UB, this is still a win.

But I'd also be curious what the numbers actually look like. If someone has access to an ARM-32 system and could benchmark this, that would be great. :)

Successfully merging this pull request may close this issue: "Several functions perform out-of-bounds memory accesses (which is UB)" (#559).