Conversation
Force-pushed the memcpy (iteration guard) branch from 3f5b5d0 to 30b783a
bitwalker
left a comment
I think this mostly looks fine; I left comments for my various nits, but nothing major sticks out.
codegen/masm/intrinsics/mem.masm (outdated)
# which is the offset of the first byte, in the 32-bit representation of that element.
#
# Stack transition: [addr, offset] -> [value]
pub proc load_u16
Suggested change:
- pub proc load_u16
+ pub proc load_u16(addr: ptr<felt, addrspace(felt)>, offset: u8) -> u16
Let's make sure to add type signatures to all new procedures
codegen/masm/src/emit/mem.rs (outdated)
impl OpEmitter<'_> {
    /// Emit the loop header for a counted `while.true` loop.
    ///
    /// The caller provides the `dup` instruction needed to bring `count` to the top of the stack
I think it would help to clarify a bit why this is up to the caller, and perhaps provide an example. IMO, it would make more sense to provide an offset value or something, rather than the instruction itself - but the rationale might be obvious from better documentation/examples here.
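As a concrete illustration of that suggestion, here is a hypothetical sketch (invented names and a toy instruction enum, not the actual `OpEmitter` API) in which the helper takes the stack depth of `count` and constructs the `dup` itself, so the counted-loop protocol is defined in one place:

```rust
// Hypothetical instruction subset; the real `masm::Instruction` is richer,
// and its comparison/stack conventions may differ from this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Dup(u8),      // copy the n-th stack element to the top
    PushU32(u32), // push an immediate
    U32Gt,        // unsigned greater-than on the top two elements
    WhileTrue,    // begin a `while.true` loop, consuming the condition
}

struct Emitter {
    code: Vec<Inst>,
}

impl Emitter {
    /// Emit a counted-loop header. Instead of asking the caller for the
    /// `dup` instruction itself, take the stack depth at which `count`
    /// currently sits and build the `dup` here.
    fn emit_counted_loop_header(&mut self, count_depth: u8) {
        self.code.extend([
            Inst::Dup(count_depth), // bring a copy of `count` to the top
            Inst::PushU32(0),
            Inst::U32Gt,            // enter the loop only when count > 0
            Inst::WhileTrue,
        ]);
    }
}
```

With this shape, the caller only needs to know where `count` lives on the operand stack, and the loop-entry convention cannot drift between call sites.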
codegen/masm/src/emit/mem.rs (outdated)
/// Emit the loop back-edge condition for a counted `while.true` loop.
///
/// The caller provides the `dup` instruction needed to bring `count` to the top of the stack
codegen/masm/src/emit/mem.rs (outdated)
self.emit_all(
    [
        masm::Instruction::U32DivModImm(16.into()),
        masm::Instruction::Assertz,
We should probably consider using an instruction variant with an error message, so it is clear why this fails when it fails
codegen/masm/src/emit/mem.rs (outdated)
///
/// This is used for branch bodies which operate on a known stack shape from the enclosing
/// emitter, but which do not need to synchronize typed operand-stack state back to it.
fn build_raw_block(
Is there a distinction between a "raw" MASM block and a non-"raw" MASM block? I think I might just call this build_masm_block or something
codegen/masm/src/emit/mem.rs (outdated)
[
    // Convert `src` to element address
    masm::Instruction::U32DivModImm(4.into()),
    masm::Instruction::Assertz,
We should probably try and make these asserts produce useful errors
I also went and added error messages to all emitted asserts at 16a7669
Please don't merge this PR before #995. It's a pain to rebase it afterwards.

This just needs a rebase and it can be merged
Fix the byte-addressed memory paths that cross a 32-bit element boundary. This keeps the `memcpy`/`memset` fallback coverage added in this branch working for short unaligned copies, including scalarized `u16` loads and stores at byte offset 3.
Zero-length memory operations must be no-ops, but both loop headers seeded `while.true` with `count >= 0`, which executes one iteration when `count == 0`. Switch the entry condition to a strict unsigned `count > 0` check and add regressions for zero-count unaligned copy/set paths.
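Modeled outside the VM, the off-by-one is easy to see: `while.true` consumes a seeded entry condition before the first iteration, and the body re-computes the condition at the back edge. A minimal Rust simulation of the control-flow protocol (a sketch, not the emitter itself):

```rust
/// Simulate a counted `while.true` loop: the entry condition is evaluated
/// once before the loop, and the back edge re-evaluates `count > 0` after
/// each body run. Returns the number of body iterations executed.
/// `count` is signed here only so the simulation cannot underflow.
fn iterations(mut count: i64, entry: fn(i64) -> bool) -> u32 {
    let mut ran = 0;
    let mut cond = entry(count); // seeded entry condition
    while cond {
        ran += 1;                // loop body (copy one element, etc.)
        count -= 1;
        cond = count > 0;        // back-edge condition
    }
    ran
}
```

With the buggy `count >= 0` seed, a zero-length operation still executes one body iteration; the strict `count > 0` seed makes it a true no-op.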
The unaligned `u16` regressions are asserting compiler memory layout, so they should not depend on the host endianness. Use `to_le_bytes()` in the expected byte construction to keep the tests portable and aligned with the byte-addressable memory model.
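For instance, `to_le_bytes()` pins the expected layout to the little-endian byte order of the memory model, whereas `to_ne_bytes()` would follow the host and flip the assertion on a big-endian machine:

```rust
/// Build the expected byte image of a `u16` store explicitly in
/// little-endian order. Unlike `to_ne_bytes()`, the result does not
/// depend on the endianness of the machine running the test suite.
fn expected_u16_bytes(value: u16) -> [u8; 2] {
    value.to_le_bytes()
}
```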
`memset` and fallback `memcpy` were carrying separate copies of the same counted `while.true` control flow, which makes fixes easy to miss in one path. Extract the shared loop header and back-edge emission so the counted loop protocol is defined once and reused by both sites.
Only offset 3 spans two elements for a `u16` load/store. Route the other unaligned offsets through the existing single-element logic so we don't spuriously touch `addr + 1` at the end of memory.
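A host-side sketch of that routing, assuming 4-byte elements stored as little-endian byte images (the byte order within an element is an assumption of this sketch, and `load_u16` is an invented helper, not the intrinsic itself):

```rust
/// Read a `u16` from byte-addressable memory backed by 32-bit elements.
/// `offset` is the byte offset within the element at `addr` (0..=3).
/// Only offset 3 spans two elements; offsets 0..=2 stay within one
/// element, so `mem[addr + 1]` must not be touched for them.
fn load_u16(mem: &[u32], addr: usize, offset: usize) -> u16 {
    assert!(offset <= 3, "byte offset must be in 0..=3");
    let lo = mem[addr].to_le_bytes();
    if offset <= 2 {
        // Within-element path: both bytes come from the same element.
        u16::from_le_bytes([lo[offset], lo[offset + 1]])
    } else {
        // Split-word path: last byte of this element, first of the next.
        let hi = mem[addr + 1].to_le_bytes();
        u16::from_le_bytes([lo[3], hi[0]])
    }
}
```

The `offset <= 2` branch never indexes past `addr`, which is exactly why routing those offsets through the single-element logic avoids a spurious read of `addr + 1` at the end of memory.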
Add regression cases for byte offsets 1 and 2 in the integration suite, and add emitter-level tests that exercise unaligned `load_imm` and `store_imm` for `u16` addresses.
Cover the aligned byte-copy fast path and a case where only `count` is misaligned so the fast-path predicate is regression-tested as well.
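The fast-path predicate under test has roughly this shape (an assumed sketch; the real check lives in the emitter): the byte copy may only be rewritten as an element-wise copy when source, destination, and length are all word-aligned.

```rust
const WORD_BYTES: usize = 4; // bytes per 32-bit element

/// True only when it is safe to reinterpret a byte copy as an
/// element-wise copy: all three inputs must be word-aligned.
/// A misaligned `count` alone must force the byte-wise fallback.
fn can_use_element_fast_path(src: usize, dst: usize, count: usize) -> bool {
    src % WORD_BYTES == 0 && dst % WORD_BYTES == 0 && count % WORD_BYTES == 0
}
```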
Force-pushed from 16a7669 to 4a529fd
Closes #1003
Summary
- Fix `memcpy`/`memset` and add regression coverage for aligned and unaligned byte copies/sets, zero-length operations, and unaligned `u16`/`i16` memory accesses
- Gate the `memcpy` fast paths so they only convert byte pointers to element addresses when the inputs are word-aligned
- Route `offset == 3` through the split-word intrinsics while preserving the existing within-element path for `offset <= 2`

I suggest reviewing on a per-commit basis, skipping the non-interesting commits (refactors, etc.).