Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pulley: Implement more of loads/stores #9775

Merged

Conversation

alexcrichton
Copy link
Member

This commit gets the address.wast spec test working by filling out more load/store infrastructure in Pulley. In doing so I've done a refactoring of the existing load/store methods. Changes here are:

  • All load/stores are *_offset32 now instead of optionally having no offset, a 8-bit offset, or a 64-bit offset.
  • All x-register loads/stores are prefixed with x now.
  • All loads/stores have "le" for little-endian in their name.
  • Loads/stores are refactored to have 8 and 16-bit variants.
  • Sign-extending loads now either extend to 32 or 64 depending on the opcode.
  • Float loads/stores are added.
  • Big-endian is handled with explicit big-endian loads/stores instead bswap to handle this transparently in the backend (e.g. for stores not ISLE-generated) and to handle floats.

@alexcrichton alexcrichton requested review from a team as code owners December 10, 2024 16:18
@alexcrichton alexcrichton requested review from fitzgen and removed request for a team December 10, 2024 16:18
Comment on lines +50 to +61
/// * Instructions operate on either 32 or 64-bit parts of a register.
/// Instructions modifying only 32-bits of a register always modify the "low"
/// part of a register and leave the upper part unmodified. This is intended
/// to help 32-bit platforms where if most operations are 32-bit there's no
/// need for extra instructions to sign or zero extend and modify the upper
/// half of the register.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious on your thoughts on this in particular @fitzgen.

I'll note that we don't currently do this well, e.g. xconst8 always writes the full 64-bits and br_if has no variant that only reads 32-bits. I think we'll want to change that over time though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For xconst* I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.

Although IIRC we are still zero-extending rather than sign-extending in const*. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64s. But maybe we did fix this already at one point...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I feel is a bit entangled with the question below of "do we want xload8_s32_offset32". I'm a bit worried about xconst8 always filling the entire register mostly from the perspective of 32-bit platforms. If most constants in a program are actually 32-bits then we're in theory spending a lot of instructions to fill the upper half of the register that's never read. Cache-wise that's probably not too problematic but it's possibly problematic from an icache perspective of the implementation of xconst8 itself.

I definitely agree though that for 64-bit platforms it doesn't matter at all.

For now I'm going to go ahead with this PR as-is (with xload8_s32_offset32 for example) and these docs since we don't have a great answer one way or another and I think we're both on the fence. I'm going to add some notes to #9747 to come back to reevaluate though.

@alexcrichton
Copy link
Member Author

At a high level the spec_testsuite/*.wast still can't be executed, but that's the next PR after this to get that executing on CI (and a large number of tests already pass)

@alexcrichton alexcrichton requested a review from a team as a code owner December 10, 2024 16:26
pulley/src/lib.rs Outdated Show resolved Hide resolved
pulley/src/lib.rs Outdated Show resolved Hide resolved
pulley/src/lib.rs Show resolved Hide resolved
Comment on lines +50 to +61
/// * Instructions operate on either 32 or 64-bit parts of a register.
/// Instructions modifying only 32-bits of a register always modify the "low"
/// part of a register and leave the upper part unmodified. This is intended
/// to help 32-bit platforms where if most operations are 32-bit there's no
/// need for extra instructions to sign or zero extend and modify the upper
/// half of the register.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For xconst* I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.

Although IIRC we are still zero-extending rather than sign-extending in const*. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64s. But maybe we did fix this already at one point...

Comment on lines +208 to +215
/// `low32(dst) = sext(*(ptr + offset))`
xload8_s32_offset32 = XLoad8S32Offset32 { dst: XReg, ptr: XReg, offset: i32 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are (rightfully) burning a lot of opcode space for loads and stores, but I wonder if we can make an exception for the "only write the bottom bits" and always extend to the full register width? This would mean we wouldn't need both xload8_s32_offset32 and xload8_s64_offset32 and would effectively have only the latter. This would roughly cut in half the number of load/store instructions we have but wouldn't constrain the cranelift backend at all. The cost would be a couple extra extends on 32-bit platforms, which I don't think is too bad a trade off in this particular case.

And FWIW, I still think the default position should be to only write the low bits of registers when doing 32-bit adds/ands/xors/etc, we are just running into all the edge cases now :)

pulley/src/interp.rs Outdated Show resolved Hide resolved
This commit gets the `address.wast` spec test working by filling out
more load/store infrastructure in Pulley. In doing so I've done a
refactoring of the existing load/store methods. Changes here are:

* All load/stores are `*_offset32` now instead of optionally having no
  offset, a 8-bit offset, or a 64-bit offset.
* All x-register loads/stores are prefixed with `x` now.
* All loads/stores have "le" for little-endian in their name.
* Loads/stores are refactored to have 8 and 16-bit variants.
* Sign-extending loads now either extend to 32 or 64 depending on the
  opcode.
* Float loads/stores are added.
* Big-endian is handled with explicit big-endian loads/stores instead
  `bswap` to handle this transparently in the backend (e.g. for stores
  not ISLE-generated) and to handle floats.
This is a bit onerous to keep updated and is probably best subsumed by
the fuzzing support we have in general for wasm.
@alexcrichton alexcrichton force-pushed the pulley-loads-and-stores branch from 14354b9 to 93694d5 Compare December 10, 2024 17:26
@alexcrichton alexcrichton added this pull request to the merge queue Dec 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2024
@alexcrichton alexcrichton added this pull request to the merge queue Dec 10, 2024
Merged via the queue into bytecodealliance:main with commit 23e8dce Dec 10, 2024
43 checks passed
@alexcrichton alexcrichton deleted the pulley-loads-and-stores branch December 10, 2024 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants