-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pulley: Implement more of loads/stores #9775
pulley: Implement more of loads/stores #9775
Conversation
/// * Instructions operate on either 32 or 64-bit parts of a register. | ||
/// Instructions modifying only 32-bits of a register always modify the "low" | ||
/// part of a register and leave the upper part unmodified. This is intended | ||
/// to help 32-bit platforms where if most operations are 32-bit there's no | ||
/// need for extra instructions to sign or zero extend and modify the upper | ||
/// half of the register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious on your thoughts on this in particular @fitzgen.
I'll note that we don't currently do this well, e.g. xconst8
always writes the full 64-bits and br_if
has no variant that only reads 32-bits. I think we'll want to change that over time though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For xconst*
I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.
Although IIRC we are still zero-extending rather than sign-extending in const*
. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64
s. But maybe we did fix this already at one point...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I feel is a bit entangled with the question below of "do we want xload8_s32_offset32
". I'm a bit worried about xconst8
always filling the entire register mostly from the perspective of 32-bit platforms. If most constants in a program are actually 32-bits then we're in theory spending a lot of instructions to fill the upper half of the register that's never read. Cache-wise that's probably not too problematic but it's possibly problematic from an icache perspective of the implementation of xconst8
itself.
I definitely agree though that for 64-bit platforms it doesn't matter at all.
For now I'm going to go ahead with this PR as-is (with xload8_s32_offset32
for example) and these docs since we don't have a great answer one way or another and I think we're both on the fence. I'm going to add some notes to #9747 to come back to reevaluate though.
At a high level the |
/// * Instructions operate on either 32 or 64-bit parts of a register. | ||
/// Instructions modifying only 32-bits of a register always modify the "low" | ||
/// part of a register and leave the upper part unmodified. This is intended | ||
/// to help 32-bit platforms where if most operations are 32-bit there's no | ||
/// need for extra instructions to sign or zero extend and modify the upper | ||
/// half of the register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For xconst*
I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.
Although IIRC we are still zero-extending rather than sign-extending in const*
. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64
s. But maybe we did fix this already at one point...
/// `low32(dst) = sext(*(ptr + offset))` | ||
xload8_s32_offset32 = XLoad8S32Offset32 { dst: XReg, ptr: XReg, offset: i32 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are (rightfully) burning a lot of opcode space for loads and stores, but I wonder if we can make an exception for the "only write the bottom bits" and always extend to the full register width? This would mean we wouldn't need both xload8_s32_offset32
and xload8_s64_offset32
and would effectively have only the latter. This would roughly cut in half the number of load/store instructions we have but wouldn't constrain the cranelift backend at all. The cost would be a couple extra extends on 32-bit platforms, which I don't think is too bad a trade off in this particular case.
And FWIW, I still think the default position should be to only write the low bits of registers when doing 32-bit adds/ands/xors/etc, we are just running into all the edge cases now :)
This commit gets the `address.wast` spec test working by filling out more load/store infrastructure in Pulley. In doing so I've done a refactoring of the existing load/store methods. Changes here are: * All load/stores are `*_offset32` now instead of optionally having no offset, a 8-bit offset, or a 64-bit offset. * All x-register loads/stores are prefixed with `x` now. * All loads/stores have "le" for little-endian in their name. * Loads/stores are refactored to have 8 and 16-bit variants. * Sign-extending loads now either extend to 32 or 64 depending on the opcode. * Float loads/stores are added. * Big-endian is handled with explicit big-endian loads/stores instead `bswap` to handle this transparently in the backend (e.g. for stores not ISLE-generated) and to handle floats.
This is a bit onerous to keep updated and is probably best subsumed by the fuzzing support we have in general for wasm.
14354b9
to
93694d5
Compare
This commit gets the
address.wast
spec test working by filling out more load/store infrastructure in Pulley. In doing so I've done a refactoring of the existing load/store methods. Changes here are:*_offset32
now instead of optionally having no offset, a 8-bit offset, or a 64-bit offset.x
now.bswap
to handle this transparently in the backend (e.g. for stores not ISLE-generated) and to handle floats.