Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Winch: Implement rem for aarch64 #9781

Merged
merged 5 commits into from
Dec 11, 2024

Conversation

MarinPostma
Copy link
Contributor

@MarinPostma MarinPostma commented Dec 10, 2024

Implement remainder operation for aarch64 in winch.

one more step in the #8321 series

// we therefore sign-extend the operand.
// see: https://github.com/bytecodealliance/wasmtime/issues/9766
//
// FIXME: maybe this is only necessary for signed remainder?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saulecabrera I'm not sure about that, but if we are performing an unsigned remainder operation, then do we really care about sign extending? Maybe it's not necessary for division either? Is there a scenario where we can't just blindly consider the w register as a x register?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same reasonning applies to div, I think. We could shave a few instructions in the case of unsigned division too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, nothing is stopping us from passing a w register here, however given that we're dealing with the 64-bit instruction variant, it will take into account the full 64-bits not just the lower 32. In case that the high 32 bits are not zeroed out, this could lead to unexpected behavior. So we need to extend in both cases.

@MarinPostma MarinPostma marked this pull request as ready for review December 10, 2024 23:26
@MarinPostma MarinPostma requested review from a team as code owners December 10, 2024 23:26
@MarinPostma MarinPostma requested review from abrown and alexcrichton and removed request for a team December 10, 2024 23:26
@abrown abrown requested a review from saulecabrera December 11, 2024 00:58
@github-actions github-actions bot added the winch Winch issues or pull requests label Dec 11, 2024
Copy link

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@saulecabrera saulecabrera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Left some comments inline.

As a general comment, given that the same invariant as division apply here, could we modify the existing div_rrr so that it can be re-used from rem_rrr?

// we therefore sign-extend the operand.
// see: https://github.com/bytecodealliance/wasmtime/issues/9766
//
// FIXME: maybe this is only necessary for signed remainder?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, nothing is stopping us from passing a w register here, however given that we're dealing with the 64-bit instruction variant, it will take into account the full 64-bits not just the lower 32. In case that the high 32 bits are not zeroed out, this could lead to unexpected behavior. So we need to extend in both cases.

self.emit(Inst::Extend {
rd: writable!(divisor.into()),
rn: divisor.into(),
signed: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #9762 was merged, I noticed that we were unconditionally emitting a sign-extension here. Instead we need to request the proper extension depending on the division kind. I opened #9784 to fix this. We should apply the same fix here.

size: OperandSize,
) {
// Check for division by 0
self.emit(Inst::TrapIf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use self.trapz here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same invariants as signed division apply here, right? i.e., we must still check for overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was expecting it to be the case, but as it turn out, it doesn't seem necessary; the following snippet

 (module
     (func (param i32 i32) (result i32)
          local.get 0
           local.get 1
           i32.rem_u
     )
 )

is compiled by cranelift to:

0000000000000000 <wasm[0]::function[0]>:
       0: d503235f      pacibz
       4: a9bf7bfd      stp     x29, x30, [sp, #-16]!
       8: 910003fd      mov     x29, sp
       c: 2a0403e3      mov     w3, w4
      10: 2a0503e5      mov     w5, w5
      14: b40000c5      cbz     x5, 0x2c <wasm[0]::function[0]+0x2c>
      18: 9ac50868      udiv    x8, x3, x5
      1c: 9b058d02      msub    x2, x8, x5, x3
      20: a8c17bfd      ldp     x29, x30, [sp], #16
      24: d50323df      autibz
      28: d65f03c0      ret
      2c: 0000c11f      udf     #49439

(with rem_s:

       0: d503235f      pacibz
       4: a9bf7bfd      stp     x29, x30, [sp, #-16]!
       8: 910003fd      mov     x29, sp
       c: 52b00004      mov     w4, #-2147483648
      10: 93407c81      sxtw    x1, w4
      14: 92800003      mov     x3, #-1
      18: 9ac30c25      sdiv    x5, x1, x3
      1c: 9b0384a2      msub    x2, x5, x3, x1
      20: a8c17bfd      ldp     x29, x30, [sp], #16
      24: d50323df      autibz
      28: d65f03c0      ret

)

At first I thought this was a bug, but it seems to be consistent across platform:

(module
    (func (export "_start") (result i32)
	(i32.const 0x80000000)
	(i32.const -1)
        i32.rem_s
    )
)

this yields 0 on both aarch64 and x86.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, going over the Wasm spec, there's no explicit mention of overflow checks in this case.

@MarinPostma
Copy link
Contributor Author

I was my initial intention to factorize the check, but as it turns out, we don't need to check for overflow for rem. I'm working on supporting i32 division, so we can drop the extension in a subsequent PR too.

@saulecabrera saulecabrera added this pull request to the merge queue Dec 11, 2024
Merged via the queue into bytecodealliance:main with commit da41a55 Dec 11, 2024
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
winch Winch issues or pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants