-
Notifications
You must be signed in to change notification settings - Fork 250
Add initial math::u128 functions for core/math runtime.
#2438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
48a943c to
3999a75
Compare
ded8dfb to
a5ca6c3
Compare
a5ca6c3 to
8f91f76
Compare
- `u128::overflowing_add` and `u128::wrapping_add`. - `u128::overflowing_sub` and `u128::wrapping_sub`. - `u128::overflowing_mul` and `u128::wrapping_mul`.
8f91f76 to
fa7d600
Compare
crates/lib/core/asm/math/u128.masm
Outdated
| #! The input values are assumed to be represented using 32 bit limbs, but this is not checked. | ||
| #! Stack transition is as follows: | ||
| #! [bhh, bmh, bml, bll, ahh, amh, aml, all, ...] -> [chh, cmh, cml, cll, ...], where c = (a - b) % 2^128 | ||
| pub proc wrapping_sub(b: u128, a: u128) -> (u32, u128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type here says (u32, u128) but the procedure only produces 4 stack elements (a u128). Looks like a copy-paste from overflowing_sub. Should this be -> u128 to match wrapping_add?
crates/lib/core/asm/math/u128.masm
Outdated
| # ------------------------------------------- | ||
| # f rhh rmh rml rll | ||
|
|
||
| pub proc overflowing_mul(b: u128, a: u128) -> (u64, u128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type says u64 for the overflow flag, but the u64 module uses i1 for overflow flags (see u64::overflowing_mul). The actual value here is just 0 or 1 from the final push.0 neq. Should this be (u32, u128) or (i1, u128) for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure the signature in the u64::overflowing_mul is correct - or at least it conflicts with the doc comments for the same procedure. Specifically, I think it returns (u64, u64) rather than (i1, u64).
If that's correct, could we also implement u128::overflowing_mul to look like so?:
pub proc overflowing_mul(b: u128, a: u128) -> (u128, u128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub proc overflowing_mul(b: u128, a: u128) -> (u128, u128) would imply 8 felts being left on the stack IMO, which is obviously not quite right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe the naming here may be confusing. What I meant is that semantically, our u64::overflowing_mul functions like a widening multiplication (e..g, multiplying two u64 values gives us a 128-bit result). Similarly, would be great if this also behaved like a widening multiplication - i.e., multiplying two u128 values gives us a u256 result (which requires 8 felts to represent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobbinth Those are unusual semantics IMO - overflowing_mul in most contexts I've seen (including Rust), treat it as an operation that returns the result as the same type as the operands, but with an additional flag that indicates whether overflow occurred, in which case the result value itself has been wrapped (giving you the overflow from T::MAX).
To get the resulting "widened" value, you'd simply cast the T::MAX value and the overflow to the widened type, and add them together. That addition is guaranteed to fit in the widened type, so doesn't need to be checked.
If you want a single function that does the widening multiplication, I think it should be added in addition to overflowing_mul, e.g. carrying_mul from Rust. And build it on top of the overflowing_mul implementation (or perhaps, build overflowing_mul in terms of carrying_mul, whichever is most efficient).
IMO, people rarely want the widened value when using overflowing_mul, instead they want to catch overflow and treat it as an non-panicking error. It is certainly useful on its own.
The compiler currently uses overflowing_mul as I described above, and I'd assume most people would assume it works the way I outlined, so if the actual implementation is different, we definitely should fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed that overflowing_mul is the right name for the semantics of this procedure. But the issue is that we use overflowing_mul to mean widening_mul (not carrying_mul) in other places (Rust also has widening_mul though, currently as nightly-only).
So, in a way, this PR introduces an inconsistency because we'll get:
- For
u32andu64overflowing_mulwill mean widening multiplication. - For
u128it'll mean overflowing multiplication.
I think maybe a simple solution here is to rename u32 and u64 procedure/operations to widening_mul - but that would need to be done in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #2558 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to 2558
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How are we actually using the current semantics throughout existing MASM code? I know that there must be a lot of bugs as a result of that, because the compiler for sure treats that overflow value as a boolean, not a u32 value (and I'm just about 100% sure my interpretation was based on the actual Miden docs at some point, though perhaps the docs were fixed and I didn't catch that). Our compiler intrinsics that use
u32overlowing_mulalso appear to all be written with this assumption. We probably need to audit how many uses ofu32overflowing_mulare making the same incorrect assumption I did, because I assume I'm not alone there.
All uses I've seen assume the "widening" semantics, but of course, I haven't seen them all. In many cases, using the upper bits as a boolean value could be fine - i.e., 0 still means there is no overflow. But yes, if we are checking it to be 1 to indicate that there was overflow, that'll be a bug.
- The behavior of
u32overflowing_mulis not consistent withu32overflowing_add(which has the correct semantics).
For addition, there is no difference between "overflow" and "widening" because even in case of widening you still get either 0 or 1 in the upper bits of the widened value. So, I think I was just trying to keep the naming consistent across addition and multiplication.
Since the
u64library is not being currently used outside the compiler (unless that has changed very recently), I think we can just fix the semantics ofu64::overflowing_multo be correct, and add a newu64::widening_mulfunction.
Agreed - and that's what I suggested in #2558.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For addition, there is no difference between "overflow" and "widening" because even in case of widening you still get either 0 or 1 in the upper bits of the widened value. So, I think I was just trying to keep the naming consistent across addition and multiplication.
The way the documentation treats it, the output on top of the stack is a boolean (which is effectively true, but misleading), while u32overflowing_mul shows that the output on top of the stack, i.e. [d, c] is
A quick immediate fix is to update the docs of u32overflowing_add to match u32overflowing_mul, so that one doesn't read the docs for u32overflowing_add and then assume the semantics of u32overflowing_mul are the same, just for multiplication. That's likely the mistake I made.
bitwalker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from @huitseeker's comments, I think this looks good. We'll want to implement division in a subsequent PR, and we may need to make layout changes to both u64 and u128 in a future change, but neither of those need to happen here, so 👍 from me.
Overflowing ops now return `(i1, u128)` and fix a typo for `wrapping_sub`.
8a3d93a to
d284eef
Compare
|
Can we still merge this? |
|
@Dominik1999 Yes, though I should note that it will need to be reworked as part of the effort to make all the integer types that have multi-word layouts (e.g. For now though, I think we can go ahead and merge this, what do you think @bobbinth? |
| dup.7 dup.4 | ||
| u32overflowing_mul # [o0 rll bhh bmh bml bll ahh amh aml all ...] | ||
|
|
||
| # aml x bml + o0 -> rml0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny typo: this comment says aml x bml but should be aml x bll to match the equivalent comment in overflowing_mul (line 125). The code itself looks correct.
Re: #2383. This is a start at adding
u128maths support withadd,subandmulin both overflowing and wrapping versions.The cycle counts are estimates - is there a way to formally get the cycle cost for a proc, or alternatively are there docs somewhere explaining when a
dupormovupwill cost 1, 2 or 3 cycles? I assume it's related to the distance from the top somehow.The
overflowing_mulimplementation is pretty conservative and could probably be pared down a little.In particular any time there is an overflow into the 129th bit or higher the value is coverted to a flag/bool and they are all OR'd together at the end (except
add3can create a '2' overflow, so they're summed and then converted again to a bool).There are quite likely cases were we know definitely there can either be no overflow, or the overflow is already a bool and doesn't need conversion. But for now it's passing the tests and works, we can optimise as we go.