Skip to content

Conversation

@otrho
Copy link

@otrho otrho commented Dec 8, 2025

Re: #2383. This is a start at adding u128 maths support with add, sub and mul in both overflowing and wrapping versions.

The cycle counts are estimates - is there a way to formally get the cycle cost for a proc, or alternatively are there docs somewhere explaining when a dup or movup will cost 1, 2 or 3 cycles? I assume it's related to the distance from the top somehow.

The overflowing_mul implementation is pretty conservative and could probably be pared down a little.

In particular any time there is an overflow into the 129th bit or higher the value is coverted to a flag/bool and they are all OR'd together at the end (except add3 can create a '2' overflow, so they're summed and then converted again to a bool).

There are quite likely cases were we know definitely there can either be no overflow, or the overflow is already a bool and doesn't need conversion. But for now it's passing the tests and works, we can optimise as we go.

@otrho otrho requested a review from bobbinth December 8, 2025 04:31
@otrho otrho self-assigned this Dec 8, 2025
@otrho otrho added the stdlib Related to Miden standard library label Dec 8, 2025
@otrho otrho force-pushed the otrho/stdlib-u128 branch 2 times, most recently from 48a943c to 3999a75 Compare December 10, 2025 22:33
@otrho otrho requested a review from bitwalker December 10, 2025 22:33
@otrho otrho marked this pull request as ready for review December 10, 2025 22:59
@otrho otrho force-pushed the otrho/stdlib-u128 branch 3 times, most recently from ded8dfb to a5ca6c3 Compare December 17, 2025 02:59
@otrho otrho force-pushed the otrho/stdlib-u128 branch from a5ca6c3 to 8f91f76 Compare December 23, 2025 22:52
- `u128::overflowing_add` and `u128::wrapping_add`.
- `u128::overflowing_sub` and `u128::wrapping_sub`.
- `u128::overflowing_mul` and `u128::wrapping_mul`.
@otrho otrho force-pushed the otrho/stdlib-u128 branch from 8f91f76 to fa7d600 Compare December 23, 2025 23:01
#! The input values are assumed to be represented using 32 bit limbs, but this is not checked.
#! Stack transition is as follows:
#! [bhh, bmh, bml, bll, ahh, amh, aml, all, ...] -> [chh, cmh, cml, cll, ...], where c = (a - b) % 2^128
pub proc wrapping_sub(b: u128, a: u128) -> (u32, u128)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type here says (u32, u128) but the procedure only produces 4 stack elements (a u128). Looks like a copy-paste from overflowing_sub. Should this be -> u128 to match wrapping_add?

# -------------------------------------------
# f rhh rmh rml rll

pub proc overflowing_mul(b: u128, a: u128) -> (u64, u128)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type says u64 for the overflow flag, but the u64 module uses i1 for overflow flags (see u64::overflowing_mul). The actual value here is just 0 or 1 from the final push.0 neq. Should this be (u32, u128) or (i1, u128) for consistency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure the signature in the u64::overflowing_mul is correct - or at least it conflicts with the doc comments for the same procedure. Specifically, I think it returns (u64, u64) rather than (i1, u64).

If that's correct, could we also implement u128::overflowing_mul to look like so?:

pub proc overflowing_mul(b: u128, a: u128) -> (u128, u128)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub proc overflowing_mul(b: u128, a: u128) -> (u128, u128) would imply 8 felts being left on the stack IMO, which is obviously not quite right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe the naming here may be confusing. What I meant is that semantically, our u64::overflowing_mul functions like a widening multiplication (e..g, multiplying two u64 values gives us a 128-bit result). Similarly, would be great if this also behaved like a widening multiplication - i.e., multiplying two u128 values gives us a u256 result (which requires 8 felts to represent).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobbinth Those are unusual semantics IMO - overflowing_mul in most contexts I've seen (including Rust), treat it as an operation that returns the result as the same type as the operands, but with an additional flag that indicates whether overflow occurred, in which case the result value itself has been wrapped (giving you the overflow from T::MAX).

To get the resulting "widened" value, you'd simply cast the T::MAX value and the overflow to the widened type, and add them together. That addition is guaranteed to fit in the widened type, so doesn't need to be checked.

If you want a single function that does the widening multiplication, I think it should be added in addition to overflowing_mul, e.g. carrying_mul from Rust. And build it on top of the overflowing_mul implementation (or perhaps, build overflowing_mul in terms of carrying_mul, whichever is most efficient).

IMO, people rarely want the widened value when using overflowing_mul, instead they want to catch overflow and treat it as an non-panicking error. It is certainly useful on its own.

The compiler currently uses overflowing_mul as I described above, and I'd assume most people would assume it works the way I outlined, so if the actual implementation is different, we definitely should fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed that overflowing_mul is the right name for the semantics of this procedure. But the issue is that we use overflowing_mul to mean widening_mul (not carrying_mul) in other places (Rust also has widening_mul though, currently as nightly-only).

So, in a way, this PR introduces an inconsistency because we'll get:

  • For u32 and u64 overflowing_mul will mean widening multiplication.
  • For u128 it'll mean overflowing multiplication.

I think maybe a simple solution here is to rename u32 and u64 procedure/operations to widening_mul - but that would need to be done in a different PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #2558 for this.

Copy link
Collaborator

@bitwalker bitwalker Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to 2558

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How are we actually using the current semantics throughout existing MASM code? I know that there must be a lot of bugs as a result of that, because the compiler for sure treats that overflow value as a boolean, not a u32 value (and I'm just about 100% sure my interpretation was based on the actual Miden docs at some point, though perhaps the docs were fixed and I didn't catch that). Our compiler intrinsics that use u32overlowing_mul also appear to all be written with this assumption. We probably need to audit how many uses of u32overflowing_mul are making the same incorrect assumption I did, because I assume I'm not alone there.

All uses I've seen assume the "widening" semantics, but of course, I haven't seen them all. In many cases, using the upper bits as a boolean value could be fine - i.e., 0 still means there is no overflow. But yes, if we are checking it to be 1 to indicate that there was overflow, that'll be a bug.

  1. The behavior of u32overflowing_mul is not consistent with u32overflowing_add (which has the correct semantics).

For addition, there is no difference between "overflow" and "widening" because even in case of widening you still get either 0 or 1 in the upper bits of the widened value. So, I think I was just trying to keep the naming consistent across addition and multiplication.

Since the u64 library is not being currently used outside the compiler (unless that has changed very recently), I think we can just fix the semantics of u64::overflowing_mul to be correct, and add a new u64::widening_mul function.

Agreed - and that's what I suggested in #2558.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For addition, there is no difference between "overflow" and "widening" because even in case of widening you still get either 0 or 1 in the upper bits of the widened value. So, I think I was just trying to keep the naming consistent across addition and multiplication.

The way the documentation treats it, the output on top of the stack is a boolean (which is effectively true, but misleading), while u32overflowing_mul shows that the output on top of the stack, i.e. [d, c] is $c←(a⋅b) \mod 2^{32}$ and $d←(a⋅b)/2^{32}$, which is unambiguous.

A quick immediate fix is to update the docs of u32overflowing_add to match u32overflowing_mul, so that one doesn't read the docs for u32overflowing_add and then assume the semantics of u32overflowing_mul are the same, just for multiplication. That's likely the mistake I made.

Copy link
Collaborator

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from @huitseeker's comments, I think this looks good. We'll want to implement division in a subsequent PR, and we may need to make layout changes to both u64 and u128 in a future change, but neither of those need to happen here, so 👍 from me.

Overflowing ops now return `(i1, u128)` and fix a typo for `wrapping_sub`.
@Dominik1999
Copy link
Contributor

Can we still merge this?

@bitwalker
Copy link
Collaborator

@Dominik1999 Yes, though I should note that it will need to be reworked as part of the effort to make all the integer types that have multi-word layouts (e.g. u64 and u128) use a little-endian limb order (both currently use a big-endian layout). See #2564 for conversation around that.

For now though, I think we can go ahead and merge this, what do you think @bobbinth?

dup.7 dup.4
u32overflowing_mul # [o0 rll bhh bmh bml bll ahh amh aml all ...]

# aml x bml + o0 -> rml0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny typo: this comment says aml x bml but should be aml x bll to match the equivalent comment in overflowing_mul (line 125). The code itself looks correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stdlib Related to Miden standard library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants