
fix(dwallet-mpc): deterministic internal-presign session identifiers across validators #1733

Merged: ycscaly merged 1 commit into dev from fix/internal-presign-sid-determinism on Jun 11, 2026

ycscaly commented Jun 11, 2026

The bug

Internal presign sessions draw their sequence number from a single shared counter, assigned in iteration order over (network key id) × (curve) × (signature algorithm) — and that sequence number is bound into the session identifier transcript. Both iteration sources were unordered:

  • SUPPORTED_CURVES_TO_SIGNATURE_ALGORITHMS_TO_HASH_SCHEMES was a HashMap<u32, HashMap<u32, Vec<u32>>> — iteration order is random per process (RandomState), so each validator walked curves/algorithms in a different order;
  • the agreed network key ids were iterated straight off a HashMap.

Each validator therefore derived different session identifiers for the same (curve, algorithm) work. Sessions whose identifiers differ across validators can never reach quorum and so never complete, and the instantiated ≠ completed gate then blocks epoch advance.
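The mechanism can be illustrated with a minimal sketch. It is not the code from `mpc_manager.rs`: the `Work` tuple, `assign_sequence_numbers`, and the string-based `session_identifier` are hypothetical stand-ins, and plain `u32` values replace the real key id, curve, and algorithm types. The only point is that a shared counter makes the sequence number, and anything derived from it, a function of visit order.

```rust
use std::collections::BTreeMap;

/// One unit of internal-presign work: (network key id, curve, signature algorithm).
/// Plain integers are an assumption; the real identifier types are not shown in this PR.
type Work = (u32, u32, u32);

/// Hand out sequence numbers from a single shared counter, in visit order.
/// Returns a map from each work item to the sequence number it received.
fn assign_sequence_numbers(visit_order: &[Work]) -> BTreeMap<Work, u64> {
    let mut next_sequence_number = 0u64;
    let mut assigned = BTreeMap::new();
    for work in visit_order {
        assigned.insert(*work, next_sequence_number);
        next_sequence_number += 1;
    }
    assigned
}

/// Hypothetical stand-in for the session identifier transcript:
/// it binds the work item together with its sequence number.
fn session_identifier(work: Work, sequence_number: u64) -> String {
    format!(
        "key={}/curve={}/alg={}/seq={}",
        work.0, work.1, work.2, sequence_number
    )
}
```

If one validator visits `[(7, 0, 0), (7, 0, 1)]` and another visits `[(7, 0, 1), (7, 0, 0)]`, the work item `(7, 0, 0)` receives sequence number 0 on the first and 1 on the second, so the two derived identifiers for the same work differ.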

The fix

Make both iterations deterministic (BTreeMap / sorted ids) so every validator assigns the same sequence numbers to the same work, producing identical session identifiers.
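A sketch of what deterministic iteration could look like, under stated assumptions: `SupportedSchemes` mirrors the shape of the static with `BTreeMap` at both levels, while `plan_internal_presigns`, the `u32` key ids, and the generic map value `V` are hypothetical and chosen only for illustration. The actual change in the PR may be structured differently.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// curve -> signature algorithm -> hash schemes, ordered at both nesting levels.
type SupportedSchemes = BTreeMap<u32, BTreeMap<u32, Vec<u32>>>;

/// Enumerate (sequence number, key id, curve, algorithm) in an order that depends
/// only on the values themselves, not on hash seeds or insertion order.
/// `agreed_keys` stays a HashMap here (an assumption about the caller's data);
/// its keys are sorted by collecting them into a BTreeSet before the loop.
fn plan_internal_presigns<V>(
    agreed_keys: &HashMap<u32, V>,
    supported: &SupportedSchemes,
) -> Vec<(u64, u32, u32, u32)> {
    let sorted_key_ids: BTreeSet<u32> = agreed_keys.keys().copied().collect();
    let mut plan = Vec::new();
    let mut next_sequence_number = 0u64;
    for key_id in &sorted_key_ids {
        // BTreeMap iterates in ascending key order on every process.
        for (curve, algorithms) in supported {
            for algorithm in algorithms.keys() {
                plan.push((next_sequence_number, *key_id, *curve, *algorithm));
                next_sequence_number += 1;
            }
        }
    }
    plan
}
```

Two validators that hold the same agreed key ids and the same supported-scheme table get the same plan from this function, regardless of how their `HashMap` happens to be laid out, which is the property the session identifiers need.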

Cherry-picked from feat/ika-upgrade-test (29e8a09), where the cross-binary churn and v1.1.8 upgrade rehearsals run green with it.

🤖 Generated with Claude Code

…dators

Internal presign sessions get their sequence number from a single shared
counter, assigned in iteration order over (network key id) x (curve) x
(signature algorithm), and the sequence number is bound into the session
identifier transcript. Both iteration sources were unordered:

- SUPPORTED_CURVES_TO_SIGNATURE_ALGORITHMS_TO_HASH_SCHEMES was a
  HashMap<u32, HashMap<u32, Vec<u32>>> — iteration order is random per
  process (RandomState), so each validator walked curves/algorithms in a
  different order;
- the agreed network key ids were iterated straight off a HashMap.

Each validator therefore derived *different* session identifiers for the
same (curve, algorithm) work. Those sessions could never reach quorum, so
they never completed, and the instantiated != completed gate then blocked
that algorithm's pool top-ups for the entire epoch. Once a user presign
request locked onto the starved pool, the EndOfPublish condition was
unsatisfiable and the epoch could not advance.

Observed live: in a 4-validator run the validators logged three distinct
top-up orders, and exactly the sequence numbers whose (curve, algorithm)
assignment happened to agree on 3+ validators completed — the rest hung
forever, the ECDSA pool stayed empty all epoch, and the run timed out.
A previous green run was a per-process-seed coin flip.
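
The observation above can be reproduced on paper with a small sketch. Everything in it is hypothetical: `completable_sequence_numbers` is an illustrative helper, the orders passed to it would be made-up data rather than the logged ones, and it ignores the network key id for brevity. It only encodes the rule stated above, that a sequence number completes when enough validators assigned it to the same (curve, algorithm).

```rust
use std::collections::HashMap;

/// (curve, signature algorithm); plain integers are an assumption.
type Work = (u32, u32);

/// Given each validator's top-up order, return the sequence numbers for which
/// at least `quorum` validators assigned the same work item.
fn completable_sequence_numbers(orders: &[Vec<Work>], quorum: usize) -> Vec<usize> {
    // Only compare positions that every validator actually assigned.
    let slots = orders.iter().map(Vec::len).min().unwrap_or(0);
    (0..slots)
        .filter(|&sequence_number| {
            // Count how many validators put each work item at this position.
            let mut votes: HashMap<Work, usize> = HashMap::new();
            for order in orders {
                *votes.entry(order[sequence_number]).or_insert(0) += 1;
            }
            votes.values().any(|&count| count >= quorum)
        })
        .collect()
}
```

With four validators, a quorum of 3, and three distinct orders such as `[a, b, c]`, `[a, b, c]`, `[a, c, b]`, `[b, a, c]`, positions 0 and 2 have three validators in agreement and position 1 has at most two, so only sequence numbers 0 and 2 would complete.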

Fix: BTreeMap at both nesting levels of the static, and collect the
agreed key ids into a BTreeSet before the instantiation loop. Pre-existing
bug from the internal sessions instantiation logic (#1638), not specific
to this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ycscaly merged commit 17fc470 into dev Jun 11, 2026
10 checks passed
ycscaly deleted the fix/internal-presign-sid-determinism branch June 11, 2026 12:03
ycscaly added a commit that referenced this pull request Jun 11, 2026
# Conflicts:
#	crates/ika-core/src/dwallet_mpc/mpc_manager.rs