fix(dwallet-mpc): network-key adoption gates — no committee-unapproved parameter set may go live#1735
Merged
…ttee didn't agree on
A validator that installs different network-key parameters than its
peers honestly computes byte-divergent MPC outputs and is convicted
malicious by the output-quorum byte-equality tally — observed live
("node recognized itself as malicious"), silently dropping a
4-validator committee to zero-redundancy 3-of-4 until a later loss
froze MPC entirely. Root: the epoch-entry stale-mpc_data race pairs
with two adoption holes:
- An overlay entry whose reconfiguration output is transiently EMPTY
slipped through the initial-DKG adoption branch while the prior
epoch's handoff cert pins a reconfiguration digest for the key,
instantiating DKG-derived parameters the committee never agreed to
run this epoch. Skip and retry until the cert-pinned bytes resolve.
- Adopted data whose current_epoch metadata mismatches the manager's
epoch was only rejected ~10s AFTER the parameter derivation burnt on
the rayon pool — and the doomed in-flight instantiation blocked the
same key's correct data, widening the entry key gap during which
sessions park. Reject before spawning.
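
The two gates above can be pictured as a single pre-spawn decision. What follows is a minimal sketch, not the code in this PR: `OverlayEntry`, `Adoption`, `decide_adoption`, and `toy_digest` are hypothetical names, the digest is a stand-in hash, and treating a digest mismatch like an empty output (skip and retry) is an assumption taken from the test description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical view of one network key's overlay entry.
struct OverlayEntry {
    current_epoch: u64,
    /// May be transiently empty at epoch entry.
    reconfiguration_output: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
enum Adoption {
    /// No cert pin and no reconfiguration output: initial-DKG branch.
    AdoptDkg,
    /// Reconfiguration output is present and consistent with the cert (if any).
    AdoptReconfiguration,
    /// Cert-pinned bytes have not resolved yet; check again on a later tick.
    SkipAndRetry,
    /// Epoch metadata mismatch; never spawn the derivation.
    RejectStaleEpoch,
}

/// Stand-in digest for illustration only (not a cryptographic hash).
fn toy_digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Decide what to do with an overlay entry BEFORE any expensive work is spawned.
fn decide_adoption(
    entry: &OverlayEntry,
    manager_epoch: u64,
    cert_pinned_digest: Option<u64>,
) -> Adoption {
    // Gate 2: stale-epoch data is rejected up front, so a doomed derivation
    // never occupies the pool or blocks the same key's correct data.
    if entry.current_epoch != manager_epoch {
        return Adoption::RejectStaleEpoch;
    }
    match cert_pinned_digest {
        // Gate 1: the cert pins a reconfiguration digest, so the DKG branch
        // is off the table until matching bytes are visible.
        Some(pinned) => {
            let output = &entry.reconfiguration_output;
            if output.is_empty() || toy_digest(output) != pinned {
                Adoption::SkipAndRetry
            } else {
                Adoption::AdoptReconfiguration
            }
        }
        None if entry.reconfiguration_output.is_empty() => Adoption::AdoptDkg,
        None => Adoption::AdoptReconfiguration,
    }
}
```

In this sketch an empty reconfiguration output only reaches the DKG branch when no cert pins a digest for the key, which is the positive control for the first gate.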
Also un-silence the computation-spawn loop: a session whose protocol-
cryptographic-data generation fails was skipped every 20ms tick with no
log at any level, which blinded two wedge post-mortems. The skip stays
(correct); it now logs once per session.
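
A minimal sketch of the log-once-per-session behaviour, assuming a tick loop that retries every session on each pass. `SpawnLoop`, `tick`, and the `log_lines` buffer (a stand-in for a real logger) are hypothetical names, not the ones used in this change.

```rust
use std::collections::HashSet;

/// Hypothetical spawn loop state.
#[derive(Default)]
struct SpawnLoop {
    /// Sessions whose generation failure has already been logged.
    warned: HashSet<u64>,
    /// Stand-in for a real logging backend.
    log_lines: Vec<String>,
}

impl SpawnLoop {
    /// One service tick: returns the sessions that were spawned.
    /// Failing sessions are still skipped on every tick, but each one
    /// produces at most one log line over the loop's lifetime.
    fn tick<F>(&mut self, session_ids: &[u64], generate: F) -> Vec<u64>
    where
        F: Fn(u64) -> Result<(), String>,
    {
        let mut spawned = Vec::new();
        for &id in session_ids {
            match generate(id) {
                Ok(()) => spawned.push(id),
                Err(reason) => {
                    // `insert` returns true only the first time this id is seen.
                    if self.warned.insert(id) {
                        self.log_lines.push(format!(
                            "session {id}: data generation failed, skipping: {reason}"
                        ));
                    }
                }
            }
        }
        spawned
    }
}
```

Keying the dedup set on the session identifier keeps the 20ms tick from flooding the log while still leaving one line per wedged session for a post-mortem.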
Regression tests for both gates (cert-pinned-empty-overlay adoption and
stale-epoch pre-spawn rejection) with positive controls; spec'd in
handoff.md's certificate-consumption section.
Salvaged from an interrupted background agent's worktree; reviewed,
completed (module registration, spec), and validated: new tests 2/2,
beyond_lock_target 2/2, computation_results_batch 1/1,
missing_network_key 1/1, clippy clean on touched code.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Tracking issue for the residual race + the CI-only parked-sessions link: #1736
omersadika added a commit that referenced this pull request on Jun 12, 2026
#4) The v4 pipeline's three designed halt/block modes are safety-first by design, so a blocked validator looks healthy from outside. The metrics all exist (verified against the registries); what was missing is the alerting contract: rules for ika_handoff_prepare_waiting (barrier blocked > 2x epoch) and off_chain_assembly_wedged (EverythingExcluded — the one mode with no self-heal), the log-based signal for the joiner bootstrap fail-closed halt, the operator action for each, and the secondary dashboard signals (pruner advancement, presign-queue drain, rejected handoff signatures). Also merges origin/dev (PRs #1734/#1735): git's rename detection carried the new adoption-guards section of specs/handoff.md into the relocated dev-docs/specs/handoff.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Fixes the false-malicious conviction failure mode surfaced during the epoch-close wedge investigation (see PR #1721's "known issue" note): a validator that installs a different network-key parameter set than its peers honestly computes byte-divergent MPC outputs, and the output-quorum byte-equality tally convicts it as malicious. This was observed live, including the node logging `node recognized itself as malicious`. One conviction silently converts a 4-validator committee into zero-redundancy 3-of-4; the network keeps running degraded with no error-level signal until any further loss freezes MPC.

Root context: at every epoch-N entry, early-starting validators transiently see stale/incomplete overlay data (the epoch-entry stale-mpc_data race). Two adoption holes let that transient state become an installed parameter set:

- An overlay entry whose reconfiguration output is transiently empty slipped through the initial-DKG adoption branch even though the prior epoch's handoff cert pins a reconfiguration digest for the key, instantiating DKG-derived parameters the committee never agreed to run this epoch. Now skipped and retried until the cert-pinned bytes resolve.
- Adopted data whose `current_epoch` metadata mismatches the manager's epoch was only rejected ~10s after the parameter derivation burnt on the rayon pool, and the doomed in-flight instantiation blocked the same key's correct data behind it, widening the entry key gap during which sessions park. Now rejected before spawning.

Plus an observability fix for a gap that blinded two post-mortems: a session whose protocol-cryptographic-data generation fails was silently skipped every 20ms service tick (`.ok()?`). The skip semantics stay; it now logs once per session (deduped).

Tests

- `network_key_adoption.rs`: both gates with positive controls, namely the cert-pinned empty/mismatching/matching overlay progression (self-validating: if the cert weren't consumed, the mismatch assertion would fail via the blind-adopt path) and stale-vs-current epoch spawn discrimination.
- `beyond_lock_target` 2/2, `computation_results_batch` 1/1, `missing_network_key` 1/1; clippy clean on touched code.
- `specs/handoff.md` certificate-consumption section documents both guards and why divergence ends in conviction.

What this does NOT fix (tracked separately)

The residual epoch-entry race (validators still fetch the prior epoch's overlay at entry) and the CI-only "parked sessions never compute after the key installs" link: see the tracking issue.
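
As background for why the residual race still matters, here is a minimal sketch of how a byte-equality output tally turns an honest but divergent validator into a convicted one. It assumes equal validator weights and a quorum larger than half the committee; `tally_outputs` and its types are hypothetical and do not reflect the real tally code.

```rust
use std::collections::HashMap;

/// Returns the quorum-agreed output bytes plus the validators whose bytes
/// differ from it, or `None` if no output reaches the quorum.
/// Assumes equal weights and `quorum` > half the committee, so at most one
/// output can reach it.
fn tally_outputs(
    outputs: &[(String, Vec<u8>)],
    quorum: usize,
) -> Option<(Vec<u8>, Vec<String>)> {
    // Count how many validators reported each exact byte string.
    let mut counts: HashMap<&[u8], usize> = HashMap::new();
    for (_, bytes) in outputs {
        *counts.entry(bytes.as_slice()).or_insert(0) += 1;
    }
    let agreed: Vec<u8> = counts
        .into_iter()
        .find(|(_, count)| *count >= quorum)
        .map(|(bytes, _)| bytes.to_vec())?;
    // Anyone outside the agreeing set is flagged, regardless of WHY its
    // bytes differ: wrong parameters look identical to misbehaviour.
    let divergent = outputs
        .iter()
        .filter(|(_, bytes)| *bytes != agreed)
        .map(|(name, _)| name.clone())
        .collect();
    Some((agreed, divergent))
}
```

With four validators and a quorum of three, a single validator that derived its output from a different parameter set is the only one flagged, and the remaining three form a quorum with no spare member.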
🤖 Generated with Claude Code