Skip to content

sui_executor: cache gas ObjectRef from prev tx effects to avoid versi…#1709

Open
ycscaly wants to merge 3 commits into
devfrom
fix/sessions-manager-completion-equality
Open

sui_executor: cache gas ObjectRef from prev tx effects to avoid versi…#1709
ycscaly wants to merge 3 commits into
devfrom
fix/sessions-manager-completion-equality

Conversation

@ycscaly

@ycscaly ycscaly commented May 18, 2026

Copy link
Copy Markdown
Contributor

No description provided.

…on-stale rejection loop

The executor used to fetch the gas Coin<SUI> ObjectRef from
`get_gas_objects` (a JSON-RPC indexer query) for every submission. The
indexer lags consensus by ~1 version after each tx the executor sends,
so the *next* submission was almost always signed with a stale ref and
rejected by Sui with:

  Transaction needs to be rebuilt because object 0x<gas-coin> version 0xN
  is unavailable for consumption, current version: 0xN+1

The `retry_with_max_elapsed_time!` wrapper would keep re-running the
calling function, but each retry hit the same lag, so the executor could
stall (60+ retries, then long silence; epoch never advanced past 1->2).

Fix: serialize all submissions through a single mutex (`NotifierState`),
fetch the gas ref once on cold start, and on each successful tx derive
the post-execution gas ref directly from
`tx_response.effects.gas_object().reference` — authoritative
consensus-level data, no indexer round-trip. Errors invalidate the cache
so the next attempt re-reads.

After the fix, five consecutive epoch transitions completed with zero
"object version unavailable" warnings, and the SDK dwallet-creation
zero-trust DKG cases that previously timed out (Secp256r1, Ed25519) now
pass cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ycscaly and others added 2 commits May 18, 2026 12:55
SuiDataReceivers::clone() was calling broadcast::Receiver::resubscribe()
for `new_requests_receiver`. Per tokio docs, resubscribe creates a fresh
receiver that only delivers messages produced AFTER the subscribe call.

The validator spawns a new dwallet_mpc_service per epoch. During the
transition (old service shut down, new service spawning), the executor
calls process_mid_epoch which emits a DWalletEncryptionKeyReconfiguration
event on chain. sui_syncer observes and broadcasts it, but if this
happens before the new service has resubscribed, the message lands on
the channel with no active receiver and is lost. The new service's
resubscribed Receiver starts from "now" and never sees the event.

On-chain `system.started_sessions_count` increments (from the Move
initiate_system_session call) but `system.completed_sessions_count`
never catches up because no validator picks the event up to compute it.
That breaks the end-of-publish gate in sync_dwallet_end_of_publish:
`system.started == system.completed` stays false forever, so
end_of_publish is never sent to chain, `received_end_of_publish` stays
false, and process_request_advance_epoch never fires again. The chain
wedges. Internal MPC keeps running (presigns flow), masking the freeze
in the executor's loop.

Fix: wrap the receiver in Arc<Mutex<broadcast::Receiver<_>>>. All
per-epoch consumers share the same receiver state instead of
resubscribing. Events buffered in the channel survive the transition,
preserving read position across epochs.

Verified: chain transitioned cleanly through epochs 1->7 (and beyond)
where every prior run wedged at 6->7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on dev

The chain wedges at an epoch transition under SDK load. Validators receive
the mid-epoch reconfig event but it stays in requests_pending_for_network_key
forever because the consensus-voted instantiate_agreed_keys_from_voted_data
never produces a key_id to drain on.

Main does not have this failure mode because handle_mpc_request_batch calls
maybe_update_network_keys directly from the Sui watch channel — no consensus
indirection.

Status Voting (#1627) and the consensus-message split (#1672) replaced that
direct path with the agreed-vote path, which is the architectural choice that
makes dev wedge-prone here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ycscaly ycscaly requested a review from Iamknownasfesal as a code owner May 18, 2026 16:31
@Iamknownasfesal

Copy link
Copy Markdown
Contributor

i think this PR will not be needed or can be reshaped once balance accumulators are open in sui mainnet, you will be able to pay gas through the accumulator and not via coins, thus not needing to manage gas coins or sponsorship

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants