sui_executor: cache gas ObjectRef from prev tx effects to avoid versi…#1709
Open
ycscaly wants to merge 3 commits into
Open
sui_executor: cache gas ObjectRef from prev tx effects to avoid versi…#1709ycscaly wants to merge 3 commits into
ycscaly wants to merge 3 commits into
Conversation
…on-stale rejection loop The executor used to fetch the gas Coin<SUI> ObjectRef from `get_gas_objects` (a JSON-RPC indexer query) for every submission. The indexer lags consensus by ~1 version after each tx the executor sends, so the *next* submission was almost always signed with a stale ref and rejected by Sui with: Transaction needs to be rebuilt because object 0x<gas-coin> version 0xN is unavailable for consumption, current version: 0xN+1 The `retry_with_max_elapsed_time!` wrapper would keep re-running the calling function, but each retry hit the same lag, so the executor could stall (60+ retries, then long silence; epoch never advanced past 1->2). Fix: serialize all submissions through a single mutex (`NotifierState`), fetch the gas ref once on cold start, and on each successful tx derive the post-execution gas ref directly from `tx_response.effects.gas_object().reference` — authoritative consensus-level data, no indexer round-trip. Errors invalidate the cache so the next attempt re-reads. After the fix, five consecutive epoch transitions completed with zero "object version unavailable" warnings, and the SDK dwallet-creation zero-trust DKG cases that previously timed out (Secp256r1, Ed25519) now pass cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SuiDataReceivers::clone() was calling broadcast::Receiver::resubscribe() for `new_requests_receiver`. Per tokio docs, resubscribe creates a fresh receiver that only delivers messages produced AFTER the subscribe call. The validator spawns a new dwallet_mpc_service per epoch. During the transition (old service shut down, new service spawning), the executor calls process_mid_epoch which emits a DWalletEncryptionKeyReconfiguration event on chain. sui_syncer observes and broadcasts it, but if this happens before the new service has resubscribed, the message lands on the channel with no active receiver and is lost. The new service's resubscribed Receiver starts from "now" and never sees the event. On-chain `system.started_sessions_count` increments (from the Move initiate_system_session call) but `system.completed_sessions_count` never catches up because no validator picks the event up to compute it. That breaks the end-of-publish gate in sync_dwallet_end_of_publish: `system.started == system.completed` stays false forever, so end_of_publish is never sent to chain, `received_end_of_publish` stays false, and process_request_advance_epoch never fires again. The chain wedges. Internal MPC keeps running (presigns flow), masking the freeze in the executor's loop. Fix: wrap the receiver in Arc<Mutex<broadcast::Receiver<_>>>. All per-epoch consumers share the same receiver state instead of resubscribing. Events buffered in the channel survive the transition, preserving read position across epochs. Verified: chain transitioned cleanly through epochs 1->7 (and beyond) where every prior run wedged at 6->7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on dev The chain wedges at an epoch transition under SDK load. Validators receive the mid-epoch reconfig event but it stays in requests_pending_for_network_key forever because the consensus-voted instantiate_agreed_keys_from_voted_data never produces a key_id to drain on. Main does not have this failure mode because handle_mpc_request_batch calls maybe_update_network_keys directly from the Sui watch channel — no consensus indirection. Status Voting (#1627) and the consensus-message split (#1672) replaced that direct path with the agreed-vote path, which is the architectural choice that makes dev wedge-prone here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
i think this PR will not be needed or can be reshaped once balance accumulators are open in sui mainnet, you will be able to pay gas through the accumulator and not via coins, thus not needing to manage gas coins or sponsorship |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.