Emergency commit #353
base: compatible
d60163f to 86a0934
86a0934 to 548fb07
/gemini review
Code Review
This pull request introduces an important emergency commit mechanism to handle sequencer downtime, which is a crucial feature for the network's liveness. The implementation is well-structured, especially the refactoring in rule_commit.ml to accommodate the new emergency logic while reusing existing components. The addition of the count_commits folder and the Verify_emergency_folders wrapper to handle multiple recursive proofs is a good use of the existing patterns. The documentation and spec have also been updated to reflect these changes.
I've found a few minor issues, mostly in the specification document regarding pseudocode correctness and clarity, and a suggestion to improve configuration readability by avoiding a magic number. Overall, this is a solid contribution.
Expanding on action state limitation
Not really an issue, I just want to expand on this comment in the PR description:
And in the spec:
Here's my understanding of the issue: The emergency commit rule basically wants to say "this is the action state at the current timestamp", where the timestamp is asserted via the valid-while precondition and the action state via the action state precondition. However, making that statement is not exactly possible, because the action state precondition can refer to any of the last 5 times the action state was updated, with no guaranteed relationship to the current timestamp. So in the worst case, the emergency commit can be exercised more easily than it should be. For example, if fewer than 5 blocks with actions were posted onchain over about a month, that would allow us to claim "no commit for a month" even if that is not true, since we could use the same action state, referring to a commit from a month back, for both the start and the end of the
What would be bad about this? It would let random people override the sequencer's view of the L2 state. So it's in the interest of a healthy sequencer to prevent that. Fortunately, it's easy for a healthy sequencer to prevent: doing more than 5 commits per month is sufficient. (Alternatively, they could also make sure enough actions other than commits are posted.) If the sequencer is not healthy, and does not manage to post 5 commits in a month, we're already in the scenario that the emergency commit rule is designed for. So there's no real drawback. It's just that the condition for emergency commits to be possible is actually slightly weaker than "no commit happened in a month" (but strong enough to only apply to the intended scenario).
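To make the scenario above concrete, here is a toy Python model of it. It is only a sketch under stated assumptions, not the actual account or circuit logic: the class `OuterAccount`, the SHA-256 hash chain, and the rule "one history entry per slot that contains actions" are all illustrative stand-ins. The only thing taken from the discussion is that a precondition may match any of the last 5 action states.

```python
import hashlib
from collections import deque

HISTORY = 5  # a precondition may match any of the last 5 action states


def next_state(prev: str, action: str) -> str:
    """Stand-in for the real action-state hash chain (assumption)."""
    return hashlib.sha256(f"{prev}|{action}".encode()).hexdigest()


class OuterAccount:
    """Hypothetical model: one remembered action state per slot with actions."""

    def __init__(self) -> None:
        self.history = deque(["genesis"], maxlen=HISTORY)
        self.last_action_slot = None

    def post_action(self, slot: int, action: str) -> str:
        new = next_state(self.history[-1], action)
        if slot == self.last_action_slot:
            self.history[-1] = new      # same slot: update the newest entry
        else:
            self.history.append(new)    # new slot: push, evicting the oldest
            self.last_action_slot = slot
        return new

    def precondition_holds(self, claimed: str) -> bool:
        return claimed in self.history


def demo():
    acct = OuterAccount()
    old = acct.post_action(0, "commit")
    for slot in (10, 20, 30, 40):       # four later commits in distinct slots
        acct.post_action(slot, "commit")
    still_ok = acct.precondition_holds(old)    # stale state still accepted
    acct.post_action(50, "commit")             # fifth later commit
    evicted = not acct.precondition_holds(old)
    return still_ok, evicted
```

In this model, the action state from slot 0 stays usable as a precondition until five later slots have carried actions, which is why a sequencer posting in enough distinct slots closes the gap.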
In effect, the emergency commit seals the gap with a bounded slot
range, restores liveness, and lets subsequent sequencers resume
committing under the usual rules.
Is this accurate? It's not clear to me how liveness would be restored. It seems to me that there is no mechanism to update the sequencer on the outer account, so how would we resume committing if that sequencer stays offline, given that a normal commit requires the sequencer's signature?
In the absence of coordination, users might only be able to exit at a rate of 1 per 30 days.
Assuming that the only sequencer is gone, we must also assume that there is no longer a central place where attempted transactions are recorded. So how would a user who wants to exit proceed?
They could take the last known ledger from before the sequencer stopped operating (I'm assuming they can get this information via the DA layer). They could apply their own exit transaction(s) to that ledger, e.g. deposit into the L2 bridge. They would then need confirmation (a signature) from the DA layer for that updated ledger. Finally, they would run the emergency commit rule, which would allow them to get their money out.
If a single user does the above, without coordinating to include more exit transactions than their own, that would lock the emergency commit rule for another 30 days (since it results in a new commit). There's a cartoon version of this where we have (say) 1000 users and they take 1000 months (~80 years) to all exit because they do it one at a time :D
Obviously, that's not what would happen: we would likely see coordination to come up with a new effective sequencer that allows all users to post their exits and builds a ledger that includes all of them. However, there is no mechanism that guarantees that such a coordinated effort is the first to commit. It would still race with any rogue/uncoordinated attempts to be the ones to exit after 30 days. So there is a real danger that it could take multiple 30-day periods for the mechanism to be effective for most users.
I'm not sure how bad you think this is in practice. I think it could be addressed easily: an emergency commit could toggle the contract into "emergency mode". From that point on, the 30-day minimum timespan would no longer be enforced, so after the first emergency commit everyone could exit quickly.
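The arithmetic behind the one-at-a-time scenario, and the effect of the suggested "emergency mode" toggle, can be sketched as follows. This is a toy Python model with hypothetical names (`Rollup`, `sticky`, `days_until_all_exit`), measuring time in days rather than slots. It assumes the worst case where every emergency commit carries exactly one user's exit.

```python
from dataclasses import dataclass

MAX_INACTIVITY_DAYS = 30  # the 30-day window from the discussion


@dataclass
class Rollup:
    last_commit_day: int = 0
    emergency_mode: bool = False
    sticky: bool = False  # hypothetical: does an emergency commit enable emergency mode?

    def can_emergency_commit(self, today: int) -> bool:
        inactive = today - self.last_commit_day >= MAX_INACTIVITY_DAYS
        return self.emergency_mode or inactive

    def emergency_commit(self, today: int) -> None:
        if not self.can_emergency_commit(today):
            raise ValueError("sequencer has not been inactive long enough")
        self.last_commit_day = today   # every commit restarts the window
        if self.sticky:
            self.emergency_mode = True


def days_until_all_exit(users: int, sticky: bool) -> int:
    """Worst case: each emergency commit contains exactly one user's exit."""
    rollup = Rollup(sticky=sticky)
    day = 0
    for _ in range(users):
        while not rollup.can_emergency_commit(day):
            day += 1
        rollup.emergency_commit(day)
    return day
```

Without the toggle, `days_until_all_exit(1000, sticky=False)` gives 30000 days in this model, which is the cartoon ~80 years. With the toggle, every user after the first can exit on the same day as the first emergency commit.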
This PR covers sequencer liveness, but what about DA layer liveness?
We didn't audit the DA layer last time, but it seems to us that it's a multisig where signatures from all participants are required to make any L1 commit. It seems that liveness would be entirely broken if just one of those keys were lost or one of the nodes stopped operating. This looks like an even more problematic failure case than that of a dead sequencer, and it isn't addressed by the emergency commit mechanism.
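To illustrate why an all-signers-required multisig is fragile, here is a small Python sketch. It assumes each DA node is independently available with probability `p` and computes the chance that at least `k` of `n` can sign. The function name, the independence assumption, and the k-of-n comparison are illustrative only; they are not taken from the DA layer's actual design.

```python
from math import comb


def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent signers are online."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def commit_possible(online: set, signers: set, threshold: int) -> bool:
    """A commit needs signatures from at least `threshold` known signers."""
    return len(online & signers) >= threshold
```

With `k == n`, `availability` reduces to `p ** n`, so it can only shrink as signers are added, and a single lost key makes `commit_possible` false permanently.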
Until we have decentralized sequencing, we need some kind of mechanism for emergency commits in case the sequencer goes offline. This PR implements a simple emergency commit for the case where there have been no commits for some period of time.
The original commit rule stays the same.
The new emergency commit rule reuses the original rule, and checks that there has been no commit for max_sequencer_inactivity slots.
The main drawback of this approach is that the outer action state precondition can be set to any of the last 5 values; therefore the sequencer has to maintain commits in 5 distinct slots within the max_sequencer_inactivity window.
More detailed explanation is in the spec and rollup explanation doc.
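As a rough illustration of the check described above, here is a minimal Python sketch. It is not the circuit code: the action representation (dicts with a `kind` field) and the parameters `slot_start` and `slot_end` are assumptions made for the example. Only the names `count_commits` and `max_sequencer_inactivity` come from the PR.

```python
from functools import reduce


def count_commits(actions) -> int:
    """Fold over the actions, counting those that are commits."""
    return reduce(lambda n, a: n + (1 if a["kind"] == "commit" else 0), actions, 0)


def emergency_commit_allowed(actions, slot_start, slot_end, max_sequencer_inactivity) -> bool:
    """Allowed only if the window is long enough and contains no commit."""
    window_long_enough = slot_end - slot_start >= max_sequencer_inactivity
    return window_long_enough and count_commits(actions) == 0
```

Here `actions` stands for the actions dispatched between the action states claimed for the start and the end of the window.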
The following is left to do: