|
| 1 | +# FMA: U16a Maintenance Upgrade |
| 2 | + |
| 3 | +|---------------------|--------------------------| |
| 4 | +| Author | Kelvin Fichter | |
| 5 | +| Created at | YYYY-MM-DD | |
| 6 | +| Needs Approval From | Matt Solomon, Josep Bove | |
| 7 | +| Other Reviewers | | |
| 8 | +| Status | Implementing Actions | |
| 9 | + |
| 10 | +## Introduction |
| 11 | + |
| 12 | +U16a is a maintenance upgrade that replaces U16. Its primary purpose is to temporarily remove interop-specific code introduced in U16 while introducing foundational support for system-level feature toggles. |
| 13 | + |
| 14 | +The interop withdrawal-proving code paths are removed in U16a following feedback from chain operators and security reviewers. We intend to reintroduce interop functionality in a future upgrade after working more closely with partners' security teams. |
| 15 | + |
| 16 | +### Removal of interop withdrawal-proving code |
| 17 | + |
| 18 | +Chains can no longer prove withdrawals when interop is enabled. ETHLockbox support remains for chains that previously upgraded to U16. Chains upgrading directly from U15 to U16a cannot adopt ETHLockbox until a later upgrade. |
| 19 | + |
| 20 | +### Introduction of system-level feature toggles |
| 21 | + |
| 22 | +The upgrade implements system-level feature toggles via the `SystemConfig` contract. ETHLockbox is the first feature gated behind this mechanism. Feature flags are string-based and settable by chain admins. |
| 23 | + |
| 24 | +### OPContractsManager (OPCM) updates |
| 25 | + |
| 26 | +The OPCM supports both `U15 => U16a` and `U16 => U16a` upgrade paths. It includes explicit protections against misuse of dev feature flags in production. Upgrade tests cover both paths, and rollout will be validated on betanet first. |
| 27 | + |
| 28 | +### Pause mechanism identifier change |
| 29 | + |
| 30 | +Chains without ETHLockbox will use the `OptimismPortal` address as their pause identifier instead of the lockbox address. Incident response runbooks will be updated accordingly. Chains retain `address(0)` as the global pause identifier in either case. |
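
The identifier rule above can be sketched for pause tooling as follows. This is a minimal illustration, not the actual tooling: the function name `pause_identifier` and the constant names are hypothetical, and it assumes a chain "has ETHLockbox" exactly when a non-zero lockbox address is configured.

```python
from typing import Optional

# address(0), which the document names as the global pause identifier.
ZERO_ADDRESS = "0x" + "00" * 20
GLOBAL_PAUSE_IDENTIFIER = ZERO_ADDRESS


def pause_identifier(portal: str, lockbox: Optional[str]) -> str:
    """Pick the chain-specific pause identifier (hypothetical helper).

    Assumption: a missing or zero lockbox address means the chain does
    not have ETHLockbox, so the OptimismPortal address is used instead.
    """
    if lockbox is not None and lockbox != ZERO_ADDRESS:
        return lockbox
    return portal
```

A runbook script could call this once per chain so that operators do not have to remember which upgrade path a chain followed.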
| 31 | + |
| 32 | +## Failure Modes |
| 33 | + |
| 34 | +### FM1: Misconfigured Feature Toggles |
| 35 | + |
| 36 | +#### Description |
| 37 | + |
| 38 | +Chains may incorrectly configure the system-level feature toggles introduced in U16a. This includes enabling the ETHLockbox flag without having the ETHLockbox contract deployed, failing to enable ETHLockbox when it's needed, or setting arbitrary unsupported feature flags. |
| 39 | + |
| 40 | +#### Risk Assessment |
| 41 | + |
| 42 | +- Impact: LOW/MEDIUM |
| 43 | + - Reasoning: Most misconfigurations have no functional effect or cause temporary bridge issues that can be resolved by admin intervention. The most impactful scenario is failing to enable ETHLockbox when needed, which temporarily breaks the bridge. |
| 44 | +- Likelihood: LOW/MEDIUM |
| 45 | + - Reasoning: The feature toggle system is new, which increases the chance of operator error. However, validator checks and clear documentation should reduce this risk. |
| 46 | + |
| 47 | +#### Mitigations |
| 48 | + |
| 49 | +- ETHLockbox can only be enabled if the lockbox address is set (requires the U16 upgrade path). |
| 50 | +- The chain admin can set the toggle post-upgrade if it was initially misconfigured. |
| 51 | +- Clear operational documentation and runbooks. |
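
As a rough illustration of the kind of check a validator or monitor could run against the three misconfigurations described for FM1, here is a minimal off-chain sketch. Everything in it is an assumption for illustration: the `ChainConfig` shape, the flag name `ETH_LOCKBOX`, and the rule that ETHLockbox is "needed" whenever a lockbox address is set. It does not reflect the actual `SystemConfig` interface or the standard validator.

```python
from dataclasses import dataclass, field
from typing import List, Set

ZERO_ADDRESS = "0x" + "00" * 20
# Hypothetical flag name; the real string identifier may differ.
ETH_LOCKBOX = "ETH_LOCKBOX"
KNOWN_FEATURES = {ETH_LOCKBOX}


@dataclass
class ChainConfig:
    """Hypothetical snapshot of a chain's lockbox address and enabled flags."""
    lockbox: str = ZERO_ADDRESS
    enabled_features: Set[str] = field(default_factory=set)


def check_feature_config(cfg: ChainConfig) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    lockbox_set = cfg.lockbox != ZERO_ADDRESS
    lockbox_enabled = ETH_LOCKBOX in cfg.enabled_features
    if lockbox_enabled and not lockbox_set:
        problems.append("ETH_LOCKBOX enabled but no lockbox address is set")
    if lockbox_set and not lockbox_enabled:
        problems.append("lockbox address is set but ETH_LOCKBOX is not enabled")
    # Flag anything outside the known set, in a stable order.
    for name in sorted(cfg.enabled_features - KNOWN_FEATURES):
        problems.append(f"unsupported feature flag: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a monitor report every misconfiguration for a chain in a single alert.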
| 52 | + |
| 53 | +#### Detection |
| 54 | + |
| 55 | +- Monitor portal/lockbox balances. |
| 56 | +- Monitor for unexpected feature flag changes. |
| 57 | + |
| 58 | +#### Action Items |
| 59 | + |
| 60 | +- [x] Update standard validator to verify lockbox/feature flag configuration |
| 61 | +- [ ] Set up system feature flag monitor |
| 62 | +- [ ] Set up portal/lockbox balance monitor |
| 63 | + |
| 64 | +#### Recovery Path(s) |
| 65 | + |
| 66 | +- Correct misconfigured toggles via admin transactions. |
| 67 | +- No special recovery procedures beyond standard admin intervention. |
| 68 | + |
| 69 | +### FM2: ETHLockbox Migration Gap |
| 70 | + |
| 71 | +#### Description |
| 72 | + |
| 73 | +Chains upgrading from U15 directly to U16a cannot adopt ETHLockbox until a later upgrade. This creates operational complexity as some chains have ETHLockbox while others don't, leading to testing asymmetry and incident response divergence in pause mechanism runbooks. |
| 74 | + |
| 75 | +#### Risk Assessment |
| 76 | + |
| 77 | +- Impact: LOW |
| 78 | + - Reasoning: Operational overhead is minimal and doesn't affect core functionality or safety. Primarily creates complexity in testing and operational procedures. |
| 79 | +- Likelihood: HIGH |
| 80 | + - Reasoning: This is an intentional design decision that will definitely occur for chains following the U15 => U16a upgrade path. |
| 81 | + |
| 82 | +#### Mitigations |
| 83 | + |
| 84 | +- Testing covers both scenarios since major chains exist in both categories. |
| 85 | +- Runbooks and supporting documentation will be updated to prevent operator confusion. |
| 86 | +- Clear documentation of which chains have ETHLockbox and which don't. |
| 87 | + |
| 88 | +#### Detection |
| 89 | + |
| 90 | +- Track which chains follow which upgrade path. |
| 91 | +- Monitor for operator confusion in incident response procedures. |
| 92 | + |
| 93 | +#### Action Items |
| 94 | + |
| 95 | +- [ ] Update incident response runbooks to handle both ETHLockbox and non-ETHLockbox chains |
| 96 | +- [ ] Update pause tooling to handle both ETHLockbox and non-ETHLockbox chains |
| 97 | +- [x] Ensure test coverage includes both scenarios |
| 98 | +- [ ] Hold a training session on operating with both ETHLockbox and non-ETHLockbox chains |
| 99 | + |
| 100 | +#### Recovery Path(s) |
| 101 | + |
| 102 | +- No recovery needed as this is expected behavior. |
| 103 | +- Chains can adopt ETHLockbox in future upgrades. |
| 104 | +- Operational procedures handle both pathways as designed. |
| 105 | + |
| 106 | +### FM3: OPContractsManager Dual-Path Upgrades |
| 107 | + |
| 108 | +#### Description |
| 109 | + |
| 110 | +The OPContractsManager supports both `U15 => U16a` and `U16 => U16a` upgrade paths, creating complexity in the upgrade logic. Issues could arise from incorrect branching between upgrade paths or accidental enabling of dev feature flags in production environments. |
| 111 | + |
| 112 | +#### Risk Assessment |
| 113 | + |
| 114 | +- Impact: MEDIUM/HIGH |
| 115 | + - Reasoning: Incorrect upgrade path branching could leave chains stuck mid-upgrade or misconfigured. Dev feature flags in production could expose interop code paths. |
| 116 | +- Likelihood: LOW |
| 117 | + - Reasoning: Dual-path upgrade logic is tested in CI and on betanet. Dev feature flags have hardcoded protections preventing production use. |
| 118 | + |
| 119 | +#### Mitigations |
| 120 | + |
| 121 | +- Dual-path upgrade tested thoroughly in CI and betanet environments. |
| 122 | +- `VerifyOPCM` should verify that no dev feature flags are enabled. |
| 123 | +- Hardcoded protections make it impossible to enable dev feature flags in production. |
| 124 | +- Clear upgrade procedures and checklists for each path. |
| 125 | + |
| 126 | +#### Detection |
| 127 | + |
| 128 | +- Upgrade test suite validates both paths. |
| 129 | +- Betanet validation before mainnet deployment. |
| 130 | + |
| 131 | +#### Action Items |
| 132 | + |
| 133 | +- [ ] Validate both upgrade paths on betanet |
| 134 | +- [ ] Have `VerifyOPCM` check for dev feature flags in production |
| 135 | + |
| 136 | +#### Recovery Path(s) |
| 137 | + |
| 138 | +- Manual intervention may be required to resolve stuck upgrades. |
| 139 | + |
| 140 | +### FM4: Removal of Interop Withdrawal-Proving Code |
| 141 | + |
| 142 | +#### Description |
| 143 | + |
| 144 | +U16a removes interop withdrawal-proving code paths that were introduced in U16. While no known chains or tooling currently depend on these code paths, there's a risk that some undiscovered dependencies exist. |
| 145 | + |
| 146 | +#### Risk Assessment |
| 147 | + |
| 148 | +- Impact: MEDIUM |
| 149 | + - Reasoning: If undiscovered dependencies exist, removal could break functionality for chains or tooling that relied on these paths. |
| 150 | +- Likelihood: LOW |
| 151 | + - Reasoning: No known dependencies have been identified, and partner communications will confirm this before rollout. |
| 152 | + |
| 153 | +#### Mitigations |
| 154 | + |
| 155 | +- Partner communications planned to confirm no dependencies before rollout. |
| 156 | +- Staged rollout allows for early detection of issues. |
| 157 | + |
| 158 | +#### Detection |
| 159 | + |
| 160 | +- Partner feedback during pre-rollout communications. |
| 161 | +- Community reports of broken functionality. |
| 162 | + |
| 163 | +#### Action Items |
| 164 | + |
| 165 | +- [ ] Complete partner communications to confirm no dependencies |
| 166 | +- [ ] Monitor for issues during staged rollout |
| 167 | + |
| 168 | +#### Recovery Path(s) |
| 169 | + |
| 170 | +- Revert to U16 if critical dependencies are discovered. |
| 171 | +- Develop alternative solutions for affected parties. |
| 172 | +- Delay full rollout until dependencies are resolved. |
| 173 | + |
| 174 | +### FM5: Tech Debt |
| 175 | + |
| 176 | +#### Description |
| 177 | + |
| 178 | +We're adding a number of new branches to the code as a result of both `ETHLockbox` and `OptimismPortalInterop`. If we don't start simplifying the logic by mainlining as much code as possible, we risk accumulating a non-trivial amount of tech debt that will eventually need to be resolved and that could introduce bugs in other features. |
| 179 | + |
| 180 | +#### Risk Assessment |
| 181 | + |
| 182 | +- Impact: UNCLEAR |
| 183 | + - Reasoning: The impact is not obvious; it depends entirely on the type of bug this could create. It likely falls into low/medium, but the long-term impact of tech debt on execution can be high. |
| 184 | +- Likelihood: MEDIUM |
| 185 | + - Reasoning: This is the first time we're introducing a feature like this, so it's reasonably likely that it will create issues at some point. Having to resolve the tech debt is a certainty, but the chance of it causing a real issue is probably low/medium. |
| 186 | + |
| 187 | +#### Mitigations |
| 188 | + |
| 189 | +- The primary mitigation is to start resolving this tech debt as quickly as possible. |
| 190 | +- Ensure that each of these code paths is sufficiently tested. |
| 191 | + |
| 192 | +#### Detection |
| 193 | + |
| 194 | +- N/A |
| 195 | + |
| 196 | +#### Action Items |
| 197 | + |
| 198 | +- [ ] Kick off a conversation with the proofs team on how to mainline the interop changes again. |