| 1 | +# EOFv0 for packaging legacy code in Verkle Trees |
| 2 | + |
| 3 | +This design draft proposes using EOF |
| 4 | +for storing code in Verkle Trees, |
| 5 | +as an alternative to the existing method of executing |
| 6 | +31-byte code chunks accompanied by 1 byte of metadata. |
| 7 | + |
| 8 | +## Goal |
| 9 | + |
| 10 | +Simplified legacy code execution in the Verkle Tree implementation. |
| 11 | + |
| 12 | +Better "code-to-data" ratio. |
| 13 | + |
| 14 | +Provide the result of the jumpdest analysis of deployed code as an EOF section. |
| 15 | +During code execution the jumpdest analysis is already available |
| 16 | +and the answer to the question "is this jump target valid?" can be looked up |
| 17 | +in the section. This allows using 32-byte Verkle Tree code chunks |
| 18 | +(instead of 31 bytes of code + 1 byte of metadata). |
| 19 | + |
| 20 | +## Specification |
| 21 | + |
| 22 | +### Container |
| 23 | + |
| 24 | +1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540). |
| 25 | +2. Set the EOF version to 0, i.e. the packaged legacy code will be referred to as EOFv0. |
| 26 | +3. An EOFv0 container consists of a header and two sections: |
| 27 | + - *jumpdest* |
| 28 | + - *code* |
| 29 | +4. The header must contain information about the sizes of these sections. |
| 30 | + For that the EIP-3540 header or a simplified one can be used. |
| 31 | +5. The legacy code is placed in the *code* section without modifications. |
| 32 | +6. The *jumpdest* section contains the set of all valid jump destinations matching the positions |
| 33 | + of all `JUMPDEST` instructions in the *code*. |
| 34 | + The exact encoding of this section is specified separately. |
| 35 | + |
| 36 | +### Changes to execution semantics |
| 37 | + |
| 38 | +1. Execution starts at the first byte of the *code* section, and `PC` is set to 0. |
| 39 | +2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the |
| 40 | + end of the container). |
| 41 | +3. `PC` returns the current position within the *code*. |
| 42 | +4. The instructions which *read* code must refer to the *code* section only. |
| 43 | + The modification keeps the behavior of these instructions unchanged. |
| 44 | + These instructions are invalid in EOFv1. |
| 45 | + The instructions are: |
| 46 | + - `CODECOPY` (copies a part of the *code* section), |
| 47 | + - `CODESIZE` (returns the size of the *code* section), |
| 48 | + - `EXTCODECOPY`, |
| 49 | + - `EXTCODESIZE`, |
| 50 | + - `EXTCODEHASH`. |
| 51 | +5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist |
| 52 | + in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`. |
| 53 | + |
| 54 | +### Changes to contract creation semantics |
| 55 | + |
| 56 | +1. Initcode execution is performed without changes, i.e. initcode remains ephemeral code |
| 57 | + without EOF wrapping. However, because the EOF containers are not visible to any EVM program, |
| 58 | + implementations may decide to wrap initcodes with EOFv0 and execute them the same way as |
| 59 | + EOFv0 deployed codes. |
| 60 | +2. The initcode size limit and cost remain defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860). |
| 61 | +3. The initcode still returns plain deploy code. |
| 62 | + The plain code size limit and cost are defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170). |
| 63 | +4. If the plain code is not empty, it must be wrapped with EOFv0 before being put in the state: |
| 64 | + - perform jumpdest analysis of the plain code, |
| 65 | + - encode the jumpdest analysis result as the *jumpdest* section, |
| 66 | + - put the plain code in the *code* section, |
| 67 | + - create an EOFv0 container with the *jumpdest* and *code* sections. |
| 68 | +5. The code deployment cost is calculated from the total EOFv0 size. |
| 69 | + This is a breaking change so the impact must be analysed. |
| 70 | +6. During the Verkle Tree migration, perform the above EOFv0 wrapping of all deployed code. |
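The wrapping steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the draft leaves the header layout open, so the 7-byte header used here (EIP-3540 magic `0xEF00`, version `0`, two 2-byte big-endian section sizes) is hypothetical, and the function names `jumpdest_analysis`, `wrap` and `unwrap` are made up for this example. The *jumpdest* section is passed in already encoded, because its encoding is specified separately.

```python
MAGIC = b"\xEF\x00"  # EOF magic defined by EIP-3540
VERSION = 0          # EOFv0
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def jumpdest_analysis(code: bytes) -> list[int]:
    """Return positions of JUMPDEST bytes that are not push data."""
    positions = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            positions.append(pc)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the immediate bytes
        pc += 1
    return positions


def wrap(code: bytes, jumpdest_section: bytes) -> bytes:
    """Build a container with a HYPOTHETICAL simplified header."""
    if not code:
        return b""  # empty code is not wrapped
    header = (
        MAGIC
        + bytes([VERSION])
        + len(jumpdest_section).to_bytes(2, "big")
        + len(code).to_bytes(2, "big")
    )
    return header + jumpdest_section + code


def unwrap(container: bytes) -> bytes:
    """Extract the unmodified legacy code from the container."""
    jumpdest_size = int.from_bytes(container[3:5], "big")
    code_size = int.from_bytes(container[5:7], "big")
    start = 7 + jumpdest_size
    return container[start:start + code_size]
```

For the code `60 5b 5b 00` (`PUSH1 0x5b`, `JUMPDEST`, `STOP`), the analysis reports position 2 only, because the byte at position 1 is push data. `unwrap(wrap(code, section))` returns the original code, matching the requirement that the *code* section holds the legacy code without modifications.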
| 71 | + |
| 72 | +### Jumpdest section encoding |
| 73 | + |
| 74 | +#### Bitmap |
| 75 | + |
| 76 | +A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset. |
| 77 | +The trailing zero bytes must be trimmed. |
| 78 | +Therefore, the size of the bitmap is at most `ceil(len(code) / 8)` giving ~12.5% size overhead |
| 79 | +(compared with the plain code size). |
| 80 | +Such encoding doesn't require pre-processing and provides random access. |
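A minimal sketch of the bitmap encoding and its random-access lookup, in Python. The bit order within a byte is not fixed by the draft; this example assumes that position `i` maps to bit `i % 8` (least significant bit first) of byte `i // 8`. The function names are hypothetical.

```python
def encode_bitmap(jumpdests: list[int]) -> bytes:
    """Encode valid jump destinations as a byte-aligned bitset.

    The buffer is sized by the highest jump destination, so trailing
    zero bytes are never produced.
    """
    if not jumpdests:
        return b""
    bitmap = bytearray(max(jumpdests) // 8 + 1)
    for pos in jumpdests:
        bitmap[pos // 8] |= 1 << (pos % 8)  # assumed bit order: LSB first
    return bytes(bitmap)


def is_valid_jump_target(bitmap: bytes, target: int) -> bool:
    """Check a single jump target by reading one byte of the bitmap."""
    byte_index = target // 8
    if byte_index >= len(bitmap):
        return False  # beyond the trimmed bitmap: no JUMPDEST there
    return bool((bitmap[byte_index] >> (target % 8)) & 1)
```

Checking a target touches exactly one byte of the section, which is the random-access property mentioned above: only the Verkle Tree chunk containing that byte needs to be loaded.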
| 81 | + |
| 82 | +Originally, EIP-3690 proposed to use delta encoding for the elements of the *jumpdest* section. |
| 83 | +This should be efficient for an average contract but behaves badly in the worst case |
| 84 | +(every instruction in the code is a `JUMPDEST`). |
| 85 | +The delta encoding also has another disadvantage for Verkle Tree code chunking: |
| 86 | +the whole (?) section must be loaded and preprocessed to check a single jump target validity. |
| 87 | + |
| 88 | +### Metadata encoding (8-bit numbers) |
| 89 | + |
| 90 | +Follow the original Verkle Tree idea to provide the single byte of metadata with the |
| 91 | +"number of leading pushdata bytes in a chunk". |
| 92 | +However, instead of including this in the chunk itself, |
| 93 | +place the byte in order in the *jumpdest* section. |
| 94 | + |
| 95 | +This provides the following benefits over the original Verkle Tree design: |
| 96 | + |
| 97 | +1. The code is executed in full 32-byte chunks. |
| 98 | +2. The *metadata* size overhead is slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`). |
| 99 | +3. The *metadata* lookup is only needed for executing jumps |
| 100 | + (not needed when falling through to the next chunk). |
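One way this could work is sketched below in Python, assuming one metadata byte per 32-byte chunk stored in chunk order. The function names are hypothetical, and the validation procedure (scan from the first real instruction of the target's chunk) is an interpretation of the idea, not something the draft spells out.

```python
CHUNK = 32
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def chunk_metadata(code: bytes) -> bytes:
    """For each 32-byte chunk, the number of leading push data bytes."""
    meta = bytearray(-(-len(code) // CHUNK))  # ceil(len(code) / 32) entries
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            data_end = min(pc + op - PUSH1 + 1, len(code))
            boundary = -(-pc // CHUNK) * CHUNK  # first chunk start >= pc
            if boundary < data_end:
                # The push data spills into (or starts at) this chunk.
                meta[boundary // CHUNK] = data_end - boundary
            pc = data_end
    return bytes(meta)


def is_valid_jump_target(code: bytes, meta: bytes, target: int) -> bool:
    """Validate a jump using only the target's chunk and its metadata byte."""
    if target >= len(code) or code[target] != JUMPDEST:
        return False
    index = target // CHUNK
    pc = index * CHUNK + meta[index]  # first real instruction in the chunk
    while pc < target:
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1
    return pc == target  # False if the target lies inside push data
```

A `PUSH32` at position 30 has its data at positions 31..62, so the second chunk gets the metadata value 31, and a `0x5b` byte anywhere in 32..62 is rejected while a `JUMPDEST` at position 63 is accepted.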
| 101 | + |
| 102 | +### Super-dense metadata encoding (6-bit numbers) |
| 103 | + |
| 104 | +The same as above, except that the values are encoded as 6-bit numbers |
| 105 | +(the minimum number of bits needed to encode the value `32`). |
| 106 | +Such encoding lowers the size overhead from 3.1% to 2.3% (`6/256`). |
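The packing can be sketched in Python as below. The draft does not define the bit layout; this example assumes the 6-bit values are concatenated most significant bit first and the final byte is zero-padded. `pack_6bit` and `get_6bit` are hypothetical names.

```python
def pack_6bit(values: bytes) -> bytes:
    """Pack per-chunk values (0..32) into consecutive 6-bit fields."""
    acc = 0
    for v in values:
        if v > 32:
            raise ValueError("a chunk has at most 32 leading push data bytes")
        acc = (acc << 6) | v
    nbits = 6 * len(values)
    pad = -nbits % 8  # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def get_6bit(packed: bytes, index: int) -> int:
    """Random access to the value for chunk `index`."""
    bit = 6 * index
    # A 6-bit field spans at most two bytes; pad when it ends in the last byte.
    window = int.from_bytes(packed[bit // 8: bit // 8 + 2].ljust(2, b"\x00"), "big")
    return (window >> (10 - bit % 8)) & 0x3F
```

For a 24576-byte contract (the EIP-170 limit) there are 768 chunks, so the packed metadata takes 576 bytes, which is the 2.3% overhead quoted above.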
| 107 | + |
| 108 | +## Backwards Compatibility |
| 109 | + |
| 110 | +EOF-packaged code execution is fully compatible with the legacy code execution. |
| 111 | +This is achieved by prepending the legacy code with the EOF header and the section containing |
| 112 | +jumpdest metadata. The contents of the code section are identical to the legacy code. |
| 113 | + |
| 114 | +Moreover, the wrapping process is bidirectional: wrapping can be created from the legacy code |
| 115 | +and legacy code extracted from the wrapping without any information loss. |
| 116 | +Implementations may consider keeping the legacy code in the database without modifications |
| 117 | +and only construct the EOF wrapping when loading the code from the database. |
| 118 | + |
| 119 | +It can also be noted that the information in the *jumpdest* section is redundant with the `JUMPDEST` |
| 120 | +instructions. However, we **cannot** remove these instructions from the code because |
| 121 | +this potentially breaks: |
| 122 | + |
| 123 | +- *dynamic* jumps (where we will not be able to adjust their jump targets), |
| 124 | +- code introspection with `CODECOPY` and `EXTCODECOPY`. |
| 125 | + |
| 126 | +## Extensions |
| 127 | + |
| 128 | +### Detect unreachable code |
| 129 | + |
| 130 | +The bitmap encoding has the potential to omit a contract's trailing data from the *jumpdest* section |
| 131 | +provided there are no `0x5b` bytes in the data. |
| 132 | + |
| 133 | +We can extend this capability by trying to detect unreachable code |
| 134 | +(e.g. contract's metadata, data or initcodes and deploy codes for `CREATE` instructions). |
| 135 | +For this we require a heuristic that does not generate any false positives. |
| 136 | + |
| 137 | +One interesting example is a "data" contract starting with a terminating instruction |
| 138 | +(e.g. `STOP`, `INVALID` or any unassigned opcode). |
| 139 | + |
| 140 | +This method introduces new risks: |
| 141 | + |
| 142 | +1. Treating unassigned opcodes as terminating instructions prevents them |
| 143 | + from being assigned to a new instruction. |
| 144 | +2. The heuristic will be considered by compilers optimizing for code size. |
| 145 | + |
| 146 | +### Prove jump targets are valid |
| 147 | + |
| 148 | +#### Prove all "static jumps" |
| 149 | + |
| 150 | +By "static jump" we mean a jump instruction directly preceded by a `PUSH` instruction. |
| 151 | + |
| 152 | +In Solidity-generated code, all `JUMPI` instructions and 85% of `JUMP` instructions are "static" |
| 153 | +(these numbers must be verified on a bigger sample of contracts). |
| 154 | + |
| 155 | +We can easily validate all static jumps and mark a contract as "all static jumps valid" |
| 156 | +at deploy time. Then at runtime static jumps can be executed without accessing the *jumpdest* section. |
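A deploy-time check along these lines could look like the following Python sketch. It treats a jump as static only when a `PUSH1`..`PUSH32` immediately precedes it; whether `PUSH0` should also count is left open here, and the function name is hypothetical.

```python
JUMP, JUMPI = 0x56, 0x57
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def all_static_jumps_valid(code: bytes) -> bool:
    """Return True if every PUSHn + JUMP/JUMPI pair targets a valid JUMPDEST."""
    jumpdests = set()
    static_targets = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == JUMPDEST:
            jumpdests.add(pc - 1)
        elif PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1
            data = code[pc:pc + size]
            pc += size
            # The instruction right after the push data decides if it is static.
            if pc < len(code) and code[pc] in (JUMP, JUMPI):
                static_targets.append(int.from_bytes(data, "big"))
    # Targets are checked after the scan so that forward jumps are handled.
    return all(target in jumpdests for target in static_targets)
```

For example, `60 03 56 5b 00` (`PUSH1 3`, `JUMP`, `JUMPDEST`, `STOP`) passes, while `60 5b 60 01 56` fails because its target, position 1, is a `0x5b` byte inside push data.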
| 157 | + |
| 158 | +#### Prove all jumps |
| 159 | + |
| 160 | +If we can prove that all jump targets in the code are valid, |
| 161 | +then there is no need for the *jumpdest* section. |
| 162 | + |
| 163 | +The Erigon project has a |
| 164 | +[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488) |
| 165 | +which is able to prove the validity of all jumps for 95+% of contracts. |
| 166 | + |