# EOFv0 for packaging legacy code in Verkle Trees

This design draft proposes using EOF for storing code in Verkle Trees,
as an alternative to the existing method of executing
31-byte code chunks accompanied by 1 byte of metadata.

## Goal

1. Provide the result of the jumpdest analysis of deployed code as an EOF section.
   During code execution the jumpdest analysis is then already available,
   and the answer to the question "is this jump target valid?" can be looked up
   in that section. This allows using 32-byte Verkle Tree code chunks
   (instead of 31 bytes of code + 1 byte of metadata).
2. Execution of EOF-packaged code is fully compatible with legacy code execution.

## Specification Draft

1. Put the code in a single *code* EOF section.
2. Use the EOF container format proposed by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540) with
   version 0 and the following modifications to "Changes to execution semantics":
   1. `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` operate on the *code*
      section only.
   2. `JUMP`/`JUMPI`/`PC` relate code positions to the *code* section only.
3. Perform the jumpdest analysis of the code at deploy time (during contract creation).
4. Store the result of the jumpdest analysis in a *jumpdest* EOF section as proposed
   by [EIP-3690](https://eips.ethereum.org/EIPS/eip-3690),
   but with the jumpdest encoding changed to a bitmap.
5. The packaging process is performed for every deployed code during the Verkle Tree migration
   and also for every contract creation afterwards
   (i.e. it becomes part of consensus permanently).
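
Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not a reference implementation. Several details are assumptions made for this example and are not fixed by this draft: the least-significant-bit-first bit order in the bitmap, the section kind ids `KIND_CODE` and `KIND_JUMPDEST`, the section order, and the 2-byte big-endian size fields. The header loosely follows the shape of the EIP-3540 header.

```python
# Minimal sketch of deploy-time packaging (steps 3 and 4).
# ASSUMPTIONS: bit order, section kind ids, section order and 2-byte sizes are illustrative only.
PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B

MAGIC = b"\xef\x00"    # EOF magic prefix, as in EIP-3540
VERSION = 0            # EOF version 0, as proposed by this draft
KIND_CODE = 0x01       # hypothetical section kind id for the *code* section
KIND_JUMPDEST = 0x03   # hypothetical section kind id for the *jumpdest* section
TERMINATOR = 0x00      # marks the end of the section headers


def jumpdest_bitmap(code: bytes) -> bytes:
    """Jumpdest analysis: bit i (LSB-first) is set iff code[i] is a JUMPDEST instruction."""
    bitmap = bytearray((len(code) + 7) // 8)
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            bitmap[pc // 8] |= 1 << (pc % 8)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip pushdata so a 0x5b byte inside it is not marked
        pc += 1
    return bytes(bitmap)


def package(code: bytes) -> bytes:
    """Wrap legacy code into a hypothetical EOFv0 container: header, code, jumpdest bitmap."""
    jumpdests = jumpdest_bitmap(code)
    header = bytearray(MAGIC)
    header.append(VERSION)
    for kind, body in ((KIND_CODE, code), (KIND_JUMPDEST, jumpdests)):
        header.append(kind)
        header += len(body).to_bytes(2, "big")  # 2-byte big-endian section size
    header.append(TERMINATOR)
    return bytes(header) + code + jumpdests


code = bytes([0x60, 0x5B, 0x5B, 0x00])  # PUSH1 0x5b; JUMPDEST; STOP
print(jumpdest_bitmap(code).hex())  # 04
print(package(code).hex())  # ef000001000403000100605b5b0004
```

The `0x5b` byte at offset 1 is pushdata of `PUSH1`, so only offset 2 is marked in the bitmap. Separating pushdata from real `JUMPDEST` instructions is the purpose of the jumpdest analysis.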

## Rationale

### Jumpdest encoding

EIP-3690 originally proposes delta encoding for the elements of the *jumpdest* section.
This should be efficient for an average contract, but it behaves badly in the worst case
(every instruction in the code is a `JUMPDEST`).
Delta encoding has another disadvantage for Verkle Tree code chunking:
to check the validity of a jump target, the section must be loaded and decoded
from its beginning up to that target (in the worst case, the whole section).

We propose using a bitmap to encode jumpdests: bit *i* is set if and only if
the byte at offset *i* of the *code* section is a `JUMPDEST` instruction.
Such an encoding needs no preprocessing and provides random access.
It has a constant size overhead of 12.5% (1 bit per code byte), but avoids both disadvantages mentioned above.
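
The two encodings have different access patterns, which can be sketched as two hypothetical lookup functions. Here the delta encoding is a plain list of integer deltas whose first entry is the offset of the first `JUMPDEST`. This is an assumption for illustration and not the EIP-3690 byte format. The bitmap bit order (least significant bit first) is also an assumption.

```python
from typing import List

# ASSUMPTION: the delta list models the idea of EIP-3690's encoding, not its exact byte format.


def is_jumpdest_bitmap(bitmap: bytes, target: int) -> bool:
    """Random access: read a single bit (LSB-first within each byte)."""
    if target < 0 or target // 8 >= len(bitmap):
        return False
    return (bitmap[target // 8] >> (target % 8)) & 1 == 1


def is_jumpdest_delta(deltas: List[int], target: int) -> bool:
    """Sequential access: sum deltas from the start until the target is reached or passed."""
    offset = 0
    for delta in deltas:  # the first delta is the offset of the first JUMPDEST
        offset += delta
        if offset >= target:
            return offset == target
    return False


# JUMPDESTs at offsets 2 and 9, in both encodings.
bitmap = bytes([0b0000_0100, 0b0000_0010])
deltas = [2, 7]
for t in (2, 3, 9):
    print(t, is_jumpdest_bitmap(bitmap, t), is_jumpdest_delta(deltas, t))
# 2 True True
# 3 False False
# 9 True True

# Bitmap size overhead: 1 bit per code byte.
print(((1000 + 7) // 8) / 1000)  # 0.125
```

`is_jumpdest_bitmap` reads one byte regardless of the target. `is_jumpdest_delta` may have to walk the whole section, which for Verkle Tree code chunking means loading every chunk that holds it.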

## Extensions

### Data section

We can try to identify a segment at the end of the code where the contract stores data,
and put that segment in a separate *data* section that directly follows the *code* section.
We require a heuristic that does not generate any false positives.
This arrangement ensures that the instructions inspecting the code
work without modifications on the contiguous *code*+*data* area.

Having a *data* section makes the *code* section, and therefore the *jumpdest* section, smaller.

Example heuristic:

1. Decode the instructions.
2. Traverse the instructions in reverse order.
3. If during the traversal a terminating instruction (`STOP`, `INVALID`, etc.)
   or the beginning of the code is encountered,
   then the *data* section starts just after the current position.
   End here.
4. If during the traversal a `JUMPDEST` instruction is encountered,
   then there is no *data* section.
   End here.
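
A minimal Python sketch of the example heuristic follows. The set of terminating opcodes is an assumption: the draft lists only `STOP`, `INVALID`, etc., and this sketch uses `STOP`, `RETURN`, `REVERT`, `INVALID` and `SELFDESTRUCT`. For step 3, reaching the code beginning is interpreted literally: the *data* section starts at offset 0.

```python
from typing import List, Optional, Tuple

PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B
# ASSUMPTION: the draft only says "STOP, INVALID, etc."; this set is illustrative.
TERMINATING = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}  # STOP, RETURN, REVERT, INVALID, SELFDESTRUCT


def decode(code: bytes) -> List[Tuple[int, int]]:
    """Step 1: decode (offset, opcode) pairs, skipping PUSH immediates."""
    instrs = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        instrs.append((pc, op))
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1
        pc += 1
    return instrs


def data_section_start(code: bytes) -> Optional[int]:
    """Return the offset where the *data* section starts, or None if there is no data section."""
    for offset, op in reversed(decode(code)):  # step 2: reverse traversal
        if op in TERMINATING:
            return offset + 1  # step 3: data starts just after the terminating instruction
        if op == JUMPDEST:
            return None  # step 4: trailing bytes may be reachable through a jump
    return 0  # step 3: the code beginning was reached


print(data_section_start(bytes([0x60, 0x01, 0x00, 0x12, 0x34])))  # 3
print(data_section_start(bytes([0x00, 0x5B, 0x01])))  # None
```

In the first call `PUSH1 0x01; STOP` stays in the *code* section and the two trailing bytes become *data*. The whole code is decoded before the reverse traversal, so a `0x00` or `0x5b` byte inside pushdata is never mistaken for `STOP` or `JUMPDEST`.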

### Prove all jump targets are valid

If we can prove that all jump targets in the code are valid,
then there is no need for the *jumpdest* section.

In Solidity-generated code all `JUMPI` instructions are "static"
(preceded by a `PUSH` instruction).
Only some `JUMP` instructions are not "static", because they are used to implement
returns from functions.

The Erigon project had an analysis tool that was able to prove the validity of all jumps
for 90+% of contracts.
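
The Erigon tool is not reproduced here. The Python sketch below is a much weaker, hypothetical check. It accepts code only if every `JUMP`/`JUMPI` is immediately preceded by a `PUSH` of a valid `JUMPDEST` offset. It never accepts an invalid jump: a `JUMP`/`JUMPI` is not a `JUMPDEST`, so execution can reach it only by falling through from the preceding instruction. The cost is that it rejects every contract with a dynamic `JUMP`, including the function returns described above.

```python
from typing import List, Tuple

PUSH1, PUSH32 = 0x60, 0x7F
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B


def decode(code: bytes) -> List[Tuple[int, int, int]]:
    """Decode (offset, opcode, push value) triples; the value is 0 for non-PUSH opcodes."""
    instrs = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        n = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        # Pushdata truncated by the end of code is zero-padded on the right, as in the EVM.
        value = int.from_bytes(code[pc + 1 : pc + 1 + n].ljust(n, b"\x00"), "big")
        instrs.append((pc, op, value))
        pc += 1 + n
    return instrs


def all_jumps_provably_valid(code: bytes) -> bool:
    """True only if every JUMP/JUMPI is directly preceded by a PUSH of a valid JUMPDEST offset."""
    instrs = decode(code)
    jumpdests = {pc for pc, op, _ in instrs if op == JUMPDEST}
    for i, (_, op, _) in enumerate(instrs):
        if op not in (JUMP, JUMPI):
            continue
        if i == 0:
            return False  # nothing precedes the jump, so its target is unknown
        _, prev_op, target = instrs[i - 1]
        if not (PUSH1 <= prev_op <= PUSH32) or target not in jumpdests:
            return False  # dynamic jump (e.g. a function return) or invalid static target
    return True


print(all_jumps_provably_valid(bytes([0x60, 0x03, 0x56, 0x5B, 0x00])))  # True: PUSH1 3; JUMP; JUMPDEST; STOP
print(all_jumps_provably_valid(bytes([0x5B, 0x56])))  # False: JUMPDEST; JUMP with a dynamic target
```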

### Super-dense metadata encoding (6-bit numbers)

Follow the original Verkle Tree idea of providing the metadata
"number of leading pushdata bytes in a chunk". However, instead of including
this metadata as a single byte in the chunk itself, store the value as a 6-bit
number in a *metadata* EOF section. This provides the following benefits:

1. The code executes in full 32-byte chunks.
2. The *metadata* overhead is smaller: 2.3% (6 bits per 32-byte chunk)
   instead of 3.2% (1 byte per 31 bytes of code).
3. The *metadata* lookup is only needed for jumps
   (not when execution falls through to the next chunk).
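
The metadata computation and the 6-bit packing can be sketched in Python. The big-endian field order and the zero padding of the last byte are assumptions made for this example. A chunk has at most 32 leading pushdata bytes, which happens when a `PUSH32` sits in the last byte of the previous chunk. Every value therefore lies in 0..32 and needs 6 bits. At 6 bits per 256 bits of code, this gives the 2.3% overhead quoted above.

```python
from typing import List

PUSH1, PUSH32 = 0x60, 0x7F
CHUNK_SIZE = 32


def leading_pushdata(code: bytes) -> List[int]:
    """Per 32-byte chunk, count the leading bytes that are pushdata of an earlier PUSH."""
    is_pushdata = [False] * len(code)
    pc = 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            for i in range(pc + 1, min(pc + 1 + n, len(code))):
                is_pushdata[i] = True
            pc += n
        pc += 1
    counts = []
    for start in range(0, len(code), CHUNK_SIZE):
        n = 0
        while start + n < len(code) and n < CHUNK_SIZE and is_pushdata[start + n]:
            n += 1
        counts.append(n)  # always 0..32, so it fits in 6 bits
    return counts


def pack_6bit(values: List[int]) -> bytes:
    """Pack values as consecutive big-endian 6-bit fields, zero-padded to a whole byte."""
    acc = 0
    for v in values:
        if not 0 <= v < 64:
            raise ValueError(f"{v} does not fit in 6 bits")
        acc = (acc << 6) | v
    nbits = 6 * len(values)
    pad = -nbits % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def read_6bit(packed: bytes, index: int) -> int:
    """Random access to field `index`; a 6-bit field spans at most two bytes."""
    bit = 6 * index
    window = int.from_bytes(packed[bit // 8 : bit // 8 + 2].ljust(2, b"\x00"), "big")
    return (window >> (10 - bit % 8)) & 0x3F


code = bytes(31) + bytes([0x7F]) + bytes([0xAA] * 32) + bytes([0x00])  # PUSH32 at offset 31
counts = leading_pushdata(code)
packed = pack_6bit(counts)
print(counts, packed.hex(), read_6bit(packed, 1))  # [0, 32, 0] 020000 32
```

A jump to target `t` needs only `read_6bit(packed, t // 32)`. Sequential execution into the next chunk does not read the section at all.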