Commit fd4fc38

EOFv0 for packaging legacy code in Verkle Trees (#58)
The design draft that proposes the use of EOF for storing code in Verkle Trees. An alternative to the existing method of executing 31-byte code chunks accompanied by 1 byte of metadata.
1 parent 40c4645 commit fd4fc38

1 file changed

+166
-0
lines changed

spec/eofv0_verkle.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
1+
# EOFv0 for packaging legacy code in Verkle Trees
2+
3+
A design draft that proposes the use of EOF
4+
for storing code in Verkle Trees.
5+
An alternative to the existing method of executing
6+
31-byte code chunks accompanied by 1 byte of metadata.
7+
8+
## Goal
9+
10+
Simplified legacy code execution in the Verkle Tree implementation.
11+
12+
Better "code-to-data" ratio.
13+
14+
Provide the result of the jumpdest analysis of deployed code as an EOF section.
15+
During code execution the jumpdest analysis is already available
16+
and the answer to the question "is this jump target valid?" can be looked up
17+
in the section. This allows using 32-byte Verkle Tree code chunks
18+
(instead of 31 bytes of code + 1 byte of metadata).
19+
20+
## Specification
21+
22+
### Container
23+
24+
1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540).
25+
2. Set the EOF version to 0, i.e. the packaged legacy code will be referred to as EOFv0.
26+
3. The EOFv0 consists of the header and two sections:
27+
- *jumpdest*
28+
- *code*
29+
4. The header must contain information about the sizes of these sections.
30+
For that the EIP-3540 header or a simplified one can be used.
31+
5. The legacy code is placed in the *code* section without modifications.
32+
6. The *jumpdest* section contains the set of all valid jump destinations matching the positions
33+
of all `JUMPDEST` instructions in the *code*.
34+
The exact encoding of this section is specified separately.
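
The container layout above can be sketched as follows. This is only an illustration: the draft leaves the header format open, so the `HEADER` layout used here (magic, version, then two big-endian 16-bit section sizes) is a hypothetical simplified header, and the names `wrap` and `unwrap` are made up for this example.

```python
import struct

MAGIC = b"\xEF\x00"  # EOF magic from EIP-3540
VERSION = 0          # EOFv0

# Hypothetical simplified header (an assumption, not part of the draft):
# magic (2 bytes), version (1 byte), jumpdest size (uint16), code size (uint16).
HEADER = struct.Struct(">2sBHH")


def wrap(jumpdest: bytes, code: bytes) -> bytes:
    """Build an EOFv0 container: header, *jumpdest* section, then *code* section."""
    return HEADER.pack(MAGIC, VERSION, len(jumpdest), len(code)) + jumpdest + code


def unwrap(container: bytes) -> tuple[bytes, bytes]:
    """Split a container back into its (jumpdest, code) sections."""
    magic, version, jumpdest_size, code_size = HEADER.unpack_from(container)
    if magic != MAGIC or version != VERSION:
        raise ValueError("not an EOFv0 container")
    body = container[HEADER.size:]
    if len(body) != jumpdest_size + code_size:
        raise ValueError("section sizes do not match the container length")
    return body[:jumpdest_size], body[jumpdest_size:]
```

Placing the *code* section last means the end of the code coincides with the end of the container, and the legacy code can be recovered byte-for-byte with `unwrap`.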
35+
36+
### Changes to execution semantics
37+
38+
1. Execution starts at the first byte of the *code* section, and `PC` is set to 0.
39+
2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the
40+
end of the container).
41+
3. `PC` returns the current position within the *code*.
42+
4. The instructions which *read* code must refer to the *code* section only.
43+
The modification keeps the behavior of these instructions unchanged.
44+
These instructions are invalid in EOFv1.
45+
The instructions are:
46+
- `CODECOPY` (copies a part of the *code* section),
47+
- `CODESIZE` (returns the size of the *code* section),
48+
- `EXTCODECOPY`,
49+
- `EXTCODESIZE`,
50+
- `EXTCODEHASH`.
51+
5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist
52+
in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`.
53+
54+
### Changes to contract creation semantics
55+
56+
1. Initcode execution is performed without changes. I.e. initcode remains an ephemeral code
57+
without EOF wrapping. However, because the EOF containers are not visible to any EVM program,
58+
implementations may decide to wrap initcodes with EOFv0 and execute them the same way as
59+
EOFv0 deployed codes.
60+
2. The initcode size limit and cost remain defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860).
61+
3. The initcode still returns a plain deploy code.
62+
The plain code size limit and cost are defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170).
63+
4. If the plain code is not empty, it must be wrapped with EOFv0 before being put in the state:
64+
- perform jumpdest analysis of the plain code,
65+
- encode the jumpdest analysis result as the *jumpdest* section,
66+
- put the plain code in the *code* section,
67+
- create EOFv0 container with the *jumpdest* and *code* sections.
68+
5. The code deployment cost is calculated from the total EOFv0 size.
69+
This is a breaking change so the impact must be analysed.
70+
6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code.
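
The "perform jumpdest analysis" step can be sketched as a single linear pass over the plain code. This is a minimal sketch of the usual legacy analysis; the function name `jumpdest_analysis` is chosen for this example only.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def jumpdest_analysis(code: bytes) -> list[int]:
    """Return the positions of all JUMPDEST opcodes that are not push data."""
    positions = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            positions.append(pc)
        elif PUSH1 <= op <= PUSH32:
            # Skip the immediate bytes: a 0x5b inside push data is not a JUMPDEST.
            pc += op - PUSH1 + 1
        pc += 1
    return positions
```

The resulting list of positions is what gets encoded into the *jumpdest* section.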
71+
72+
### Jumpdest section encoding
73+
74+
#### Bitmap
75+
76+
A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset.
77+
The trailing zero bytes must be trimmed.
78+
Therefore, the size of the bitmap is at most `ceil(len(code) / 8)` bytes, giving ~12.5% size overhead
79+
(compared with the plain code size).
80+
Such encoding doesn't require pre-processing and provides random access.
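
A minimal sketch of the bitmap encoding and its random-access lookup is given here. The draft does not fix the bit order within a byte, so least-significant-bit-first is an assumption of this example, as are the function names.

```python
def encode_bitmap(jumpdests: list[int], code_len: int) -> bytes:
    """Encode valid JUMPDEST positions as a bitset with trailing zero bytes trimmed."""
    bitmap = bytearray((code_len + 7) // 8)  # ceil(code_len / 8) bytes
    for pos in jumpdests:
        bitmap[pos // 8] |= 1 << (pos % 8)  # assumed bit order: LSB first
    return bytes(bitmap).rstrip(b"\x00")


def is_valid_jump(bitmap: bytes, target: int) -> bool:
    """Look up a single jump target without decoding the rest of the section."""
    byte_index = target // 8
    if byte_index >= len(bitmap):
        return False  # beyond the trimmed bitmap: no JUMPDEST there
    return bool((bitmap[byte_index] >> (target % 8)) & 1)
```

For example, `encode_bitmap([0, 3], 100)` yields the single byte `0x09`, because everything after the last `JUMPDEST` is trimmed. A lookup touches only one byte of the section, which maps to a single Verkle Tree chunk.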
81+
82+
Originally, EIP-3690 proposed delta encoding for the elements of the *jumpdest* section.
83+
This should be efficient for an average contract but behaves badly in the worst case
84+
(every instruction in the code is a `JUMPDEST`).
85+
Delta encoding also has another disadvantage for Verkle Tree code chunking:
86+
the whole (?) section must be loaded and preprocessed to check the validity of a single jump target.
87+
88+
### Metadata encoding (8-bit numbers)
89+
90+
Follow the original Verkle Tree idea to provide the single byte of metadata with the
91+
"number of leading pushdata bytes in a chunk".
92+
However, instead of including this in the chunk itself,
93+
place the byte in order in the *jumpdest* section.
94+
95+
This provides the following benefits over the original Verkle Tree design:
96+
97+
1. The code executes in full 32-byte chunks.
98+
2. The *metadata* size overhead is slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`).
99+
3. The *metadata* lookup is only needed for executing jumps
100+
(not needed when falling through to the next chunk).
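
The metadata variant can be sketched as below: one byte per 32-byte chunk holding the number of leading push data bytes, plus a jump check that reads only that byte and the target chunk. Function names are illustrative, and clamping truncated push data to the end of the code is an assumption of this sketch.

```python
CHUNK = 32
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def leading_pushdata(code: bytes) -> bytes:
    """One byte per chunk: count of bytes at the chunk start that are push data."""
    meta = bytearray((len(code) + CHUNK - 1) // CHUNK)
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            data_end = min(pc + op - PUSH1 + 1, len(code))
            # First chunk boundary at or after the start of the push data.
            boundary = (pc + CHUNK - 1) // CHUNK * CHUNK
            if boundary < data_end:
                meta[boundary // CHUNK] = data_end - boundary
            pc = data_end
    return bytes(meta)


def is_valid_jump(code: bytes, meta: bytes, target: int) -> bool:
    """Check a jump target using only its own chunk and that chunk's metadata byte."""
    if target >= len(code) or code[target] != JUMPDEST:
        return False
    # Start at the first real instruction of the target's chunk.
    pc = target // CHUNK * CHUNK + meta[target // CHUNK]
    while pc < target:
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1
    return pc == target  # overshooting means the target lies inside push data
```

Sequential execution never calls `is_valid_jump`; the metadata byte is consulted only when a `JUMP` or `JUMPI` lands in a chunk.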
101+
102+
### Super-dense metadata encoding (6-bit numbers)
103+
104+
The same as above except encode the values as 6-bit numbers
105+
(minimum number of bits needed for encoding `32`).
106+
Such encoding lowers the size overhead from 3.1% to 2.3%.
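
The 2.3% figure follows from storing 6 bits per 256-bit chunk (`6/256 ≈ 2.34%`). A possible packing is sketched here; the most-significant-bit-first layout and the names `pack6` and `get6` are assumptions of this example, since the draft does not specify the bit layout.

```python
def pack6(values: list[int]) -> bytes:
    """Pack per-chunk counts (0..32) as consecutive 6-bit fields, MSB first."""
    acc = 0
    nbits = 0
    out = bytearray()
    for v in values:
        if not 0 <= v <= 32:
            raise ValueError("leading pushdata count must be in 0..32")
        acc = (acc << 6) | v
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def get6(packed: bytes, index: int) -> int:
    """Read the 6-bit value for chunk `index` from at most two bytes."""
    bit = index * 6
    window = int.from_bytes(packed[bit // 8 : bit // 8 + 2].ljust(2, b"\x00"), "big")
    return (window >> (16 - 6 - bit % 8)) & 0x3F
```

A lookup still needs at most two adjacent bytes of the section, so random access is preserved at the cost of some bit shifting.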
107+
108+
## Backwards Compatibility
109+
110+
EOF-packaged code execution is fully compatible with legacy code execution.
111+
This is achieved by prepending the legacy code with the EOF header and the section containing
112+
jumpdest metadata. The contents of the code section are identical to the legacy code.
113+
114+
Moreover, the wrapping process is bidirectional: wrapping can be created from the legacy code
115+
and legacy code extracted from the wrapping without any information loss.
116+
Implementations may consider keeping the legacy code in the database without modifications
117+
and only construct the EOF wrapping when loading the code from the database.
118+
119+
It can also be noted that the information in the *jumpdest* section is redundant with the `JUMPDEST`
120+
instructions. However, we **cannot** remove these instructions from the code because
121+
this potentially breaks:
122+
123+
- *dynamic* jumps (where we will not be able to adjust their jump targets),
124+
- code introspection with `CODECOPY` and `EXTCODECOPY`.
125+
126+
## Extensions
127+
128+
### Detect unreachable code
129+
130+
The bitmap encoding has the potential to omit a contract's trailing data from the *jumpdest* section
131+
provided there are no `0x5b` bytes in the data.
132+
133+
We can extend this capability by trying to detect unreachable code
134+
(e.g. contract's metadata, data or initcodes and deploy codes for `CREATE` instructions).
135+
For this we require a heuristic that does not generate any false positives.
136+
137+
One interesting example is a "data" contract starting with a terminating instruction
138+
(e.g. `STOP`, `INVALID` or any unassigned opcode).
139+
140+
This method introduces new risks.
141+
142+
1. Treating unassigned opcodes as terminating instructions prevents them
143+
from being assigned to a new instruction.
144+
2. The heuristic will be considered by compilers optimizing for code size.
145+
146+
### Prove jump targets are valid
147+
148+
#### Prove all "static jumps"
149+
150+
By "static jump" we mean a jump instruction directly preceded by a `PUSH` instruction.
151+
152+
In Solidity-generated code all `JUMPI` instructions and 85% of `JUMP` instructions are "static"
153+
(these numbers must be verified on a bigger sample of contracts).
154+
155+
We can easily validate all static jumps and mark a contract as "all static jumps valid"
156+
at deploy time. Then at runtime static jumps can be executed without accessing the *jumpdest* section.
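
A deploy-time check for the "all static jumps valid" mark might look like the sketch below. It follows the definition above (a `JUMP` or `JUMPI` directly preceded by a `PUSH1`..`PUSH32`); the function name is made up, and treating `PUSH0` as non-static is an assumption of this example.

```python
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def all_static_jumps_valid(code: bytes) -> bool:
    """True if every jump directly preceded by a PUSH targets a valid JUMPDEST."""
    jumpdests = set()
    static_targets = []
    pushed = None  # immediate value of the directly preceding PUSH, if any
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op in (JUMP, JUMPI) and pushed is not None:
            static_targets.append(pushed)
        pushed = None
        if op == JUMPDEST:
            jumpdests.add(pc)
        elif PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            pushed = int.from_bytes(code[pc + 1 : pc + 1 + n], "big")
            pc += n  # skip the immediate bytes
        pc += 1
    # Targets are checked after the scan because jumps may point forward.
    return all(target in jumpdests for target in static_targets)
```

Dynamic jumps are ignored by this check and would still need the *jumpdest* section at runtime.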
157+
158+
#### Prove all jumps
159+
160+
If we can prove that all jump targets in the code are valid,
161+
then there is no need for the *jumpdest* section.
162+
163+
The Erigon project has a
164+
[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488)
165+
which is able to prove the validity of all jumps for 95+% of contracts.
166+
