feat(tests): multi opcode bloatnet ext cases #2186

CPerezz · 2025-09-22T12:52:59Z

🗒️ Description

This PR is an attempt to clean up #2090 and leave only the minimal set of additions needed to kickstart the inclusion of Bloatnet test cases (which can be further explored here) within EEST benchmarking.

In particular, it adds a first batch of multi-opcode benchmarks which cover (EXTCODECOPY-BALANCE) and (EXTCODESIZE-BALANCE).

A summary of the inner workings of the whole solution can be seen here:

  [Initcode Contract]        [Factory Contract]              [24KB Contracts]
        (9.5KB)                    (116B)                     (N x 24KB each)
          │                          │                              │
          │  EXTCODECOPY             │   CREATE2(salt++)            │
          └──────────────►           ├──────────────────►     Contract_0
                                     ├──────────────────►     Contract_1
                                     ├──────────────────►     Contract_2
                                     └──────────────────►     Contract_N

  [Attack Contract] ──STATICCALL──► [Factory.getConfig()]
          │                              returns: (N, hash)
          └─► Loop(i=0 to N):
                1. Generate CREATE2 addr: keccak256(0xFF|factory|i|hash)[12:]
                2. BALANCE(addr)    → 2600 gas (cold access)
                3. EXTCODESIZE(addr) → 100 gas (warm access)

HOW IT WORKS:
  1. Factory uses EXTCODECOPY to load initcode, avoiding PC-relative jump issues
  2. Each CREATE2 deployment produces unique 24KB bytecode (via ADDRESS opcode)
  3. All contracts share same initcode hash for deterministic address calculation
  4. Attack rapidly accesses all contracts, stressing client's state handling
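The address derivation in step 1 of the attack loop can be sketched in Python as follows. This is a minimal illustration and not the code from the PR: the names `create2_preimage` and `create2_address` are hypothetical, and the Keccak-256 function is passed in as a parameter because Python's standard library does not ship one (`hashlib.sha3_256` uses different padding and yields different digests).

```python
from typing import Callable

CREATE2_PREFIX = b"\xff"


def create2_preimage(factory: bytes, salt: int, init_code_hash: bytes) -> bytes:
    """Build the 85-byte CREATE2 preimage: 0xFF | factory | salt | hash."""
    if len(factory) != 20:
        raise ValueError("factory address must be 20 bytes")
    if len(init_code_hash) != 32:
        raise ValueError("init code hash must be 32 bytes")
    # The salt is the loop counter i, left-padded to a 32-byte big-endian word.
    return CREATE2_PREFIX + factory + salt.to_bytes(32, "big") + init_code_hash


def create2_address(
    factory: bytes,
    salt: int,
    init_code_hash: bytes,
    keccak256: Callable[[bytes], bytes],
) -> bytes:
    """Return the last 20 bytes of keccak256(preimage).

    `keccak256` must be supplied by the caller (assumption: any function
    mapping bytes to a 32-byte Keccak-256 digest).
    """
    digest = keccak256(create2_preimage(factory, salt, init_code_hash))
    return digest[12:]
```

Because every deployed contract shares the same initcode hash, only the salt changes between iterations, which is why the attack contract can recompute each address from the `(N, hash)` pair returned by `getConfig()`.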

The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59

🔗 Related Issues or PRs

This PR is a clean restart of the messy history of #2090. The original HEAD contained SSTORE and SLOAD cases that needed to be refactored anyway, so it is simpler to add them back, already refactored, in another smaller PR, which reduces the reviewer burden.

A good history of the changes, and the reasoning behind the shape of this PR, can be found in the discussions on #2090, so check there if you have any doubts.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Note for the reviewers: once this PR is approved, all the contracts need to be deployed on BloatNet, and the PR then needs to be updated with the address of the factory and the number of contracts deployed.


- Add CREATE2 deterministic address calculation to overcome 24KB bytecode limit
- Fix While loop condition to properly iterate through contracts
- Account for memory expansion costs in gas calculations
- Add safety margins (50k gas reserve, 98% utilization) for stability
- Tests now scale to any gas limit without bytecode constraints
- Achieve 98% gas utilization with 10M and 20M gas limits

- Remove gas reserve and 98% utilization logic for contract calculations
- Directly calculate the number of contracts based on available gas
- Introduce precise expected gas usage calculations for better accuracy
- Ensure tests scale effectively without unnecessary constraints

…mization
- Update tests to generate unique bytecode for each contract, maximizing I/O reads during benchmarks.
- Clarify comments regarding bytecode generation and its impact on gas costs.
- Ensure CREATE2 addresses are calculated consistently using a base bytecode template.
- Improve test descriptions to reflect the changes in contract deployment strategy.

…utility function
- Remove the custom `calculate_create2_address` function in favor of the `compute_create2_address` utility.
- Update tests to utilize the new utility for consistent CREATE2 address calculations.
- Simplify code by eliminating unnecessary complexity in address calculation logic.
- Ensure that the CREATE2 prefix is directly set to 0xFF in the memory operation for clarity.

…ive selection and bytecode generation
- Introduced interactive contract type selection for deploying contracts in the bloatnet benchmark.
- Added support for multiple contract types: max_size_24kb, sload_heavy, storage_heavy, and custom.
- Refactored bytecode generation functions to improve clarity and maintainability.
- Updated README to reflect changes in deployment process and contract types.
- Ensured proper handling of factory deployment and transaction receipt checks.

This was committed unintentionally

…OPY pattern
- Updated the README to reflect the optimized gas cost for the BALANCE + EXTCODECOPY pattern, reducing it from ~5,007 to ~2,710 gas per contract.
- Modified the test_bloatnet_balance_extcodecopy function to read only 1 byte from the end of the bytecode, minimizing gas costs while maximizing contract targeting.
- Adjusted calculations for the number of contracts needed based on the new cost per contract, ensuring accurate benchmarks.
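To illustrate how the per-contract cost drives the number of contracts a benchmark needs, here is a rough sketch. It is not the formula used in the test: the helper name `contracts_for_budget` and the `setup_gas` parameter are hypothetical, and the sketch deliberately ignores memory expansion and any one-off setup such as the `getConfig()` call unless it is passed in explicitly.

```python
INTRINSIC_TX_GAS = 21_000  # base cost of a plain transaction with no calldata


def contracts_for_budget(
    gas_limit: int, cost_per_contract: int, setup_gas: int = 0
) -> int:
    """Estimate how many contracts one transaction can touch.

    Assumption: every loop iteration costs roughly `cost_per_contract` gas,
    and `setup_gas` covers any one-off work done before the loop.
    """
    available = gas_limit - INTRINSIC_TX_GAS - setup_gas
    if available <= 0:
        return 0
    return available // cost_per_contract


if __name__ == "__main__":
    for gas_limit in (10_000_000, 20_000_000):
        before = contracts_for_budget(gas_limit, 5_007)
        after = contracts_for_budget(gas_limit, 2_710)
        print(f"{gas_limit:>11,} gas: {before} contracts at ~5,007, {after} at ~2,710")
```

Lowering the per-contract cost means the same gas budget touches more distinct accounts, which is the point of reading only 1 byte with EXTCODECOPY.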

… calculations

CPerezz · 2025-09-25T20:51:48Z

The latest commit refactors the factory to rely on an initcode contract and also adds a getConfig method to the factory, which the attack code needs to call before running the attack loop.

A summary can be seen here:

  [Initcode Contract]        [Factory Contract]              [24KB Contracts]
        (9.5KB)                    (116B)                     (N x 24KB each)
          │                          │                              │
          │  EXTCODECOPY             │   CREATE2(salt++)            │
          └──────────────►           ├──────────────────►     Contract_0
                                     ├──────────────────►     Contract_1
                                     ├──────────────────►     Contract_2
                                     └──────────────────►     Contract_N

  [Attack Contract] ──STATICCALL──► [Factory.getConfig()]
          │                              returns: (N, hash)
          └─► Loop(i=0 to N):
                1. Generate CREATE2 addr: keccak256(0xFF|factory|i|hash)[12:]
                2. BALANCE(addr)    → 2600 gas (cold access)
                3. EXTCODESIZE(addr) → 100 gas (warm access)

HOW IT WORKS:
  1. Factory uses EXTCODECOPY to load initcode, avoiding PC-relative jump issues
  2. Each CREATE2 deployment produces unique 24KB bytecode (via ADDRESS opcode)
  3. All contracts share same initcode hash for deterministic address calculation
  4. Attack rapidly accesses all contracts, stressing client's state handling

The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59

CPerezz · 2025-09-25T20:58:01Z

@LouisTsai-Csie sorry for the latest changes, which were definitely wrong. I've amended the attack code and restructured the helper scripts to rely on one more contract that holds the initcode, which reduces complexity.

The rest should be fine.

Remove all changes to pyproject.toml to align with upstream main branch. This ensures CI compatibility and prevents configuration conflicts.

…ode.py Fixed all documentation and comment lines exceeding 79 characters to comply with lint requirements.

LouisTsai-Csie

@CPerezz this version looks good. I’ve left a few suggestions to improve readability and added one question about the verification issue. We need to be especially careful since gas verification is disabled (skip_gas_used_validation=True). Other than that, I don’t see many issues, and I approve in advance. Thanks for the effort!

tests/benchmark/bloatnet/test_multi_opcode.py

Implement solution to address reviewer's concern about test validation by using EEST's expected_receipt feature to validate that benchmarks consume all gas.

Changes:
- Add TransactionReceipt import
- Add expected_receipt to both test transactions validating gas_used equals gas_limit
- Remove skip_gas_used_validation flag as validation is now explicit

This ensures tests can distinguish between:
- Early failure from invalid jump (~50K gas) indicating setup issues
- Full gas exhaustion (all gas consumed) indicating successful benchmark run

The invalid jump remains as a fail-fast mechanism for STATICCALL failures, while expected_receipt validates the benchmark actually executed.

Re-add skip_gas_used_validation=True to both blockchain_test calls as it was accidentally removed. This flag is still needed alongside the expected_receipt validation.

Apply reviewer suggestions to use more readable kwargs syntax for memory and stack operations throughout both test functions.

Changes:
- Use Op.MLOAD(offset) instead of Op.PUSH1(offset) + Op.MLOAD
- Use Op.MSTORE(offset, value) for cleaner memory writes
- Use Op.SHA3(offset, length) for hash operations
- Use Op.POP(Op.BALANCE) and Op.POP(Op.EXTCODESIZE) for cleaner stack ops
- Combine increment operations into single Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1))

This makes the bytecode generation more concise and easier to understand.

LouisTsai-Csie

Some comments on the failing CI and the gas comparison discussion. I would appreciate it if you could run the test locally for verification!

tests/benchmark/bloatnet/test_multi_opcode.py

…erly

… and fix ADD syntax

Previously, execute mode was not validating that transactions consumed the expected amount of gas when expected_benchmark_gas_used was set. This could cause benchmark tests to incorrectly pass even when consuming significantly less gas than expected (e.g., due to missing factory contracts).

This feature is needed by benchmark tests like the ones in ethereum#2186 in order to make sure that the benchmarks are indeed consuming all gas available or causing a failure otherwise when the flag is set.

Changes:
- Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost
- Implement gas validation logic in TransactionPost.execute() using transaction receipts
- Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost
- Add eth_getTransactionReceipt RPC method to fetch gas used from receipts

This ensures benchmark tests fail appropriately when gas consumption doesn't match expectations, preventing false positives in performance testing.
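The kind of check described above can be sketched as follows. This is an illustrative stand-in, not the actual `TransactionPost.execute()` implementation: the function name `check_benchmark_gas` is hypothetical, and the receipt is modeled as a plain dict shaped like an `eth_getTransactionReceipt` JSON-RPC result, where `gasUsed` is a hex-encoded quantity.

```python
from typing import Optional


def check_benchmark_gas(
    receipt: dict,
    expected_gas_used: Optional[int],
    skip_validation: bool = False,
) -> None:
    """Raise if the gas used in a receipt differs from the expected value.

    Assumption: `receipt` mirrors the JSON-RPC receipt object, so the
    `gasUsed` field is a hex string such as "0x5208".
    """
    if skip_validation or expected_gas_used is None:
        return  # nothing to validate
    gas_used = int(receipt["gasUsed"], 16)
    if gas_used != expected_gas_used:
        raise AssertionError(
            f"gas used mismatch: expected {expected_gas_used}, got {gas_used}"
        )


if __name__ == "__main__":
    # A run that burned only ~50K gas against a 10M target is rejected.
    try:
        check_benchmark_gas({"gasUsed": hex(50_000)}, 10_000_000)
    except AssertionError as err:
        print(err)
```

With a check like this, a benchmark whose factory is missing fails loudly instead of passing with a fraction of the intended gas.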

LouisTsai-Csie

LGTM! Appreciate the fix.

In my personal view, there’s no need to wait for PR #2219 before merging this. The interface hasn’t changed, so it would automatically take effect when the update is done.


spencer-tb

Looks fine from my side. Just a small comment, but it won't block a merge!

docs/CHANGELOG.md

…2219)

* fix(execute): add gas validation for benchmark tests in execute mode

Previously, execute mode was not validating that transactions consumed the expected amount of gas when expected_benchmark_gas_used was set. This could cause benchmark tests to incorrectly pass even when consuming significantly less gas than expected (e.g., due to missing factory contracts).

This feature is needed by benchmark tests like the ones in #2186 in order to make sure that the benchmarks are indeed consuming all gas available or causing a failure otherwise when the flag is set.

Changes:
- Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost
- Implement gas validation logic in TransactionPost.execute() using transaction receipts
- Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost
- Add eth_getTransactionReceipt RPC method to fetch gas used from receipts

This ensures benchmark tests fail appropriately when gas consumption doesn't match expectations, preventing false positives in performance testing.

* refactor(execute): simplify gas validation implementation

Addresses review comment to make execute mode gas validation cleaner:
- Set expected_benchmark_gas_used to gas_benchmark_value as default in execute parametrizer
- Remove gas_benchmark_value parameter from TransactionPost, StateTest, BlockchainTest, and BaseTest
- Simplify gas validation logic in TransactionPost

This ensures consistent gas validation behavior between fill and execute modes with a cleaner implementation that sets defaults at the parametrizer level.

gballet and others added 30 commits August 14, 2025 13:00

Add BloatNet tests

a1f2153

Signed-off-by: Guillaume Ballet <[email protected]>

try building the contract

02d65b4

Signed-off-by: Guillaume Ballet <[email protected]>

fix: SSTORE 0 -> 1 match all values in the state

e721cc6

Signed-off-by: Guillaume Ballet <[email protected]>

add the tx for 0 -> 1 and 1 -> 2

d1cad25

Signed-off-by: Guillaume Ballet <[email protected]>

fix: linter issues

16f6d30

Signed-off-by: Guillaume Ballet <[email protected]>

remove more whitespaces

374e08a

Signed-off-by: Guillaume Ballet <[email protected]> remove leftover single whitespace :|

fix formatting

333c876

move to benchmarks

79a95b8

Signed-off-by: Guillaume Ballet <[email protected]>

fix linter value

8131e98

use the gas limit from the environment

5f805fd

parameterize the written value in SSTORE

090a400

fix linter issues

cd02a02

update CHANGELOG.md

1f3c381

fix format

f6def7e

simplify syntax

7e20a50

fix: start with an empty contract storage

c24ad35

more fixes, but the result is still incorrect

fc27e53

fix: finally fix the tests

7d87262

linter fix

8556014

add SLOAD tests

326915e

CREATE2 factory approach working

e4583b6

Version with EIP-7997 model working

06f9a63

delete: remove obsolete test_create2.py script

2875cf4

This was committed unintentionally

refactor(benchmark): support non-fixed max_codesize

774c56c

refactor(benchmark): enhance BloatNet test documentation and gas cost…

4dc4876

… calculations

CPerezz force-pushed the feat/multi-opcode-bloatnet-EXT-cases branch from 0c5e09e to 66826b4 Compare September 25, 2025 21:26

revert: restore pyproject.toml to match main branch

d7c79f0

Remove all changes to pyproject.toml to align with upstream main branch. This ensures CI compatibility and prevents configuration conflicts.

CPerezz force-pushed the feat/multi-opcode-bloatnet-EXT-cases branch from 66826b4 to d7c79f0 Compare September 25, 2025 21:33

fix(benchmark): resolve W505 doc line length issues in test_multi_opc…

d3671fa

…ode.py Fixed all documentation and comment lines exceeding 79 characters to comply with lint requirements.

LouisTsai-Csie approved these changes Sep 26, 2025


CPerezz added 5 commits September 26, 2025 10:41

refactor(benchmark): simplify STATICCALL usage in BloatNet tests.

8590357

fix(benchmark): restore skip_gas_used_validation flag

ad6b424

Re-add skip_gas_used_validation=True to both blockchain_test calls as it was accidentally removed. This flag is still needed alongside the expected_receipt validation.

fix(benchmark): shorten comment lines to meet doc length limit

bf665c5

LouisTsai-Csie requested changes Sep 26, 2025


CPerezz added 2 commits September 27, 2025 15:11

fix(benchmark): correct MSTORE operation to store init_code_hash prop…

6841f09

…erly

fix(benchmark): address review comments - remove redundant validation…

d1b868d

… and fix ADD syntax

CPerezz mentioned this pull request Sep 29, 2025

chore(execute): add gas validation for benchmark tests in execute mode #2219

Merged

8 tasks

LouisTsai-Csie approved these changes Sep 29, 2025


danceratopz requested a review from spencer-tb September 30, 2025 14:11

This was referenced Oct 1, 2025

feat(benchmark): Add reversed bloatnet multi-opcode benchmarks with BALANCE-EXTCODE_ variants #2242

Merged

feat(bloatnet): Add first multi-opcode benchmarks for Bloatnet #2090

Closed

spencer-tb approved these changes Oct 1, 2025



spencer-tb added the scope:tests, type:feat, and feature:benchmark labels Oct 1, 2025

spencer-tb changed the title ~~Feat/multi opcode bloatnet ext cases~~ feat(tests): multi opcode bloatnet ext cases Oct 1, 2025

spencer-tb merged commit 675f1a7 into ethereum:main Oct 1, 2025
16 checks passed
