Skip to content

Conversation

CPerezz
Copy link
Contributor

@CPerezz CPerezz commented Sep 22, 2025

🗒️ Description

This PR is an attempt to cleanup (#2090) and only leave the minimal set of additions to kickstart the inclusion of Bloatnet test cases (which can be further explored here) within EEST benchmarking.

In particular, it adds a first batch of multi-opcode benchmarks which cover (EXTCODECOPY-BALANCE) and (EXTCODESIZE-BALANCE).

A summary of the inner workings of the whole solution can be seen here:

  [Initcode Contract]        [Factory Contract]              [24KB Contracts]
        (9.5KB)                    (116B)                     (N x 24KB each)
          │                          │                              │
          │  EXTCODECOPY             │   CREATE2(salt++)            │
          └──────────────►           ├──────────────────►     Contract_0
                                     ├──────────────────►     Contract_1
                                     ├──────────────────►     Contract_2
                                     └──────────────────►     Contract_N

  [Attack Contract] ──STATICCALL──► [Factory.getConfig()]
          │                              returns: (N, hash)
          └─► Loop(i=0 to N):
                1. Generate CREATE2 addr: keccak256(0xFF|factory|i|hash)[12:]
                2. BALANCE(addr)    → 2600 gas (cold access)
                3. EXTCODESIZE(addr) → 100 gas (warm access)

HOW IT WORKS:
  1. Factory uses EXTCODECOPY to load initcode, avoiding PC-relative jump issues
  2. Each CREATE2 deployment produces unique 24KB bytecode (via ADDRESS opcode)
  3. All contracts share same initcode hash for deterministic address calculation
  4. Attack rapidly accesses all contracts, stressing client's state handling

The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59

🔗 Related Issues or PRs

This PR is just a clean restart to the messy history of #2090 . This has been done because the original HEAD contained SSTORE & SLOAD cases which needed to be refactored anyways. Thus, it's just simpler to make another smaller PR to add them back and refactored at once reducing the reivewer burden.

A good history of changes and the reasoning for this PR to look like this comes from #2090 's discussions. So make sure to check there any doubts.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Note for the reviewers: Once this PR is approved, there's the need to deploy all the contracts in BloatNet and then, update the PR to contain the address of the factory + the number of contracts deployed.

gballet and others added 30 commits August 14, 2025 13:00
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>

remove leftover single whitespace :|
Signed-off-by: Guillaume Ballet <[email protected]>
- Add CREATE2 deterministic address calculation to overcome 24KB bytecode limit
- Fix While loop condition to properly iterate through contracts
- Account for memory expansion costs in gas calculations
- Add safety margins (50k gas reserve, 98% utilization) for stability
- Tests now scale to any gas limit without bytecode constraints
- Achieve 98% gas utilization with 10M and 20M gas limits
- Remove gas reserve and 98% utilization logic for contract calculations
- Directly calculate the number of contracts based on available gas
- Introduce precise expected gas usage calculations for better accuracy
- Ensure tests scale effectively without unnecessary constraints
…mization

- Update tests to generate unique bytecode for each contract, maximizing I/O reads during benchmarks.
- Clarify comments regarding bytecode generation and its impact on gas costs.
- Ensure CREATE2 addresses are calculated consistently using a base bytecode template.
- Improve test descriptions to reflect the changes in contract deployment strategy.
…utility function

- Remove the custom `calculate_create2_address` function in favor of the `compute_create2_address` utility.
- Update tests to utilize the new utility for consistent CREATE2 address calculations.
- Simplify code by eliminating unnecessary complexity in address calculation logic.
- Ensure that the CREATE2 prefix is directly set to 0xFF in the memory operation for clarity.
…ive selection and bytecode generation

- Introduced interactive contract type selection for deploying contracts in the bloatnet benchmark.
- Added support for multiple contract types: max_size_24kb, sload_heavy, storage_heavy, and custom.
- Refactored bytecode generation functions to improve clarity and maintainability.
- Updated README to reflect changes in deployment process and contract types.
- Ensured proper handling of factory deployment and transaction receipt checks.
This was commited unintentionally
…OPY pattern

- Updated the README to reflect the optimized gas cost for the BALANCE + EXTCODECOPY pattern, reducing it from ~5,007 to ~2,710 gas per contract.
- Modified the test_bloatnet_balance_extcodecopy function to read only 1 byte from the end of the bytecode, minimizing gas costs while maximizing contract targeting.
- Adjusted calculations for the number of contracts needed based on the new cost per contract, ensuring accurate benchmarks.
@CPerezz
Copy link
Contributor Author

CPerezz commented Sep 25, 2025

Latest commit refactors the factory to rely on an initcode contract and also, adds a getConfig method to the factory which the attack code needs to fetch prior to run the attack loop.

A summary can be seen here:

  [Initcode Contract]        [Factory Contract]              [24KB Contracts]
        (9.5KB)                    (116B)                     (N x 24KB each)
          │                          │                              │
          │  EXTCODECOPY             │   CREATE2(salt++)            │
          └──────────────►           ├──────────────────►     Contract_0
                                     ├──────────────────►     Contract_1
                                     ├──────────────────►     Contract_2
                                     └──────────────────►     Contract_N

  [Attack Contract] ──STATICCALL──► [Factory.getConfig()]
          │                              returns: (N, hash)
          └─► Loop(i=0 to N):
                1. Generate CREATE2 addr: keccak256(0xFF|factory|i|hash)[12:]
                2. BALANCE(addr)    → 2600 gas (cold access)
                3. EXTCODESIZE(addr) → 100 gas (warm access)

HOW IT WORKS:
  1. Factory uses EXTCODECOPY to load initcode, avoiding PC-relative jump issues
  2. Each CREATE2 deployment produces unique 24KB bytecode (via ADDRESS opcode)
  3. All contracts share same initcode hash for deterministic address calculation
  4. Attack rapidly accesses all contracts, stressing client's state handling

The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59

@CPerezz
Copy link
Contributor Author

CPerezz commented Sep 25, 2025

@LouisTsai-Csie sorry for the latest changes which were definitely wrong. I've ammended the attack code. And I have modified the structure of the overall helper scripts to rely on one more contract that holds the initcode to reduce complexity.

The rest should be fine.

@CPerezz CPerezz force-pushed the feat/multi-opcode-bloatnet-EXT-cases branch from 0c5e09e to 66826b4 Compare September 25, 2025 21:26
Remove all changes to pyproject.toml to align with upstream main branch.
This ensures CI compatibility and prevents configuration conflicts.
@CPerezz CPerezz force-pushed the feat/multi-opcode-bloatnet-EXT-cases branch from 66826b4 to d7c79f0 Compare September 25, 2025 21:33
…ode.py

Fixed all documentation and comment lines exceeding 79 characters to comply
with lint requirements.
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CPerezz this version looks good. I’ve left a few suggestions to improve readability and added one question about the verification issue. We need to be especially careful since gas verification is disabled (skip_gas_used_validation=True). Other than that, I don’t see much issues and i approve in advance. Thanks for the effort!

Implement solution to address reviewer's concern about test validation by using
EEST's expected_receipt feature to validate that benchmarks consume all gas.

Changes:
- Add TransactionReceipt import
- Add expected_receipt to both test transactions validating gas_used equals gas_limit
- Remove skip_gas_used_validation flag as validation is now explicit

This ensures tests can distinguish between:
- Early failure from invalid jump (~50K gas) indicating setup issues
- Full gas exhaustion (all gas consumed) indicating successful benchmark run

The invalid jump remains as a fail-fast mechanism for STATICCALL failures,
while expected_receipt validates the benchmark actually executed.
Re-add skip_gas_used_validation=True to both blockchain_test calls
as it was accidentally removed. This flag is still needed alongside
the expected_receipt validation.
Apply reviewer suggestions to use more readable kwargs syntax for memory
and stack operations throughout both test functions.

Changes:
- Use Op.MLOAD(offset) instead of Op.PUSH1(offset) + Op.MLOAD
- Use Op.MSTORE(offset, value) for cleaner memory writes
- Use Op.SHA3(offset, length) for hash operations
- Use Op.POP(Op.BALANCE) and Op.POP(Op.EXTCODESIZE) for cleaner stack ops
- Combine increment operations into single Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1))

This makes the bytecode generation more concise and easier to understand.
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment for the failing CI & the gas comparison discussion, and would appreciate if you could run the test locally for verification!

CPerezz added a commit to CPerezz/execution-spec-tests that referenced this pull request Sep 29, 2025
Previously, execute mode was not validating that transactions consumed
the expected amount of gas when expected_benchmark_gas_used was set.
This could cause benchmark tests to incorrectly pass even when consuming
significantly less gas than expected (e.g., due to missing factory contracts).
This feature is needed by benchmark tests like the ones in ethereum#2186 in
order to make sure that the benchmarks are indeed consuming all gas
available or causing a failure otherwise when the flag is set.

Changes:
- Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost
- Implement gas validation logic in TransactionPost.execute() using transaction receipts
- Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost
- Add eth_getTransactionReceipt RPC method to fetch gas used from receipts

This ensures benchmark tests fail appropriately when gas consumption doesn't match
expectations, preventing false positives in performance testing.
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Appreciate the fix.

In my personal view, there’s no need to wait for PR #2219 before merging this. The interface hasn’t changed, so it would automatically take effect when the update is done.

CPerezz added a commit to CPerezz/execution-spec-tests that referenced this pull request Sep 29, 2025
Previously, execute mode was not validating that transactions consumed
the expected amount of gas when expected_benchmark_gas_used was set.
This could cause benchmark tests to incorrectly pass even when consuming
significantly less gas than expected (e.g., due to missing factory contracts).
This feature is needed by benchmark tests like the ones in ethereum#2186 in
order to make sure that the benchmarks are indeed consuming all gas
available or causing a failure otherwise when the flag is set.

Changes:
- Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost
- Implement gas validation logic in TransactionPost.execute() using transaction receipts
- Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost
- Add eth_getTransactionReceipt RPC method to fetch gas used from receipts

This ensures benchmark tests fail appropriately when gas consumption doesn't match
expectations, preventing false positives in performance testing.
Copy link
Contributor

@spencer-tb spencer-tb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine from my side. Just a small comment but wont block a merge!

@spencer-tb spencer-tb added scope:tests Scope: Changes EL client test cases in `./tests` type:feat type: Feature feature:benchmark labels Oct 1, 2025
@spencer-tb spencer-tb changed the title Feat/multi opcode bloatnet ext cases feat(tests): multi opcode bloatnet ext cases Oct 1, 2025
@spencer-tb spencer-tb merged commit 675f1a7 into ethereum:main Oct 1, 2025
16 checks passed
spencer-tb pushed a commit that referenced this pull request Oct 1, 2025
…2219)

* fix(execute): add gas validation for benchmark tests in execute mode

Previously, execute mode was not validating that transactions consumed
the expected amount of gas when expected_benchmark_gas_used was set.
This could cause benchmark tests to incorrectly pass even when consuming
significantly less gas than expected (e.g., due to missing factory contracts).
This feature is needed by benchmark tests like the ones in #2186 in
order to make sure that the benchmarks are indeed consuming all gas
available or causing a failure otherwise when the flag is set.

Changes:
- Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost
- Implement gas validation logic in TransactionPost.execute() using transaction receipts
- Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost
- Add eth_getTransactionReceipt RPC method to fetch gas used from receipts

This ensures benchmark tests fail appropriately when gas consumption doesn't match
expectations, preventing false positives in performance testing.

* refactor(execute): simplify gas validation implementation

Addresses review comment to make execute mode gas validation cleaner:
- Set expected_benchmark_gas_used to gas_benchmark_value as default in execute parametrizer
- Remove gas_benchmark_value parameter from TransactionPost, StateTest, BlockchainTest, and BaseTest
- Simplify gas validation logic in TransactionPost

This ensures consistent gas validation behavior between fill and execute modes
with a cleaner implementation that sets defaults at the parametrizer level.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:benchmark scope:tests Scope: Changes EL client test cases in `./tests` type:feat type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants