-
Notifications
You must be signed in to change notification settings - Fork 177
feat(tests): multi opcode bloatnet ext cases #2186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(tests): multi opcode bloatnet ext cases #2186
Conversation
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]> remove leftover single whitespace :|
Signed-off-by: Guillaume Ballet <[email protected]>
- Add CREATE2 deterministic address calculation to overcome 24KB bytecode limit - Fix While loop condition to properly iterate through contracts - Account for memory expansion costs in gas calculations - Add safety margins (50k gas reserve, 98% utilization) for stability - Tests now scale to any gas limit without bytecode constraints - Achieve 98% gas utilization with 10M and 20M gas limits
- Remove gas reserve and 98% utilization logic for contract calculations - Directly calculate the number of contracts based on available gas - Introduce precise expected gas usage calculations for better accuracy - Ensure tests scale effectively without unnecessary constraints
…mization - Update tests to generate unique bytecode for each contract, maximizing I/O reads during benchmarks. - Clarify comments regarding bytecode generation and its impact on gas costs. - Ensure CREATE2 addresses are calculated consistently using a base bytecode template. - Improve test descriptions to reflect the changes in contract deployment strategy.
…utility function - Remove the custom `calculate_create2_address` function in favor of the `compute_create2_address` utility. - Update tests to utilize the new utility for consistent CREATE2 address calculations. - Simplify code by eliminating unnecessary complexity in address calculation logic. - Ensure that the CREATE2 prefix is directly set to 0xFF in the memory operation for clarity.
…ive selection and bytecode generation - Introduced interactive contract type selection for deploying contracts in the bloatnet benchmark. - Added support for multiple contract types: max_size_24kb, sload_heavy, storage_heavy, and custom. - Refactored bytecode generation functions to improve clarity and maintainability. - Updated README to reflect changes in deployment process and contract types. - Ensured proper handling of factory deployment and transaction receipt checks.
This was commited unintentionally
…OPY pattern - Updated the README to reflect the optimized gas cost for the BALANCE + EXTCODECOPY pattern, reducing it from ~5,007 to ~2,710 gas per contract. - Modified the test_bloatnet_balance_extcodecopy function to read only 1 byte from the end of the bytecode, minimizing gas costs while maximizing contract targeting. - Adjusted calculations for the number of contracts needed based on the new cost per contract, ensuring accurate benchmarks.
Latest commit refactors the factory to rely on an A summary can be seen here:
The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59 |
@LouisTsai-Csie sorry for the latest changes which were definitely wrong. I've ammended the attack code. And I have modified the structure of the overall helper scripts to rely on one more contract that holds the initcode to reduce complexity. The rest should be fine. |
0c5e09e
to
66826b4
Compare
Remove all changes to pyproject.toml to align with upstream main branch. This ensures CI compatibility and prevents configuration conflicts.
66826b4
to
d7c79f0
Compare
…ode.py Fixed all documentation and comment lines exceeding 79 characters to comply with lint requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CPerezz this version looks good. I’ve left a few suggestions to improve readability and added one question about the verification issue. We need to be especially careful since gas verification is disabled (skip_gas_used_validation=True
). Other than that, I don’t see much issues and i approve in advance. Thanks for the effort!
Implement solution to address reviewer's concern about test validation by using EEST's expected_receipt feature to validate that benchmarks consume all gas. Changes: - Add TransactionReceipt import - Add expected_receipt to both test transactions validating gas_used equals gas_limit - Remove skip_gas_used_validation flag as validation is now explicit This ensures tests can distinguish between: - Early failure from invalid jump (~50K gas) indicating setup issues - Full gas exhaustion (all gas consumed) indicating successful benchmark run The invalid jump remains as a fail-fast mechanism for STATICCALL failures, while expected_receipt validates the benchmark actually executed.
Re-add skip_gas_used_validation=True to both blockchain_test calls as it was accidentally removed. This flag is still needed alongside the expected_receipt validation.
Apply reviewer suggestions to use more readable kwargs syntax for memory and stack operations throughout both test functions. Changes: - Use Op.MLOAD(offset) instead of Op.PUSH1(offset) + Op.MLOAD - Use Op.MSTORE(offset, value) for cleaner memory writes - Use Op.SHA3(offset, length) for hash operations - Use Op.POP(Op.BALANCE) and Op.POP(Op.EXTCODESIZE) for cleaner stack ops - Combine increment operations into single Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1)) This makes the bytecode generation more concise and easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comment for the failing CI & the gas comparison discussion, and would appreciate if you could run the test locally for verification!
Previously, execute mode was not validating that transactions consumed the expected amount of gas when expected_benchmark_gas_used was set. This could cause benchmark tests to incorrectly pass even when consuming significantly less gas than expected (e.g., due to missing factory contracts). This feature is needed by benchmark tests like the ones in ethereum#2186 in order to make sure that the benchmarks are indeed consuming all gas available or causing a failure otherwise when the flag is set. Changes: - Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost - Implement gas validation logic in TransactionPost.execute() using transaction receipts - Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost - Add eth_getTransactionReceipt RPC method to fetch gas used from receipts This ensures benchmark tests fail appropriately when gas consumption doesn't match expectations, preventing false positives in performance testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Appreciate the fix.
In my personal view, there’s no need to wait for PR #2219 before merging this. The interface hasn’t changed, so it would automatically take effect when the update is done.
Previously, execute mode was not validating that transactions consumed the expected amount of gas when expected_benchmark_gas_used was set. This could cause benchmark tests to incorrectly pass even when consuming significantly less gas than expected (e.g., due to missing factory contracts). This feature is needed by benchmark tests like the ones in ethereum#2186 in order to make sure that the benchmarks are indeed consuming all gas available or causing a failure otherwise when the flag is set. Changes: - Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost - Implement gas validation logic in TransactionPost.execute() using transaction receipts - Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost - Add eth_getTransactionReceipt RPC method to fetch gas used from receipts This ensures benchmark tests fail appropriately when gas consumption doesn't match expectations, preventing false positives in performance testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine from my side. Just a small comment but wont block a merge!
…2219) * fix(execute): add gas validation for benchmark tests in execute mode Previously, execute mode was not validating that transactions consumed the expected amount of gas when expected_benchmark_gas_used was set. This could cause benchmark tests to incorrectly pass even when consuming significantly less gas than expected (e.g., due to missing factory contracts). This feature is needed by benchmark tests like the ones in #2186 in order to make sure that the benchmarks are indeed consuming all gas available or causing a failure otherwise when the flag is set. Changes: - Add expected_benchmark_gas_used and skip_gas_used_validation fields to TransactionPost - Implement gas validation logic in TransactionPost.execute() using transaction receipts - Pass gas validation parameters from StateTest and BlockchainTest to TransactionPost - Add eth_getTransactionReceipt RPC method to fetch gas used from receipts This ensures benchmark tests fail appropriately when gas consumption doesn't match expectations, preventing false positives in performance testing. * refactor(execute): simplify gas validation implementation Addresses review comment to make execute mode gas validation cleaner: - Set expected_benchmark_gas_used to gas_benchmark_value as default in execute parametrizer - Remove gas_benchmark_value parameter from TransactionPost, StateTest, BlockchainTest, and BaseTest - Simplify gas validation logic in TransactionPost This ensures consistent gas validation behavior between fill and execute modes with a cleaner implementation that sets defaults at the parametrizer level.
🗒️ Description
This PR is an attempt to cleanup (#2090) and only leave the minimal set of additions to kickstart the inclusion of Bloatnet test cases (which can be further explored here) within EEST benchmarking.
In particular, it adds a first batch of multi-opcode benchmarks which cover (
EXTCODECOPY-BALANCE
) and (EXTCODESIZE-BALANCE
).A summary of the inner workings of the whole solution can be seen here:
The gist has been updated: https://gist.github.com/CPerezz/e0b1e26e0bbffdb0350d03793fa51b59
🔗 Related Issues or PRs
This PR is just a clean restart to the messy history of #2090 . This has been done because the original HEAD contained SSTORE & SLOAD cases which needed to be refactored anyways. Thus, it's just simpler to make another smaller PR to add them back and refactored at once reducing the reivewer burden.
A good history of changes and the reasoning for this PR to look like this comes from #2090 's discussions. So make sure to check there any doubts.
✅ Checklist
tox
checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
type(scope):
.mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_from
marker.Note for the reviewers: Once this PR is approved, there's the need to deploy all the contracts in BloatNet and then, update the PR to contain the address of the factory + the number of contracts deployed.