Add SDK compatibility documentation and benchmarks-commit parameter #119
Summary
This PR addresses issue #118 by adding backward compatibility support for evaluating older SDK versions that don't include the critic module.
Changes
1. Documentation (README.md)
- Documents the SDK compatibility boundary at commit `79868ae5` (Nov 17, 2025)
- Describes the `benchmarks-commit` workflow parameter (recommended)
2. Workflow Enhancement (.github/workflows/build-swe-bench-images.yml)
- Adds a `benchmarks-commit` input parameter to the workflow
- Uses the `benchmarks-commit` parameter when provided
- When `benchmarks-commit` is not specified, the workflow behaves exactly as before
Problem Statement
The SDK introduced the `openhands.sdk.critic` module in commit `79868ae5` (Nov 17, 2025). The benchmarks repository imports `CriticBase` from this module, which means:
- The current benchmarks code works with SDK commits `79868ae5` and later
- It fails against older SDK commits (before `79868ae5`)
This prevented users from evaluating historical SDK performance or debugging regressions with older SDK commits.
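
The incompatibility described above can be probed before a run starts. The following is a minimal sketch, not code from this PR: it checks whether the installed SDK exposes the critic module, using only the standard library. The helper name `module_available` is hypothetical; the dotted path `openhands.sdk.critic` is taken from the description above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a module with this dotted name can be located.

    Note: find_spec imports parent packages in order to resolve a dotted name.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


# Assumption: the module path comes from the PR description, not from the SDK docs.
HAS_CRITIC = module_available("openhands.sdk.critic")

if __name__ == "__main__":
    print("critic module available:", HAS_CRITIC)
```

If `HAS_CRITIC` is false, the installed SDK most likely predates `79868ae5`, and a benchmarks commit from before the critic import would be needed.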
Solution
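The `benchmarks-commit` parameter allows users to:
- Select the SDK version to evaluate with the `sdk-commit` parameter
- Select a compatible benchmarks version with the `benchmarks-commit` parameter
Example Usage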
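
As a rough illustration of how the two parameters pair up, the sketch below builds a GitHub CLI command that dispatches the workflow. It assumes the workflow is triggered manually through `gh workflow run` and that the inputs are named `sdk-commit` and `benchmarks-commit` as in this description; `build_dispatch_command` is a hypothetical helper, not part of this PR.

```python
from typing import List, Optional

# Workflow file name taken from the PR description.
WORKFLOW_FILE = "build-swe-bench-images.yml"


def build_dispatch_command(
    sdk_commit: str, benchmarks_commit: Optional[str] = None
) -> List[str]:
    """Build a `gh workflow run` argument list for the image build workflow."""
    cmd = ["gh", "workflow", "run", WORKFLOW_FILE, "-f", f"sdk-commit={sdk_commit}"]
    if benchmarks_commit:
        # Only pass the input when set, so the workflow keeps its default behavior.
        cmd += ["-f", f"benchmarks-commit={benchmarks_commit}"]
    return cmd


if __name__ == "__main__":
    # "<benchmarks-sha>" is a placeholder, not a real commit.
    print(" ".join(build_dispatch_command("<sdk-sha>", "<benchmarks-sha>")))
```

Leaving `benchmarks_commit` unset mirrors the backward-compatible path, where the workflow behaves as it did before this change.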
To evaluate SDK commit `61b8b574a3de5a461cad32dc3d0a21a75f888e90` (which predates the critic module):
1. Run the `build-swe-bench-images` workflow
2. Set `sdk-commit` to `61b8b574a3de5a461cad32dc3d0a21a75f888e90`
3. Set `benchmarks-commit` to a commit before the critic import was added
Testing
Fixes
Closes #118
Checklist