

@simonrosenberg (Collaborator)

Summary

This PR addresses issue #118 by adding backward compatibility support for evaluating older SDK versions that don't include the critic module.

Changes

1. Documentation (README.md)

  • Added a new "SDK Compatibility and Version Management" section
  • Documented the SDK critic module breaking change introduced in commit 79868ae5 (Nov 17, 2025)
  • Provided two methods for evaluating older SDK versions:
    • Using the new benchmarks-commit workflow parameter (recommended)
    • Manually checking out compatible versions locally
  • Included guidance on finding compatible SDK/benchmarks version combinations

2. Workflow Enhancement (.github/workflows/build-swe-bench-images.yml)

  • Added optional benchmarks-commit input parameter to the workflow
  • Added a new step to determine which benchmarks commit to check out
  • Modified the checkout action to use the benchmarks-commit parameter when provided
  • Backward compatible: When benchmarks-commit is not specified, the workflow behaves exactly as before
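The ref-selection logic of the new step can be mirrored in a few lines of Python. This is a hypothetical sketch, not the workflow's actual code. The helper name resolve_benchmarks_ref is illustrative, and treating whitespace-only input as empty is an assumption. The only behavior it is meant to capture is the one stated in this PR: a non-empty benchmarks-commit pins the checkout, and an empty ref makes actions/checkout fall back to the triggering commit.

```python
from typing import Optional


def resolve_benchmarks_ref(benchmarks_commit: Optional[str]) -> str:
    """Return the value to pass as `ref` to actions/checkout (hypothetical helper).

    A non-empty benchmarks-commit input pins the benchmarks checkout to that
    commit. An empty or missing input yields "", which leaves actions/checkout
    on its default: the commit that triggered the workflow, as before this PR.
    """
    # Assumption: None and whitespace-only input are treated as "not provided".
    return (benchmarks_commit or "").strip()


# Illustrative values only.
print(repr(resolve_benchmarks_ref(None)))         # ''
print(repr(resolve_benchmarks_ref(" abc1234 ")))  # 'abc1234'
```

Returning an empty string rather than a hard-coded branch name is what keeps the change backward compatible: the default path is the same one the workflow took before the parameter existed.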

Problem Statement

The SDK introduced the openhands.sdk.critic module in commit 79868ae5 (Nov 17, 2025). The benchmarks repository imports CriticBase from this module, which means:

  • ✅ Current benchmarks code works with SDK 79868ae5 and later
  • ❌ Current benchmarks code fails with older SDK versions (before 79868ae5)

Because the build-swe-bench-images workflow always built with the latest benchmarks code from main, users could not evaluate historical SDK performance or debug regressions with older SDK commits.
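To tell which side of this breaking change an installed SDK is on, you can check whether openhands.sdk.critic can be located without importing the benchmarks code. The following is a minimal sketch built on the standard library's importlib.util.find_spec. The helper names module_available and sdk_has_critic_module are hypothetical. Note that find_spec imports the parent packages (openhands, openhands.sdk) but not the critic module itself.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located, without importing the module itself.

    For dotted names, find_spec imports the parent packages first and raises
    ModuleNotFoundError if one of them is missing, so treat that as "absent".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def sdk_has_critic_module() -> bool:
    """Hypothetical check: does the installed SDK ship the critic module?"""
    # The benchmarks code imports CriticBase from this module (added in SDK commit 79868ae5).
    return module_available("openhands.sdk.critic")
```

Run in the environment where the SDK is installed, sdk_has_critic_module() should return False for SDK commits that predate 79868ae5. It also returns False if the SDK is not installed at all, so confirm the installation first.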

Solution

Used together with the existing sdk-commit parameter, the new benchmarks-commit parameter allows users to:

  1. Pin an older SDK commit via sdk-commit
  2. Pin a compatible older benchmarks commit via benchmarks-commit

The workflow then checks out both repositories at these compatible versions before the build starts.

Example Usage

To evaluate SDK commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90 (which predates the critic module):

  1. Manually trigger the build-swe-bench-images workflow
  2. Set sdk-commit to 61b8b574a3de5a461cad32dc3d0a21a75f888e90
  3. Set benchmarks-commit to a commit before the critic import was added
  4. The workflow will check out both repositories at compatible versions
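Step 3 does not explain how to find a benchmarks commit that predates the critic import. One approach, sketched below under stated assumptions, is to ask git for the oldest benchmarks commit whose Python files changed the number of occurrences of openhands.sdk.critic (git log -S), and then take that commit's first parent. The dispatch command uses the real gh workflow run CLI, passing each workflow input with -f. The helper names are hypothetical, and the script assumes it runs from a local clone of the benchmarks repository. The parent commit it returns is only a candidate: confirm that it builds against the chosen SDK commit.

```python
import subprocess
from typing import List

WORKFLOW_FILE = "build-swe-bench-images.yml"  # workflow file named in this PR
CRITIC_MODULE = "openhands.sdk.critic"


def _git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_before_first_use(pattern: str, repo: str = ".") -> str:
    """Return the first parent of the oldest commit that added or removed `pattern` in *.py files.

    Assumption: the oldest such commit is the one that introduced the import,
    so its parent is the last benchmarks commit without it.
    """
    shas = _git(repo, "log", "-S", pattern, "--reverse", "--format=%H", "--", "*.py").split()
    if not shas:
        raise LookupError(f"no commit in {repo!r} touches {pattern!r}")
    return _git(repo, "rev-parse", f"{shas[0]}^")


def dispatch_command(sdk_commit: str, benchmarks_commit: str) -> List[str]:
    """Build the gh CLI call that manually triggers the workflow with both inputs."""
    return [
        "gh", "workflow", "run", WORKFLOW_FILE,
        "-f", f"sdk-commit={sdk_commit}",
        "-f", f"benchmarks-commit={benchmarks_commit}",
    ]
```

From inside the benchmarks clone, passing dispatch_command("61b8b574a3de5a461cad32dc3d0a21a75f888e90", commit_before_first_use(CRITIC_MODULE)) to subprocess.run(..., check=True) would trigger the run. Printing the list first is a cheap way to review the exact inputs before dispatching.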

Testing

  • ✅ Changes follow existing workflow patterns
  • ✅ Pre-commit hooks passed
  • ✅ Backward compatible: no behavior change when benchmarks-commit is not provided
  • ✅ Documentation includes examples and troubleshooting guidance

Fixes

Closes #118

Checklist

  • Added comprehensive documentation in README
  • Implemented non-breaking workflow changes
  • Followed repository coding standards
  • Tested workflow syntax
  • Provided usage examples in documentation


- Document SDK critic module breaking change (commit 79868ae5) in README
- Add optional benchmarks-commit parameter to build-swe-bench-images workflow
- Update checkout step to support evaluating older SDK versions with compatible benchmarks code
- Maintain backward compatibility: workflow behaves the same when parameter is not provided

Fixes #118

Co-authored-by: openhands <[email protected]>

- Make the documentation more general about benchmarks/SDK version dependencies
- Present SDK critic module as an example rather than the main focus
- Clarify that version incompatibilities can arise as both codebases evolve

Co-authored-by: openhands <[email protected]>

- Explain that empty ref causes actions/checkout to use the triggering commit
- This preserves the original workflow behavior for workflow_dispatch events

Co-authored-by: openhands <[email protected]>
@juanmichelini (Collaborator) left a comment


LGTM!

@simonrosenberg merged commit 9aabf7f into main on Dec 2, 2025
2 checks passed
@simonrosenberg deleted the openhands/fix-sdk-compatibility-issue-118 branch on Dec 2, 2025 at 15:05


Development

Linked issue closed by this pull request (#118):

bug: build-swe-bench-images workflow always uses main benchmarks which cannot build with old software-agent-sdk commits
