Breaking changes in software-agent-sdk prevent the latest benchmarks from evaluating older software-agent-sdk commits
The Problem: the introduction of the SDK Critic Module
- The SDK Critic Module was added recently: commit 79868ae5 (Port over Critic system from benchmark project #1171) added the openhands/sdk/critic module on Nov 17, 2025.
- The workflow used an old SDK commit: the failed workflow run tried to use SDK commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the critic module.
- The benchmarks code requires the critic module: the file benchmarks/utils/models.py imports:
from openhands.sdk.critic import CriticBase
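With an SDK commit older than 79868ae5 that import fails at load time. A minimal sketch, assuming one wanted a pre-flight check that fails with a clearer message: `has_module` and `require_critic_module` are hypothetical helpers, not part of the SDK or the benchmarks repo. Only the standard library's `importlib.util.find_spec` is used.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` is importable (only parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages of a dotted name; if a parent
        # (e.g. `openhands` or `openhands.sdk`) is missing, it raises instead
        # of returning None.
        return False


def require_critic_module() -> None:
    """Hypothetical guard: raise a descriptive error if the SDK lacks the critic module."""
    if not has_module("openhands.sdk.critic"):
        raise ImportError(
            "openhands.sdk.critic not found: the installed software-agent-sdk "
            "predates commit 79868ae5 (#1171). Use an SDK commit at or after 79868ae5."
        )
```

Calling `require_critic_module()` before importing `benchmarks.utils.models` would turn a bare import failure into a message that names the commit to move past.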
Why This Happened
The benchmarks repo's workflow accepts an sdk-commit input parameter. When the workflow was triggered manually, someone specified commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the addition of the critic module.
The Solution
To fix this, update the SDK submodule reference to a commit that includes the critic module. The current benchmarks main branch points to SDK commit e485bba962171d5fefdfa757f1d7dd245da598cd, which does include the critic module.
Options:
- Manually trigger the workflow with a newer SDK commit (79868ae5 or later; e485bba, the current pin, works)
- Update the benchmarks repo to always use a newer SDK commit that has the critic module
- Check the evaluation workflow that triggered this run; it may need to pass a newer SDK commit
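The "79868ae5 or later" rule can also be enforced before any benchmark code runs. A sketch, assuming the workflow has a clone of software-agent-sdk with enough history for `git merge-base --is-ancestor` to see both commits (in a shallow clone git may report an error instead). `includes_critic` and its `run` parameter are hypothetical; `run` only exists so the check can be exercised without a real repository.

```python
import subprocess
from typing import Any, Callable

# Commit that added openhands/sdk/critic (#1171).
CRITIC_COMMIT = "79868ae5"


def includes_critic(
    sdk_commit: str,
    repo_dir: str = ".",
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """Return True if CRITIC_COMMIT is an ancestor of (or equal to) sdk_commit."""
    result = run(
        ["git", "merge-base", "--is-ancestor", CRITIC_COMMIT, sdk_commit],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # git exits 0 if it is an ancestor, 1 if not; any other status is an error
    # (for example, an unknown commit or a clone missing the history).
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")
```

If the timeline above is accurate, the check would be expected to reject 61b8b574a3de5a461cad32dc3d0a21a75f888e90 and accept e485bba962171d5fefdfa757f1d7dd245da598cd.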