bug: build-swe-bench-images workflow always uses main benchmarks, which cannot build with old software-agent-sdk commits #118

### Breaking changes in software-agent-sdk prevent the latest benchmarks from evaluating old software-agent-sdk commits


The problem, illustrated by the introduction of the SDK critic module:
  1. The SDK critic module was added recently: commit 79868ae5 (Port over Critic system from benchmark project #1171) added the openhands/sdk/critic module on Nov 17, 2025.
  2. The workflow used an older SDK commit: the failed workflow run tried to use SDK commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the critic module.
  3. The benchmarks code requires the critic module: the file benchmarks/utils/models.py imports the following, which cannot be resolved against the older SDK commit:
  from openhands.sdk.critic import CriticBase
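
One way to surface this mismatch earlier is a pre-flight check that runs before the benchmarks code is imported. The following is only a sketch: the helper names `module_available` and `require_module` are hypothetical and not part of the benchmarks repo or the SDK.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located.

    find_spec imports parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `openhands`) is itself missing.
        return False


def require_module(dotted_name: str, hint: str) -> None:
    """Exit with a readable message instead of a deep ImportError (hypothetical helper)."""
    if not module_available(dotted_name):
        raise SystemExit(f"Missing module {dotted_name}: {hint}")
```

A build step could then call `require_module("openhands.sdk.critic", "use an SDK commit at or after 79868ae5")` so the failure names the cause directly.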
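  Why This Happened
  The benchmarks repo's workflow accepts an sdk-commit input parameter. When the workflow was triggered manually, the commit specified was 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the critic module. Because the workflow always builds from the benchmarks main branch, the benchmarks code and the requested SDK commit were out of sync.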
  The Solution
  To fix this, the SDK submodule reference needs to point to a commit that includes the critic module. The current benchmarks main branch points to SDK commit e485bba962171d5fefdfa757f1d7dd245da598cd, which does include the critic module.
  Options:
  1. Manually trigger the workflow with a newer SDK commit (79868ae5 or later; e485bba is the current one).
  2. Update the benchmarks repo to always use an SDK commit that has the critic module.
  3. Check the evaluation workflow that triggered this run; it may need to pass a newer SDK commit.
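
For any of these options, it may help to verify that a requested SDK commit actually contains the critic commit before building. The sketch below uses `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not. The function name `includes_commit` and the `run` parameter are assumptions for illustration, and the check needs a clone with full history (not a shallow one).

```python
import subprocess

CRITIC_COMMIT = "79868ae5"  # commit that added openhands/sdk/critic, per this issue


def includes_commit(repo_dir, ancestor, candidate, run=subprocess.run) -> bool:
    """Return True if `ancestor` is reachable from `candidate` in the repo at `repo_dir`."""
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", ancestor, candidate],
        check=False,
    )
    if result.returncode == 0:
        return True  # candidate contains the ancestor commit
    if result.returncode == 1:
        return False  # candidate predates or diverges from the ancestor
    # Any other exit code means git itself failed (e.g. unknown commit).
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

For example, `includes_commit(sdk_checkout, CRITIC_COMMIT, requested_commit)` would return `False` for a commit that predates the critic module, where `sdk_checkout` and `requested_commit` are placeholders for the local SDK clone path and the workflow's sdk-commit input.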
