bug: build-swe-bench-images workflow always uses main benchmarks which cannot build with old software-agent-sdk commits #118

@simonrosenberg

Description

Breaking changes in software-agent-sdk that the benchmarks depend on prevent the latest benchmarks from evaluating old software-agent-sdk commits.

The problem is illustrated here by the introduction of the SDK critic module:

  1. The SDK critic module was added recently: commit 79868ae5 (Port over Critic system from benchmark project #1171) added the openhands/sdk/critic module on Nov 17, 2025.
  2. The workflow used an old SDK commit: the failed workflow run tried to use SDK commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the critic module.
  3. The benchmarks code requires the critic module: the file benchmarks/utils/models.py imports:
    from openhands.sdk.critic import CriticBase
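
A quick way to confirm whether a given SDK checkout exposes the critic module is to probe for it without importing it. The sketch below is only illustrative: the helper name `has_module` is hypothetical and is not part of the benchmarks or SDK code.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located in the current environment."""
    try:
        # find_spec imports parent packages, so a missing parent raises
        # ModuleNotFoundError instead of returning None.
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # With an SDK commit that predates 79868ae5 this is expected to print False.
    print(has_module("openhands.sdk.critic"))
```

Running this inside the build environment for the failing SDK commit would show whether the import in benchmarks/utils/models.py can succeed there.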
Why This Happened
The benchmarks repo's workflow accepts an sdk-commit input parameter. When the workflow was triggered manually, someone specified commit 61b8b574a3de5a461cad32dc3d0a21a75f888e90, which predates the critic module.
The Solution
To fix this, update the SDK submodule reference to a commit that includes the critic module. The current benchmarks main branch points to SDK commit e485bba962171d5fefdfa757f1d7dd245da598cd, which does include the critic module.
Options:
  1. Manually trigger the workflow with a newer SDK commit (79868ae5 or later, for example e485bba, which is current).
  2. Update the benchmarks repo to always use a newer SDK commit that has the critic module.
  3. Check the evaluation workflow that triggered this run; it may need to pass a newer SDK commit.
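
Whichever option is chosen, the workflow could fail fast when the requested sdk-commit predates the critic module. Below is a minimal sketch, assuming a local clone of the SDK repository is available; the function name `includes_commit`, the `CRITIC_COMMIT` constant, and the `run` parameter are hypothetical and not part of the existing workflow. It relies on `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess

# Commit that added openhands/sdk/critic, as stated in this issue.
CRITIC_COMMIT = "79868ae5"


def includes_commit(repo_dir, required, candidate, run=subprocess.run):
    """Return True if `required` is an ancestor of (or equal to) `candidate`."""
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", required, candidate],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code means git itself failed (e.g. unknown commit).
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")
```

For example, `includes_commit("path/to/sdk", CRITIC_COMMIT, "61b8b574a3de5a461cad32dc3d0a21a75f888e90")` would be expected to return False according to the description above, and the workflow could then stop with a clear message instead of failing on the import.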
