Improve Cross-Repository Evaluation Workflow Orchestration #1249

@simonrosenberg

Description

Improve Cross-Repository Workflow Orchestration: Replace Dispatch/Poll with Native workflow_call

🎯 Goal

Replace the current inefficient "dispatch + poll" pattern between repositories with GitHub Actions' native workflow_call mechanism to eliminate polling overhead, reduce race conditions, and create explicit dependency chains.

📋 Current Problems

1. Inefficient Polling (up to 80 minutes of overhead)

  • run-eval.yml polls the benchmarks build for up to 80 attempts × 60 s = 80 minutes
  • Each polling cycle makes 2+ calls to the GitHub API
  • A full 80-cycle poll therefore issues 160+ unnecessary API calls per evaluation run (see the sketch after this list)
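
For concreteness, here is a hedged sketch of what this pattern typically looks like. The step names, org/repo paths, and workflow file name are placeholders, not the actual run-eval.yml contents:

```yaml
# Sketch of the current dispatch + poll pattern (illustrative only).
- name: Dispatch benchmarks build
  run: gh workflow run build.yml --repo example-org/benchmarks --ref main
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Poll for completion (up to 80 × 60 s)
  run: |
    for i in $(seq 1 80); do
      # Two API calls per cycle: one to find the run, one to read its status.
      # Note the race: --limit 1 grabs the *latest* run, which may belong to
      # a concurrent evaluation.
      RUN_ID=$(gh run list --repo example-org/benchmarks --workflow build.yml \
        --limit 1 --json databaseId --jq '.[0].databaseId')
      STATUS=$(gh run view "$RUN_ID" --repo example-org/benchmarks \
        --json status --jq '.status')
      [ "$STATUS" = "completed" ] && exit 0
      sleep 60
    done
    exit 1  # timed out after 80 minutes
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```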

2. Race Conditions & Reliability Issues

  • Multiple concurrent runs can interfere with each other's polling logic
  • Polling relies on timestamp matching, which is unreliable
  • No guaranteed way to identify the correct workflow run when multiple exist

3. No Direct Data Flow

  • Built image tags cannot be passed directly from benchmarks to evaluation
  • The evaluation workflow has to infer which images to use
  • No explicit dependency management (contrast with the workflow_call outputs sketch below)
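
For contrast, workflow_call lets the called workflow declare typed inputs and outputs. A minimal sketch of a reusable workflow in the benchmarks repo that returns the built image tag; the file name, input, and output names are assumptions, not the repo's actual workflow:

```yaml
# .github/workflows/build-benchmarks.yml in the benchmarks repo (sketch).
name: Build benchmark images
on:
  workflow_call:
    inputs:
      sdk-ref:
        description: software-agent-sdk ref to build against
        required: true
        type: string
    outputs:
      image-tag:
        description: Tag of the built benchmark image
        value: ${{ jobs.build.outputs.image-tag }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.build.outputs.image-tag }}
    steps:
      - uses: actions/checkout@v4
      - id: build
        run: |
          TAG="benchmarks:${GITHUB_SHA::7}"
          # ... build and push the image here ...
          echo "image-tag=$TAG" >> "$GITHUB_OUTPUT"
```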

4. Complex Error Handling

  • API-based dispatch makes error detection and handling difficult
  • Timeouts are hard to debug
  • No clear failure propagation between workflows

🏗️ Proposed Solution

Convert from the asynchronous "dispatch + poll" pattern to the synchronous workflow_call pattern:

Current: software-agent-sdk →[dispatch]→ benchmarks →[poll]← software-agent-sdk →[dispatch]→ evaluation
Proposed: software-agent-sdk →[workflow_call]→ benchmarks →[outputs]→ software-agent-sdk →[workflow_call]→ evaluation
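
On the caller side, run-eval.yml in software-agent-sdk would invoke both reusable workflows as jobs. A sketch, assuming placeholder org/repo paths and the output name from the sketch above:

```yaml
# run-eval.yml caller side (fragment; org name, workflow paths, and refs
# are placeholders).
jobs:
  build-benchmarks:
    uses: example-org/benchmarks/.github/workflows/build-benchmarks.yml@main
    with:
      sdk-ref: ${{ github.sha }}
    secrets: inherit

  evaluate:
    needs: build-benchmarks
    uses: example-org/evaluation/.github/workflows/run-evaluation.yml@main
    with:
      image-tag: ${{ needs.build-benchmarks.outputs.image-tag }}
    secrets: inherit
```

This makes the dependency chain explicit via needs:, passes the built image tag directly instead of inferring it, and propagates failures automatically: if the benchmarks build fails, evaluation never runs and the caller run is marked failed. One caveat: cross-repository workflow_call requires the called repository to be accessible to the caller (public, or in the same organization with Actions access granted in the called repository's settings).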
