feat(worker): add verify stage and harness closed-loop pipeline #48

Open
sawzhang wants to merge 1 commit into master from feat/harness-verify-loop
@sawzhang

Owner

Summary

  • Add new verify agent role (execute + read tools, max 3 turns) for running build/lint/type-check commands
  • Pass failure context on redirect: when a verify or test stage fails and redirects to the code stage, the error details are injected into the coding agent's prompt so it can self-correct
  • Add composite evaluator (evaluator.py) combining LLM confidence with external signals (test pass rate, build success, lint clean) via configurable weights
  • Add project-level verify_commands field with auto-detection fallback from tech_stack
  • Add built-in "harness_pipeline" template: Parse→Spec→Code→Verify→Test→Review→Signoff with automatic retry loops (max 3 rounds)
  • Add _template_needs_graph() to auto-enable graph execution when a template uses depends_on or on_failure
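
For readers unfamiliar with the composite scoring idea, here is a minimal sketch of how LLM confidence might be blended with external signals through configurable weights. It is not the code in evaluator.py: the `Signals` class, `composite_score` function, the weight keys, and the default weight values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """External signals; every field name here is hypothetical."""
    test_pass_rate: float  # fraction of tests passing, 0.0 to 1.0
    build_ok: bool
    lint_clean: bool


# Illustrative default weights (assumed, not taken from the PR).
DEFAULT_WEIGHTS = {
    "llm_confidence": 0.4,
    "test_pass_rate": 0.3,
    "build_ok": 0.2,
    "lint_clean": 0.1,
}


def _clamp(value: float) -> float:
    """Keep a score inside the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def composite_score(llm_confidence: float, signals: Signals, weights=None) -> float:
    """Weighted average of LLM confidence and external signals, in [0, 1]."""
    w = dict(DEFAULT_WEIGHTS if weights is None else weights)
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    values = {
        "llm_confidence": _clamp(llm_confidence),
        "test_pass_rate": _clamp(signals.test_pass_rate),
        "build_ok": 1.0 if signals.build_ok else 0.0,
        "lint_clean": 1.0 if signals.lint_clean else 0.0,
    }
    # Normalise by the weight total so custom weights need not sum to 1.
    return sum(w.get(name, 0.0) * value for name, value in values.items()) / total
```

Dividing by the weight total is one possible design choice: it keeps the score in [0, 1] even when a project overrides only some of the weights.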

Test plan

  • test_evaluator.py — composite scoring, weight loading, evaluate function
  • test_harness_loop.py — failure redirect context flow, template detection, verify command resolution
  • test_verify_stage.py — verify role prompt/tools config, stage instructions, guardrails
  • test_worker_graph.py — graph execution with redirect context
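
The two behaviours covered by test_harness_loop.py (verify command resolution and template detection) could look roughly like the sketch below. Everything here is assumed for illustration: the `DEFAULT_VERIFY_COMMANDS` mapping, the function names and signatures, and the representation of a template as a list of stage dicts are not taken from the PR.

```python
# Hypothetical mapping from a tech_stack entry to default verify commands.
DEFAULT_VERIFY_COMMANDS = {
    "python": ["ruff check .", "mypy ."],
    "typescript": ["npm run build", "npx tsc --noEmit"],
}


def resolve_verify_commands(verify_commands, tech_stack):
    """Prefer explicit project commands; otherwise derive them from tech_stack."""
    if verify_commands:
        return list(verify_commands)
    resolved = []
    for tech in tech_stack or []:
        for cmd in DEFAULT_VERIFY_COMMANDS.get(tech.lower(), []):
            if cmd not in resolved:  # avoid running the same command twice
                resolved.append(cmd)
    return resolved


def template_needs_graph(stages):
    """True when any stage declares depends_on or on_failure."""
    return any(s.get("depends_on") or s.get("on_failure") for s in stages)
```

A template whose verify stage sets `on_failure` to the code stage would return True here, while a purely linear template would return False.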

🤖 Generated with Claude Code

Add a new "verify" agent role for build/lint/type-check validation, with
automatic failure redirect context passing (verify/test → code) so the
coding agent can self-correct. Includes composite evaluator scoring from
LLM confidence + external signals, project-level verify_commands config,
tech_stack auto-detection, and a built-in "harness_pipeline" template
(Code→Verify→Test with up to 3 retry loops).
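
The Code→Verify→Test loop with failure context can be sketched as follows. This is only an illustration under stated assumptions: the stage callables, the `run_closed_loop` name, the failure message format, and the reading of "up to 3" as three total rounds are hypothetical and do not reflect the actual engine.py implementation.

```python
from typing import Callable, Optional, Tuple

MAX_ROUNDS = 3  # assumed to mean three total rounds

# A checker takes the code artifact and returns (passed, error_details).
Checker = Callable[[str], Tuple[bool, str]]


def run_closed_loop(
    code_stage: Callable[[Optional[str]], str],
    verify_stage: Checker,
    test_stage: Checker,
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[bool, int]:
    """Run Code, Verify, Test; feed failure details back into the code stage."""
    failure_context: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        # The coding agent sees the previous failure (None on the first round).
        artifact = code_stage(failure_context)
        for name, stage in (("verify", verify_stage), ("test", test_stage)):
            passed, details = stage(artifact)
            if not passed:
                failure_context = f"[{name} failed in round {round_no}]\n{details}"
                break  # redirect to the code stage
        else:
            return True, round_no  # both checks passed
    return False, max_rounds
```

The function returns whether the loop converged and how many rounds ran, so a caller can decide whether to continue to later stages or stop.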

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions


Coverage report


| File | Lines missing |
|---|---|
| platform/app/config.py | |
| platform/app/models/project.py | |
| platform/app/schemas/project.py | |
| platform/app/schemas/template.py | |
| platform/app/worker/agents.py | |
| platform/app/worker/engine.py | 1191-1193, 1453, 1456, 1459-1460, 1483-1484, 1490-1493, 1778-1780 |
| platform/app/worker/evaluator.py | 36-38, 92 |
| platform/app/worker/executor.py | |
| platform/app/worker/prompts.py | |
| Project Total | |

This report was generated by python-coverage-comment-action
