
docs: add PRDBench experiment results (Chorus vs CC Baseline) #77

Open
ChenNima wants to merge 1 commit into develop from docs/prdbench-experiment-results


@ChenNima (Contributor) commented Apr 2, 2026

Summary

  • Add docs/benchmark/EXPERIMENT_RESULTS.md with empirical findings from PRDBench Task 47
  • Update docs/benchmark/README.md with link to experiment results
  • Three models tested (Opus, Sonnet, Haiku) × two setups (CC Baseline, Chorus AI-DLC)

Key findings

  • Opus + Chorus matches the Opus baseline (69.0%) but runs 3x slower
  • Sonnet + Chorus scores below the baseline (37-42% vs 50%), with a premature-exit pattern
  • Haiku + Chorus scores far below the baseline (14-36% vs 64%), with context degradation and loss of MCP tools
  • Test-driven acceptance criteria are essential for Chorus to match the baseline
  • Chorus has a model-capability threshold: below it, the harness hurts performance
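To make the gaps in the findings above concrete, here is a small Python sketch that converts the reported scores into percentage-point deltas (Chorus minus baseline). It assumes each reported range (e.g. 37-42%) is the minimum and maximum Chorus score for that model; the names `BASELINE`, `CHORUS`, and `delta_points` are illustrative and not part of the benchmark code.

```python
# Scores on PRDBench Task 47, copied from the key findings.
# Assumption: a range like "37-42%" is the (min, max) Chorus score for that model.
BASELINE = {"Opus": 69.0, "Sonnet": 50.0, "Haiku": 64.0}
CHORUS = {"Opus": (69.0, 69.0), "Sonnet": (37.0, 42.0), "Haiku": (14.0, 36.0)}


def delta_points(model: str) -> tuple[float, float]:
    """Return (best, worst) change in percentage points, Chorus minus baseline."""
    low, high = CHORUS[model]
    base = BASELINE[model]
    return high - base, low - base


for model in BASELINE:
    best, worst = delta_points(model)
    print(f"{model}: {best:+.1f} to {worst:+.1f} pp")
```

Run as-is, it reports no change for Opus, -8 to -13 points for Sonnet, and -28 to -50 points for Haiku. Expressing the gaps in points rather than relative percent keeps them directly comparable to the raw scores.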

Changed files

| File | Change |
|------|--------|
| docs/benchmark/EXPERIMENT_RESULTS.md | New: full experiment report |
| docs/benchmark/README.md | Add index link to experiment results |

Test plan

  • Markdown renders correctly
  • Links in README point to correct file

🤖 Generated with Claude Code

Empirical findings from running PRDBench Task 47 (Library Management System)
across three models (Opus, Sonnet, Haiku) with and without the Chorus AI-DLC harness.

Key results:
- Opus: Chorus matches baseline (69.0% = 69.0%), 3x slower
- Sonnet: Chorus hurts (-8 to -13 points), premature-exit pattern
- Haiku: Chorus significantly hurts (-28 to -50 points), context degradation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
github-actions bot commented Apr 2, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
|--------|----------|------------|-----------------|
| 🔵 | Lines | 96.43% (🎯 95%) | 1896 / 1966 |
| 🔵 | Statements | 95.43% (🎯 95%) | 2048 / 2146 |
| 🔵 | Functions | 95% (🎯 93%) | 399 / 420 |
| 🔵 | Branches | 87.55% (🎯 85%) | 1266 / 1446 |

File Coverage: No changed files found.
Generated in workflow #91 for commit 500849d by the Vitest Coverage Report Action
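A minimal sketch of how the 🎯 targets in this report can be read, assuming a metric passes when its covered/total ratio is at or above the target shown. The figures are copied from the table; `REPORT` and `meets_target` are hypothetical names, not part of the Vitest Coverage Report Action.

```python
# Figures copied from the coverage report: category -> (covered, total, target %).
REPORT = {
    "Lines": (1896, 1966, 95.0),
    "Statements": (2048, 2146, 95.0),
    "Functions": (399, 420, 93.0),
    "Branches": (1266, 1446, 85.0),
}


def meets_target(covered: int, total: int, target: float) -> tuple[float, bool]:
    """Return the coverage percentage and whether it reaches the target (assumed >=)."""
    pct = 100.0 * covered / total
    return pct, pct >= target


for category, (covered, total, target) in REPORT.items():
    pct, ok = meets_target(covered, total, target)
    status = "pass" if ok else "FAIL"
    print(f"{category:<10} {pct:6.2f}% (target {target:.0f}%) {status}")
```

All four metrics clear their targets. Statements has the smallest margin, about 0.43 points above 95%. The printed Lines figure rounds to 96.44% where the bot shows 96.43%, which suggests the action truncates rather than rounds; this does not change any pass/fail result.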
