tests(issue-fix-workflow): eval suites for steps 3, 4, and 5 #239

Merged
potiuk merged 3 commits into apache:main from justinmclean:issue-fix-workflow
May 20, 2026

@justinmclean
Member

8 new eval cases across 3 suites filling the gaps in issue-fix-workflow coverage
(steps 3, 4, and 5 had no fixtures).

Changes

tools/skill-evals/evals/issue-fix-workflow/ — three new suites:

  • step-3-failing-test (3 cases) — assesses a proposed regression test:

    • case-1-test-fails-as-expected — issue key present, adapts from reproducer, run confirms FAILED → accept
    • case-2-missing-issue-key — test omits the issue key reference → reject
    • case-3-test-passes-on-main — test passes before any fix (silent-broken-test trap) → surface-gap
  • step-4-production-change (3 cases) — assesses a proposed production fix:

    • case-1-minimal-fix-proceeds — root cause fixed, diff clean, targeted test green → proceed
    • case-2-symptom-masks-root-cause — symptom guard makes the test pass but root cause unaddressed → iterate
    • case-3-drive-by-in-diff — correct fix but diff includes an unrelated whitespace change → iterate
  • step-5-module-test-run (2 cases) — interprets module test run output:

    • case-1-clean-module-run — all tests pass → proceed
    • case-2-regression-introduced — fix breaks an adjacent round-trip test → iterate
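
As a quick cross-check of the list above, the eight cases and their expected verdicts can be sketched as a plain mapping. This is only an illustrative summary: the dictionary layout and the helper names `case_count` and `verdict_histogram` are assumptions made for this sketch, not the fixture schema used under tools/skill-evals/.

```python
from collections import Counter

# Hypothetical in-memory summary of the cases listed in this PR description.
# Suite names, case names, and verdicts are copied from the list above;
# the nested-dict layout is an assumption, not the real fixture format.
EXPECTED_VERDICTS = {
    "step-3-failing-test": {
        "case-1-test-fails-as-expected": "accept",
        "case-2-missing-issue-key": "reject",
        "case-3-test-passes-on-main": "surface-gap",
    },
    "step-4-production-change": {
        "case-1-minimal-fix-proceeds": "proceed",
        "case-2-symptom-masks-root-cause": "iterate",
        "case-3-drive-by-in-diff": "iterate",
    },
    "step-5-module-test-run": {
        "case-1-clean-module-run": "proceed",
        "case-2-regression-introduced": "iterate",
    },
}


def case_count(suites):
    """Return the total number of cases across all suites."""
    return sum(len(cases) for cases in suites.values())


def verdict_histogram(suites):
    """Count how many cases expect each verdict."""
    return Counter(
        verdict for cases in suites.values() for verdict in cases.values()
    )


if __name__ == "__main__":
    print(case_count(EXPECTED_VERDICTS))
    print(dict(verdict_histogram(EXPECTED_VERDICTS)))
```

Counting the entries this way is one way to sanity-check the "8 new eval cases across 3 suites" figure against the per-suite breakdown.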

README.md updated: suite table and case count (12 → 20).

@justinmclean justinmclean self-assigned this May 20, 2026
@potiuk
Member

potiuk commented May 20, 2026

More evals!

@potiuk potiuk merged commit c11f13c into apache:main May 20, 2026
13 checks passed
@andreahlert
Collaborator

@potiuk

I was still going through this when it got merged 😄, but flagging now because the squash dragged in scope that wasn't in the PR description.

the branch was cut off contributor-readiness (#227) instead of main, so the squash merge landed the entire contributor-nomination skill into main as part of this PR. 87 files in c11f13c, of which only ~25 are the step-3/4/5 evals this PR actually describes. the other ~60 are the SKILL.md + assess/fetch/render + the full contributor-nomination eval suite + docs/modes.md edit + the project template.

confirmed by diffing the merge against its parent 76dcb977: .claude/skills/contributor-nomination/ didn't exist there, it does now.

side effect: #227 is still open but its content is already on main, never got its own review. happy to draft a follow-up that closes #227 and patches the spec nits I caught in the step-3/4/5 fixtures (verdict rubric ambiguity on step-3, missing edge case on step-5, schema overlap between step-4 in_scope and step-6) if useful.
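
for anyone who wants to repeat the scope check, here is a minimal sketch of one way to do it, assuming a local clone that contains both commits. `changed_files` and `split_by_scope` are hypothetical helper names, and the only in-scope prefix used is the eval directory named in the PR description; the README.md change would need its own prefix.

```python
import subprocess

# Directory named in the PR description as the home of the new suites.
IN_SCOPE_PREFIX = "tools/skill-evals/evals/issue-fix-workflow/"


def changed_files(base, head):
    """List paths that differ between two commits via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def split_by_scope(paths, prefixes=(IN_SCOPE_PREFIX,)):
    """Partition paths into (in_scope, out_of_scope) by path prefix."""
    in_scope, out_of_scope = [], []
    for path in paths:
        if path.startswith(tuple(prefixes)):
            in_scope.append(path)
        else:
            out_of_scope.append(path)
    return in_scope, out_of_scope


if __name__ == "__main__":
    # Commit ids as quoted in the comment above; run inside the repository.
    inside, outside = split_by_scope(changed_files("76dcb977", "c11f13c"))
    print(len(inside), "in scope;", len(outside), "out of scope")
```

anything in the second list is a candidate for content that came along with the squash rather than from the PR as described.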

@potiuk
Member

potiuk commented May 20, 2026

Damn. Too fast. I tried to see if I could speed up the merging before the rename (and before @justinmclean goes to bed :) ...

3 participants