feat(commands): improve learn-eval with checklist-based holistic verdict #360

Open
shimo4228 wants to merge 2 commits into affaan-m:main from shimo4228:feat/commands/learn-eval-v2

Conversation

Contributor

@shimo4228 shimo4228 commented Mar 8, 2026

Summary

Improves the /learn-eval command by replacing the 5-dimension numeric scoring rubric with a checklist-based holistic verdict system.

Motivation

The original learn-eval used a scoring table (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage — each scored 1–5). In practice with modern frontier models (Opus 4.6+), this approach has a key limitation: forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. For example, a pattern might score 3/5 on Coverage but 5/5 on everything else — the total (23/25) looks great, but the Coverage gap might be critical.

Modern models have strong contextual judgment. Rather than quantizing that judgment into numbers, this update lets the model weigh all factors holistically while an explicit checklist ensures no critical verification step is skipped.

Key Changes

1. Explicit pre-save checklist (Step 5a)

Before any verdict, the model must actually verify:

  • Grep ~/.claude/skills/ for content overlap
  • Check MEMORY.md for overlap
  • Consider appending to an existing skill
  • Confirm reusability (not a one-off fix)
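The overlap checks above boil down to a recursive keyword grep. A self-contained sketch (the skill file, its contents, and the keyword are made up for illustration; a real run would target `~/.claude/skills/` and the project-local `.claude/skills/`):

```shell
# Illustrative pre-save overlap check. Uses a throwaway directory so the
# example is self-contained; substitute the real skills paths in practice.
SKILLS_DIR="$(mktemp -d)"
printf 'Retry with exponential backoff on HTTP 429 responses.\n' \
  > "$SKILLS_DIR/http-retries.md"

# -r recursive, -i case-insensitive, -l filenames only:
# any hit is a potential overlap, pointing toward Absorb rather than Save.
KEYWORD="backoff"
MATCHES="$(grep -ril "$KEYWORD" "$SKILLS_DIR")"
echo "$MATCHES"
```

Any non-empty result means the draft should be weighed against the matching skill before a new file is created.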

2. Four-way verdict system (Step 5b)

Replaces binary save/don't-save with nuanced outcomes:

| Verdict | When | Action |
| --- | --- | --- |
| Save | Unique, specific, well-scoped | Save as new skill |
| Improve then Save | Valuable but needs refinement | Revise once, then re-evaluate |
| Absorb into [X] | Should be part of existing skill | Append to existing file |
| Drop | Trivial, redundant, or abstract | Explain and stop |

The Absorb verdict is particularly important — it prevents skill file proliferation by directing related knowledge into existing skills rather than creating new files.
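Mechanically, the verdict step is a four-way branch. A minimal shell sketch (the verdict value and action strings here are placeholders, not wording from the command file):

```shell
# Hypothetical dispatch over the four verdicts from the table above.
verdict="Absorb into http-retries.md"
case "$verdict" in
  "Save")              action="write the draft as a new skill file" ;;
  "Improve then Save") action="revise once, then re-evaluate" ;;
  "Absorb into "*)     action="append to the existing skill" ;;
  "Drop")              action="explain the reason and stop" ;;
  *)                   action="unknown verdict" ;;
esac
echo "$action"
```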

3. Verdict-specific confirmation flows (Step 6)

Each verdict type has a tailored output format, showing exactly the information needed for that decision.

4. Guideline dimensions (non-scored)

The same quality dimensions (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage) remain as guidance for the holistic judgment, but are no longer scored numerically.

Type

  • [ ] Skill
  • [x] Command
  • [ ] Agent
  • [ ] Hook

Testing

  • Tested locally with Claude Code CLI across multiple extraction sessions
  • Verified the checklist catches duplicates that numeric scoring missed
  • Confirmed Absorb verdict correctly identifies merge-worthy patterns

Checklist

  • Follows file naming conventions
  • Includes required sections
  • No hardcoded paths or personal information
  • English only

Summary by cubic

Upgrades /learn-eval to use a checklist and holistic verdict instead of numeric scores. Improves duplicate detection, save-location decisions, and prevents skill file proliferation.

  • New Features

    • Pre-save checklist: grep ~/.claude/skills/, check MEMORY.md, consider appending, confirm reusability
    • Four verdicts: Save, Improve then Save, Absorb into [X], Drop
    • Verdict-specific confirmation with path/diff, rationale, and checklist results before writing
    • Keeps quality dimensions as non-scored guidance
  • Bug Fixes

    • Fixed markdownlint MD001 by changing sub-step headings (5a, 5b) from #### to ###

Written for commit ae94edc. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Switched evaluation workflow to a checklist-based quality gate and holistic verdicts (replacing numeric scoring).
    • Introduced four verdicts — Save, Improve then Save, Absorb into existing content, Drop — with verdict-specific confirmation flows and next actions.
    • Added guideline dimensions, explicit pre-evaluation steps (overlap/reusability checks), and output formatting examples (sample checklist and verdict presentation).
    • Included design rationale and a rule to append absorbed items to existing skills rather than creating new entries.

Replace the 5-dimension numeric scoring rubric with a checklist + holistic
verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md,
  consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]"
  to prevent skill file proliferation, and "Improve then Save" for iterative
  refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric
  scoring with modern frontier models
Contributor

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

The commands/learn-eval.md documentation replaces the numeric 5-dimension scoring rubric with a checklist-first evaluation and a four-option verdict model (Save, Improve then Save, Absorb, Drop), adding procedural pre-checks, verdict-specific confirmation flows, output formatting, and a design rationale.

Changes

Cohort / File(s) Summary
Evaluation workflow documentation
commands/learn-eval.md
Replaced numeric self-evaluation rubric with a checklist-based "Quality gate — Checklist + Holistic verdict" model. Added procedural pre-evaluation checks, guideline dimensions, verdict options (Save / Improve then Save / Absorb / Drop) with confirmation flows and output formatting samples; updated rule to append on "Absorb".

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User as User
  participant CLI as CLI / Commands
  participant Eval as Evaluator (Checklist + Verdict)
  participant KB as KnowledgeBase / Skills
  participant FS as FileSystem

  rect rgba(200,220,255,0.5)
    User->>CLI: invoke /learn (submit artifact)
    CLI->>Eval: run pre-evaluation checks + checklist
    Eval->>Eval: compute guideline dimensions + holistic verdict
  end

  alt Verdict: Save
    Eval->>FS: create or update learning file
    FS-->>CLI: confirmation
    CLI-->>User: "Saved"
  else Verdict: Improve then Save
    Eval-->>CLI: return improvement suggestions
    User->>CLI: apply improvements
    CLI->>FS: save after updates
    CLI-->>User: "Improved and saved"
  else Verdict: Absorb
    Eval->>KB: append content to existing skill
    KB-->>CLI: confirmation
    CLI-->>User: "Absorbed into [X]"
  else Verdict: Drop
    Eval-->>CLI: show reason and abort save
    CLI-->>User: "Dropped"
  end
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • affaan-m

Poem

🐰
I hopped through notes where numbers fell,
A checklist sang and rang the bell,
Save, Absorb, or let it drop—
I nibble logic, then I hop! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and specifically summarizes the main change: replacing numeric scoring with a checklist-based holistic verdict system in the learn-eval command. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Owner

@affaan-m affaan-m left a comment


Automated review: checks are failing. Please fix failures before review.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="commands/learn-eval.md">

<violation number="1" location="commands/learn-eval.md:60">
P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

```diff
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
```

@cubic-dev-ai cubic-dev-ai bot Mar 8, 2026


P2: Checklist grep only checks global skills (~/.claude/skills/), missing project-level skills (.claude/skills/). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/learn-eval.md, line 60:

<comment>Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</comment>

<file context>

```diff
@@ -51,41 +51,65 @@ origin: auto-extracted
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
```

</file context>
Suggested change

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and `.claude/skills/` by keyword to check for content overlap
```

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6's "Verdict-specific confirmation flow" lists
behaviors for only three verdicts while Step 5b introduces four; add an explicit
"Improve then Save" branch under that list that describes running the one
allowed re-evaluation/improvement, presenting the revised draft along with save
path, checklist results, and a 1-line verdict rationale, and then saving only
after user confirmation so the post-revision path is unambiguous (refer to the
"Verdict-specific confirmation flow" section and the four verdicts introduced in
Step 5b to place the new branch).
- Around line 56-63: The checklist currently only searches the global skills
directory (`~/.claude/skills/`) and can miss project-local saves; update the
"Grep `~/.claude/skills/` by keyword to check for content overlap" item in
commands/learn-eval.md (the required checklist under "Execute all of the
following") to also search the project-local skills folder (e.g.,
`.claude/skills/learned/` or `.claude/skills/`) so duplicates saved by Step 3
are detected — change the bullet to instruct grepping both the global path and
the project-local `.claude/skills/learned/` (or perform a recursive search in
both locations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed275d9f-bba2-4032-b36f-2e3fe1f23aee

📥 Commits

Reviewing files that changed from the base of the PR and between 6090401 and 08db389.

📒 Files selected for processing (1)
  • commands/learn-eval.md

Comment on lines +56 to +63

```diff
-   - Score each dimension 1–5
-   - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-   - Show the user the scores table and the final draft
-
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+#### 5a. Required checklist (verify by actually reading files)
+
+Execute **all** of the following before evaluating the draft:
+
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
+   - [ ] Confirm this is a reusable pattern, not a one-off fix
```
Contributor


⚠️ Potential issue | 🟠 Major

Check project-local skills for overlap too.

The checklist only greps ~/.claude/skills/, but Step 3 can route saves into .claude/skills/learned/ as well. That means project-local duplicates can slip through and turn an Absorb into an incorrect Save.

Suggested wording

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 56 - 63, The checklist currently only
searches the global skills directory (`~/.claude/skills/`) and can miss
project-local saves; update the "Grep `~/.claude/skills/` by keyword to check
for content overlap" item in commands/learn-eval.md (the required checklist
under "Execute all of the following") to also search the project-local skills
folder (e.g., `.claude/skills/learned/` or `.claude/skills/`) so duplicates
saved by Step 3 are detected — change the bullet to instruct grepping both the
global path and the project-local `.claude/skills/learned/` (or perform a
recursive search in both locations).

Comment on lines +83 to +89

```diff
+6. **Verdict-specific confirmation flow**
+
+   - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+   - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+   - **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb to the determined location
```
Contributor


⚠️ Potential issue | 🟠 Major

Add the Improve then Save branch to Step 6.

Step 5b introduces four verdicts, but Step 6 only specifies confirmation behavior for three. Without an explicit post-revision path, the command can end up ambiguous after the one allowed re-evaluation.

Suggested wording

```diff
 6. **Verdict-specific confirmation flow**

+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 83 - 89, Step 6's "Verdict-specific
confirmation flow" lists behaviors for only three verdicts while Step 5b
introduces four; add an explicit "Improve then Save" branch under that list that
describes running the one allowed re-evaluation/improvement, presenting the
revised draft along with save path, checklist results, and a 1-line verdict
rationale, and then saving only after user confirmation so the post-revision
path is unambiguous (refer to the "Verdict-specific confirmation flow" section
and the four verdicts introduced in Step 5b to place the new branch).

Owner

@affaan-m affaan-m left a comment


Automated review: checks are failing. Please fix failures before review.

Change h4 (####) to h3 (###) for sub-steps 5a and 5b to comply with
heading increment rule (headings must increment by one level at a time).
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
commands/learn-eval.md (2)

83-89: ⚠️ Potential issue | 🟠 Major

Add the Improve then Save branch to Step 6.

Step 5b defines four verdicts, but this flow only explains three of them. Without an explicit post-revision branch, the command is ambiguous after the one allowed re-evaluation. Based on learnings, review todo lists to identify out of order steps.

Suggested wording

```diff
 6. **Verdict-specific confirmation flow**

+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 83 - 89, Step 6 is missing the "Improve
then Save" verdict from Step 5b; add a fourth branch named "Improve then Save"
to the verdict-specific confirmation flow that (1) presents suggested revisions
or a short improvement plan + diff of changes, (2) runs the single allowed
re-evaluation/revision to produce an updated draft, (3) shows the updated
checklist results + 1-line verdict rationale + full revised draft for user
review, and (4) prompts the user to confirm before saving—update the Step 6 text
to enumerate this branch alongside "Save", "Absorb into [X]" and "Drop".

60-60: ⚠️ Potential issue | 🟠 Major

Check project-local skills for overlap too.

This checklist only searches ~/.claude/skills/, but Step 3 also allows saving under .claude/skills/learned/. That can miss duplicates and turn an Absorb into an incorrect Save. Based on learnings, review todo lists to identify out of order steps.

Suggested wording

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` at line 60, Update the checklist item that currently
reads "Grep `~/.claude/skills/` by keyword to check for content overlap" to also
search the project-local skills directories and learned subfolder: change the
instruction to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6 is missing the "Improve then Save" verdict from Step
5b; add a fourth branch named "Improve then Save" to the verdict-specific
confirmation flow that (1) presents suggested revisions or a short improvement
plan + diff of changes, (2) runs the single allowed re-evaluation/revision to
produce an updated draft, (3) shows the updated checklist results + 1-line
verdict rationale + full revised draft for user review, and (4) prompts the user
to confirm before saving—update the Step 6 text to enumerate this branch
alongside "Save", "Absorb into [X]" and "Drop".
- Line 60: Update the checklist item that currently reads "Grep
`~/.claude/skills/` by keyword to check for content overlap" to also search the
project-local skills directories and learned subfolder: change the instruction
to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99600bbc-4739-4575-804e-8e22dfad0128

📥 Commits

Reviewing files that changed from the base of the PR and between 08db389 and ae94edc.

📒 Files selected for processing (1)
  • commands/learn-eval.md
