feat(commands): improve learn-eval with checklist-based holistic verdict #360
shimo4228 wants to merge 2 commits into affaan-m:main
Conversation
Replace the 5-dimension numeric scoring rubric with a checklist + holistic verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md, consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]" to prevent skill file proliferation, and "Improve then Save" for iterative refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric scoring with modern frontier models
📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant CLI as CLI / Commands
    participant Eval as Evaluator (Checklist + Verdict)
    participant KB as KnowledgeBase / Skills
    participant FS as FileSystem
    rect rgba(200,220,255,0.5)
        User->>CLI: invoke /learn (submit artifact)
        CLI->>Eval: run pre-evaluation checks + checklist
        Eval->>Eval: compute guideline dimensions + holistic verdict
    end
    alt Verdict: Save
        Eval->>FS: create or update learning file
        FS-->>CLI: confirmation
        CLI-->>User: "Saved"
    else Verdict: Improve then Save
        Eval-->>CLI: return improvement suggestions
        User->>CLI: apply improvements
        CLI->>FS: save after updates
        CLI-->>User: "Improved and saved"
    else Verdict: Absorb
        Eval->>KB: append content to existing skill
        KB-->>CLI: confirmation
        CLI-->>User: "Absorbed into [X]"
    else Verdict: Drop
        Eval-->>CLI: show reason and abort save
        CLI-->>User: "Dropped"
    end
```
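The four-way dispatch the diagram describes can be sketched in Python. The `Evaluation` type and `dispatch` function below are hypothetical illustrations of the flow, not the project's actual API:

```python
# Hypothetical sketch of the four-way verdict dispatch shown in the
# sequence diagram; all names are illustrative, not the project's API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    verdict: str            # "Save" | "Improve then Save" | "Absorb" | "Drop"
    rationale: str          # 1-line verdict rationale shown to the user
    target: Optional[str] = None  # existing skill path, used by "Absorb"


def dispatch(ev: Evaluation) -> str:
    """Route an evaluation to the outcome each verdict prescribes."""
    if ev.verdict == "Save":
        return f"Saved ({ev.rationale})"
    if ev.verdict == "Improve then Save":
        # Suggestions go back to the user; saving happens after revision.
        return "Returned improvement suggestions; save after user applies them"
    if ev.verdict == "Absorb":
        return f"Absorbed into {ev.target}"
    if ev.verdict == "Drop":
        return f"Dropped: {ev.rationale}"
    raise ValueError(f"unknown verdict: {ev.verdict}")
```

Each branch maps to one `alt`/`else` arm of the diagram; only "Drop" ends without a write.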
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 3 passed
affaan-m left a comment:
Automated review: checks are failing. Please fix failures before review.
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="commands/learn-eval.md">
<violation number="1" location="commands/learn-eval.md:60">
P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
- Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
```
P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/learn-eval.md, line 60:
<comment>Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</comment>
<file context>
@@ -51,41 +51,65 @@ origin: auto-extracted
-6. Ask user to confirm:
- - Show: proposed save path + scores table + final draft
- - Wait for explicit confirmation before writing
+ - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+ - [ ] Check MEMORY.md (both project and global) for overlap
+ - [ ] Consider whether appending to an existing skill would suffice
</file context>
```diff
- - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+ - [ ] Grep `~/.claude/skills/` and `.claude/skills/` by keyword to check for content overlap
```
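As a hypothetical sketch of the dual-location check the review asks for (paths taken from the comment, function name invented), the overlap search could look like:

```python
# Hypothetical sketch: search BOTH the global and project-local skills
# directories for a keyword before saving a new learned skill.
from pathlib import Path


def find_overlap(keyword: str, project_root: Path = Path(".")) -> list:
    """Return skill files whose text mentions `keyword` (case-insensitive)."""
    roots = [
        Path.home() / ".claude" / "skills",   # global skills
        project_root / ".claude" / "skills",  # project-local skills
    ]
    hits = []
    for root in roots:
        if root.is_dir():
            for f in root.rglob("*.md"):
                if keyword.lower() in f.read_text(errors="ignore").lower():
                    hits.append(f)
    return hits
```

A non-empty result would steer the verdict toward "Absorb into [X]" rather than "Save".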
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6's "Verdict-specific confirmation flow" lists
behaviors for only three verdicts while Step 5b introduces four; add an explicit
"Improve then Save" branch under that list that describes running the one
allowed re-evaluation/improvement, presenting the revised draft along with save
path, checklist results, and a 1-line verdict rationale, and then saving only
after user confirmation so the post-revision path is unambiguous (refer to the
"Verdict-specific confirmation flow" section and the four verdicts introduced in
Step 5b to place the new branch).
- Around line 56-63: The checklist currently only searches the global skills
directory (`~/.claude/skills/`) and can miss project-local saves; update the
"Grep `~/.claude/skills/` by keyword to check for content overlap" item in
commands/learn-eval.md (the required checklist under "Execute all of the
following") to also search the project-local skills folder (e.g.,
`.claude/skills/learned/` or `.claude/skills/`) so duplicates saved by Step 3
are detected — change the bullet to instruct grepping both the global path and
the project-local `.claude/skills/learned/` (or perform a recursive search in
both locations).
commands/learn-eval.md (outdated)
```diff
+#### 5a. Required checklist (verify by actually reading files)
-   - Score each dimension 1–5
-   - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-   - Show the user the scores table and the final draft
+Execute **all** of the following before evaluating the draft:
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+- [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+- [ ] Check MEMORY.md (both project and global) for overlap
+- [ ] Consider whether appending to an existing skill would suffice
+- [ ] Confirm this is a reusable pattern, not a one-off fix
```
Check project-local skills for overlap too.
The checklist only greps `~/.claude/skills/`, but Step 3 can route saves into `.claude/skills/learned/` as well. That means project-local duplicates can slip through and turn an Absorb into an incorrect Save.

Suggested wording

```diff
- - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+ - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@commands/learn-eval.md` around lines 56 - 63, The checklist currently only
searches the global skills directory (`~/.claude/skills/`) and can miss
project-local saves; update the "Grep `~/.claude/skills/` by keyword to check
for content overlap" item in commands/learn-eval.md (the required checklist
under "Execute all of the following") to also search the project-local skills
folder (e.g., `.claude/skills/learned/` or `.claude/skills/`) so duplicates
saved by Step 3 are detected — change the bullet to instruct grepping both the
global path and the project-local `.claude/skills/learned/` (or perform a
recursive search in both locations).
```
6. **Verdict-specific confirmation flow**

   - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
   - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
   - **Drop**: Show checklist results + reasoning only (no confirmation needed)

7. Save / Absorb to the determined location
```
Add the Improve then Save branch to Step 6.
Step 5b introduces four verdicts, but Step 6 only specifies confirmation behavior for three. Without an explicit post-revision path, the command can end up ambiguous after the one allowed re-evaluation.
Suggested wording

```diff
 6. **Verdict-specific confirmation flow**
+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@commands/learn-eval.md` around lines 83 - 89, Step 6's "Verdict-specific
confirmation flow" lists behaviors for only three verdicts while Step 5b
introduces four; add an explicit "Improve then Save" branch under that list that
describes running the one allowed re-evaluation/improvement, presenting the
revised draft along with save path, checklist results, and a 1-line verdict
rationale, and then saving only after user confirmation so the post-revision
path is unambiguous (refer to the "Verdict-specific confirmation flow" section
and the four verdicts introduced in Step 5b to place the new branch).
affaan-m left a comment:
Automated review: checks are failing. Please fix failures before review.
Change h4 (####) to h3 (###) for sub-steps 5a and 5b to comply with heading increment rule (headings must increment by one level at a time).
♻️ Duplicate comments (2)

commands/learn-eval.md (2)

**83-89**: ⚠️ Potential issue | 🟠 Major — Add the **Improve then Save** branch to Step 6.

Step 5b defines four verdicts, but this flow only explains three of them. Without an explicit post-revision branch, the command is ambiguous after the one allowed re-evaluation. Based on learnings, review todo lists to identify out of order steps.
Suggested wording

```diff
 6. **Verdict-specific confirmation flow**
+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commands/learn-eval.md` around lines 83 - 89, Step 6 is missing the "Improve then Save" verdict from Step 5b; add a fourth branch named "Improve then Save" to the verdict-specific confirmation flow that (1) presents suggested revisions or a short improvement plan + diff of changes, (2) runs the single allowed re-evaluation/revision to produce an updated draft, (3) shows the updated checklist results + 1-line verdict rationale + full revised draft for user review, and (4) prompts the user to confirm before saving—update the Step 6 text to enumerate this branch alongside "Save", "Absorb into [X]" and "Drop".
**60**: ⚠️ Potential issue | 🟠 Major — Check project-local skills for overlap too.

This checklist only searches `~/.claude/skills/`, but Step 3 also allows saving under `.claude/skills/learned/`. That can miss duplicates and turn an **Absorb** into an incorrect **Save**. Based on learnings, review todo lists to identify out of order steps.

Suggested wording

```diff
- - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+ - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@commands/learn-eval.md` at line 60, Update the checklist item that currently reads "Grep `~/.claude/skills/` by keyword to check for content overlap" to also search the project-local skills directories and learned subfolder: change the instruction to grep both `~/.claude/skills/` and `./.claude/skills/` (including `./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t missed; ensure the wording clearly instructs checking both global and project-local paths to avoid misclassifying Absorb vs Save.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6 is missing the "Improve then Save" verdict from Step
5b; add a fourth branch named "Improve then Save" to the verdict-specific
confirmation flow that (1) presents suggested revisions or a short improvement
plan + diff of changes, (2) runs the single allowed re-evaluation/revision to
produce an updated draft, (3) shows the updated checklist results + 1-line
verdict rationale + full revised draft for user review, and (4) prompts the user
to confirm before saving—update the Step 6 text to enumerate this branch
alongside "Save", "Absorb into [X]" and "Drop".
- Line 60: Update the checklist item that currently reads "Grep
`~/.claude/skills/` by keyword to check for content overlap" to also search the
project-local skills directories and learned subfolder: change the instruction
to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.
Summary
Improves the `/learn-eval` command by replacing the 5-dimension numeric scoring rubric with a checklist-based holistic verdict system.

Motivation
The original learn-eval used a scoring table (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage — each scored 1–5). In practice with modern frontier models (Opus 4.6+), this approach has a key limitation: forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. For example, a pattern might score 3/5 on Coverage but 5/5 on everything else — the total (23/25) looks great, but the Coverage gap might be critical.
Modern models have strong contextual judgment. Rather than quantizing that judgment into numbers, this update lets the model weigh all factors holistically while an explicit checklist ensures no critical verification step is skipped.
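The Coverage example above can be checked directly; this throwaway Python snippet shows how a strong total hides the weakest dimension:

```python
# Illustrative only: why a high numeric total can mask a critical gap.
scores = {
    "Specificity": 5,
    "Actionability": 5,
    "Scope Fit": 5,
    "Non-redundancy": 5,
    "Coverage": 3,  # the critical gap
}

total = sum(scores.values())            # 23 out of 25 looks strong...
weakest = min(scores, key=scores.get)   # ...but Coverage is the bottleneck
print(total, weakest)                   # → 23 Coverage
```

A holistic verdict lets the evaluator weight `weakest` as disqualifying when it matters, which a flat sum cannot express.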
Key Changes
1. Explicit pre-save checklist (Step 5a)
Before any verdict, the model must actually verify:

- `~/.claude/skills/` for content overlap
- MEMORY.md (both project and global) for overlap
- Whether appending to an existing skill would suffice
- That this is a reusable pattern, not a one-off fix

2. Four-way verdict system (Step 5b)
Replaces binary save/don't-save with nuanced outcomes: **Save**, **Improve then Save**, **Absorb into [X]**, and **Drop**.
The Absorb verdict is particularly important — it prevents skill file proliferation by directing related knowledge into existing skills rather than creating new files.
3. Verdict-specific confirmation flows (Step 6)
Each verdict type has a tailored output format, showing exactly the information needed for that decision.
4. Guideline dimensions (non-scored)
The same quality dimensions (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage) remain as guidance for the holistic judgment, but are no longer scored numerically.
Summary by cubic
Upgrades `/learn-eval` to use a checklist and holistic verdict instead of numeric scores. Improves duplicate detection, save-location decisions, and prevents skill file proliferation.

New Features
- Pre-save checklist: grep `~/.claude/skills/`, check MEMORY.md, consider appending, confirm reusability

Bug Fixes
- Heading levels corrected from `####` to `###`

Written for commit ae94edc. Summary will update on new commits.