feat(commands): improve learn-eval with checklist-based holistic verdict #360

Open
shimo4228 wants to merge 2 commits into affaan-m:main from shimo4228:feat/commands/learn-eval-v2

Conversation

Contributor

@shimo4228 shimo4228 commented Mar 8, 2026

Summary

Improves the /learn-eval command by replacing the 5-dimension numeric scoring rubric with a checklist-based holistic verdict system.

Motivation

The original learn-eval used a scoring table (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage — each scored 1–5). In practice with modern frontier models (Opus 4.6+), this approach has a key limitation: forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. For example, a pattern might score 3/5 on Coverage but 5/5 on everything else — the total (23/25) looks great, but the Coverage gap might be critical.

Modern models have strong contextual judgment. Rather than quantizing that judgment into numbers, this update lets the model weigh all factors holistically while an explicit checklist ensures no critical verification step is skipped.

Key Changes

1. Explicit pre-save checklist (Step 5a)

Before any verdict, the model must actually verify:

  • Grep ~/.claude/skills/ for content overlap
  • Check MEMORY.md for overlap
  • Consider appending to an existing skill
  • Confirm reusability (not a one-off fix)
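The overlap checks above boil down to a recursive keyword grep. A self-contained sketch (the skill file, its contents, and the keyword are made up for illustration; a real run would target `~/.claude/skills/` and the project-local `.claude/skills/`):

```shell
# Illustrative pre-save overlap check. Uses a throwaway directory so the
# example is self-contained; substitute the real skills paths in practice.
SKILLS_DIR="$(mktemp -d)"
printf 'Retry with exponential backoff on HTTP 429 responses.\n' \
  > "$SKILLS_DIR/http-retries.md"

# -r recursive, -i case-insensitive, -l filenames only:
# any hit is a potential overlap, pointing toward Absorb rather than Save.
KEYWORD="backoff"
MATCHES="$(grep -ril "$KEYWORD" "$SKILLS_DIR")"
echo "$MATCHES"
```

Any non-empty result means the draft should be weighed against the matching skill before a new file is created.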

2. Four-way verdict system (Step 5b)

Replaces binary save/don't-save with nuanced outcomes:

| Verdict | When | Action |
| --- | --- | --- |
| Save | Unique, specific, well-scoped | Save as new skill |
| Improve then Save | Valuable but needs refinement | Revise once, then re-evaluate |
| Absorb into [X] | Should be part of existing skill | Append to existing file |
| Drop | Trivial, redundant, or abstract | Explain and stop |

The Absorb verdict is particularly important — it prevents skill file proliferation by directing related knowledge into existing skills rather than creating new files.
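Mechanically, the verdict step is a four-way branch. A minimal shell sketch (the verdict value and action strings here are placeholders, not wording from the command file):

```shell
# Hypothetical dispatch over the four verdicts from the table above.
verdict="Absorb into http-retries.md"
case "$verdict" in
  "Save")              action="write the draft as a new skill file" ;;
  "Improve then Save") action="revise once, then re-evaluate" ;;
  "Absorb into "*)     action="append to the existing skill" ;;
  "Drop")              action="explain the reason and stop" ;;
  *)                   action="unknown verdict" ;;
esac
echo "$action"
```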

3. Verdict-specific confirmation flows (Step 6)

Each verdict type has a tailored output format, showing exactly the information needed for that decision.

4. Guideline dimensions (non-scored)

The same quality dimensions (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage) remain as guidance for the holistic judgment, but are no longer scored numerically.

Type

  • [ ] Skill
  • [x] Command
  • [ ] Agent
  • [ ] Hook

Testing

  • Tested locally with Claude Code CLI across multiple extraction sessions
  • Verified the checklist catches duplicates that numeric scoring missed
  • Confirmed Absorb verdict correctly identifies merge-worthy patterns

Checklist

  • Follows file naming conventions
  • Includes required sections
  • No hardcoded paths or personal information
  • English only

Summary by cubic

Upgrades /learn-eval to use a checklist and holistic verdict instead of numeric scores. Improves duplicate detection, save-location decisions, and prevents skill file proliferation.

  • New Features

    • Pre-save checklist: grep ~/.claude/skills/, check MEMORY.md, consider appending, confirm reusability
    • Four verdicts: Save, Improve then Save, Absorb into [X], Drop
    • Verdict-specific confirmation with path/diff, rationale, and checklist results before writing
    • Keeps quality dimensions as non-scored guidance
  • Bug Fixes

    • Fixed markdownlint MD001 by changing sub-step headings (5a, 5b) from #### to ###

Written for commit ae94edc. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Switched evaluation workflow to a checklist-based quality gate and holistic verdicts (replacing numeric scoring).
    • Introduced four verdicts — Save, Improve then Save, Absorb into existing content, Drop — with verdict-specific confirmation flows and next actions.
    • Added guideline dimensions, explicit pre-evaluation steps (overlap/reusability checks), and output formatting examples (sample checklist and verdict presentation).
    • Included design rationale and a rule to append absorbed items to existing skills rather than creating new entries.

Replace the 5-dimension numeric scoring rubric with a checklist + holistic
verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md,
  consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]"
  to prevent skill file proliferation, and "Improve then Save" for iterative
  refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric
  scoring with modern frontier models
Contributor

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

The commands/learn-eval.md documentation replaces the numeric 5-dimension scoring rubric with a checklist-first evaluation and a four-option verdict model (Save, Improve then Save, Absorb, Drop), adding procedural pre-checks, verdict-specific confirmation flows, output formatting, and a design rationale.

Changes

Cohort / File(s) Summary
Evaluation workflow documentation
commands/learn-eval.md
Replaced numeric self-evaluation rubric with a checklist-based "Quality gate — Checklist + Holistic verdict" model. Added procedural pre-evaluation checks, guideline dimensions, verdict options (Save / Improve then Save / Absorb / Drop) with confirmation flows and output formatting samples; updated rule to append on "Absorb".

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User as User
  participant CLI as CLI / Commands
  participant Eval as Evaluator (Checklist + Verdict)
  participant KB as KnowledgeBase / Skills
  participant FS as FileSystem

  rect rgba(200,220,255,0.5)
    User->>CLI: invoke /learn (submit artifact)
    CLI->>Eval: run pre-evaluation checks + checklist
    Eval->>Eval: compute guideline dimensions + holistic verdict
  end

  alt Verdict: Save
    Eval->>FS: create or update learning file
    FS-->>CLI: confirmation
    CLI-->>User: "Saved"
  else Verdict: Improve then Save
    Eval-->>CLI: return improvement suggestions
    User->>CLI: apply improvements
    CLI->>FS: save after updates
    CLI-->>User: "Improved and saved"
  else Verdict: Absorb
    Eval->>KB: append content to existing skill
    KB-->>CLI: confirmation
    CLI-->>User: "Absorbed into [X]"
  else Verdict: Drop
    Eval-->>CLI: show reason and abort save
    CLI-->>User: "Dropped"
  end
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • affaan-m

Poem

🐰
I hopped through notes where numbers fell,
A checklist sang and rang the bell,
Save, Absorb, or let it drop—
I nibble logic, then I hop! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and specifically summarizes the main change: replacing numeric scoring with a checklist-based holistic verdict system in the learn-eval command. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Owner

@affaan-m affaan-m left a comment


Automated review: checks are failing. Please fix failures before review.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="commands/learn-eval.md">

<violation number="1" location="commands/learn-eval.md:60">
P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

```diff
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
```

@cubic-dev-ai cubic-dev-ai bot Mar 8, 2026


P2: Checklist grep only checks global skills (~/.claude/skills/), missing project-level skills (.claude/skills/). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/learn-eval.md, line 60:

<comment>Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global" — apply the same pattern here.</comment>

<file context>

```diff
@@ -51,41 +51,65 @@ origin: auto-extracted
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
```

</file context>
Suggested change

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and `.claude/skills/` by keyword to check for content overlap
```

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6's "Verdict-specific confirmation flow" lists
behaviors for only three verdicts while Step 5b introduces four; add an explicit
"Improve then Save" branch under that list that describes running the one
allowed re-evaluation/improvement, presenting the revised draft along with save
path, checklist results, and a 1-line verdict rationale, and then saving only
after user confirmation so the post-revision path is unambiguous (refer to the
"Verdict-specific confirmation flow" section and the four verdicts introduced in
Step 5b to place the new branch).
- Around line 56-63: The checklist currently only searches the global skills
directory (`~/.claude/skills/`) and can miss project-local saves; update the
"Grep `~/.claude/skills/` by keyword to check for content overlap" item in
commands/learn-eval.md (the required checklist under "Execute all of the
following") to also search the project-local skills folder (e.g.,
`.claude/skills/learned/` or `.claude/skills/`) so duplicates saved by Step 3
are detected — change the bullet to instruct grepping both the global path and
the project-local `.claude/skills/learned/` (or perform a recursive search in
both locations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed275d9f-bba2-4032-b36f-2e3fe1f23aee

📥 Commits

Reviewing files that changed from the base of the PR and between 6090401 and 08db389.

📒 Files selected for processing (1)
  • commands/learn-eval.md

Comment on lines +56 to +63

```diff
-   - Score each dimension 1–5
-   - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-   - Show the user the scores table and the final draft
-
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+#### 5a. Required checklist (verify by actually reading files)
+
+Execute **all** of the following before evaluating the draft:
+
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
+   - [ ] Confirm this is a reusable pattern, not a one-off fix
```
Contributor


⚠️ Potential issue | 🟠 Major

Check project-local skills for overlap too.

The checklist only greps ~/.claude/skills/, but Step 3 can route saves into .claude/skills/learned/ as well. That means project-local duplicates can slip through and turn an Absorb into an incorrect Save.

Suggested wording

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 56 - 63, The checklist currently only
searches the global skills directory (`~/.claude/skills/`) and can miss
project-local saves; update the "Grep `~/.claude/skills/` by keyword to check
for content overlap" item in commands/learn-eval.md (the required checklist
under "Execute all of the following") to also search the project-local skills
folder (e.g., `.claude/skills/learned/` or `.claude/skills/`) so duplicates
saved by Step 3 are detected — change the bullet to instruct grepping both the
global path and the project-local `.claude/skills/learned/` (or perform a
recursive search in both locations).

Comment on lines +83 to +89

```diff
+6. **Verdict-specific confirmation flow**
+
+   - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+   - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+   - **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb to the determined location
```
Contributor


⚠️ Potential issue | 🟠 Major

Add the Improve then Save branch to Step 6.

Step 5b introduces four verdicts, but Step 6 only specifies confirmation behavior for three. Without an explicit post-revision path, the command can end up ambiguous after the one allowed re-evaluation.

Suggested wording

```diff
 6. **Verdict-specific confirmation flow**

+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 83 - 89, Step 6's "Verdict-specific
confirmation flow" lists behaviors for only three verdicts while Step 5b
introduces four; add an explicit "Improve then Save" branch under that list that
describes running the one allowed re-evaluation/improvement, presenting the
revised draft along with save path, checklist results, and a 1-line verdict
rationale, and then saving only after user confirmation so the post-revision
path is unambiguous (refer to the "Verdict-specific confirmation flow" section
and the four verdicts introduced in Step 5b to place the new branch).

Owner

@affaan-m affaan-m left a comment


Automated review: checks are failing. Please fix failures before review.

Change h4 (####) to h3 (###) for sub-steps 5a and 5b to comply with
heading increment rule (headings must increment by one level at a time).
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
commands/learn-eval.md (2)

83-89: ⚠️ Potential issue | 🟠 Major

Add the Improve then Save branch to Step 6.

Step 5b defines four verdicts, but this flow only explains three of them. Without an explicit post-revision branch, the command is ambiguous after the one allowed re-evaluation. Based on learnings, review todo lists to identify out of order steps.

Suggested wording

```diff
 6. **Verdict-specific confirmation flow**

+   - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
    - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
    - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
    - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 83 - 89, Step 6 is missing the "Improve
then Save" verdict from Step 5b; add a fourth branch named "Improve then Save"
to the verdict-specific confirmation flow that (1) presents suggested revisions
or a short improvement plan + diff of changes, (2) runs the single allowed
re-evaluation/revision to produce an updated draft, (3) shows the updated
checklist results + 1-line verdict rationale + full revised draft for user
review, and (4) prompts the user to confirm before saving—update the Step 6 text
to enumerate this branch alongside "Save", "Absorb into [X]" and "Drop".

60-60: ⚠️ Potential issue | 🟠 Major

Check project-local skills for overlap too.

This checklist only searches ~/.claude/skills/, but Step 3 also allows saving under .claude/skills/learned/. That can miss duplicates and turn an Absorb into an incorrect Save. Based on learnings, review todo lists to identify out of order steps.

Suggested wording

```diff
-   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` at line 60, Update the checklist item that currently
reads "Grep `~/.claude/skills/` by keyword to check for content overlap" to also
search the project-local skills directories and learned subfolder: change the
instruction to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@commands/learn-eval.md`:
- Around line 83-89: Step 6 is missing the "Improve then Save" verdict from Step
5b; add a fourth branch named "Improve then Save" to the verdict-specific
confirmation flow that (1) presents suggested revisions or a short improvement
plan + diff of changes, (2) runs the single allowed re-evaluation/revision to
produce an updated draft, (3) shows the updated checklist results + 1-line
verdict rationale + full revised draft for user review, and (4) prompts the user
to confirm before saving—update the Step 6 text to enumerate this branch
alongside "Save", "Absorb into [X]" and "Drop".
- Line 60: Update the checklist item that currently reads "Grep
`~/.claude/skills/` by keyword to check for content overlap" to also search the
project-local skills directories and learned subfolder: change the instruction
to grep both `~/.claude/skills/` and `./.claude/skills/` (including
`./.claude/skills/learned/`) by keyword so duplicates saved locally aren’t
missed; ensure the wording clearly instructs checking both global and
project-local paths to avoid misclassifying Absorb vs Save.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99600bbc-4739-4575-804e-8e22dfad0128

📥 Commits

Reviewing files that changed from the base of the PR and between 08db389 and ae94edc.

📒 Files selected for processing (1)
  • commands/learn-eval.md
