diff --git a/commands/learn-eval.md b/commands/learn-eval.md
index 18d8853cb..53b7bf588 100644
--- a/commands/learn-eval.md
+++ b/commands/learn-eval.md
@@ -1,10 +1,10 @@
 ---
-description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project).
+description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)."
 ---
 
 # /learn-eval - Extract, Evaluate, then Save
 
-Extends `/learn` with a quality gate and save-location decision before writing any skill file.
+Extends `/learn` with a quality gate, a save-location decision, and knowledge-placement awareness before writing any skill file.
 
 ## What to Extract
 
@@ -51,36 +51,60 @@ origin: auto-extracted
 [Trigger conditions]
 ```
 
-5. **Self-evaluate before saving** using this rubric:
+5. **Quality gate — Checklist + Holistic verdict**
 
-   | Dimension | 1 | 3 | 5 |
-   |-----------|---|---|---|
-   | Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns |
-   | Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered |
-   | Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned |
-   | Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value |
-   | Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered |
+   ### 5a. Required checklist (verify by actually reading files)
 
-   - Score each dimension 1–5
-   - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-   - Show the user the scores table and the final draft
+   Execute **all** of the following before evaluating the draft:
 
-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
+   - [ ] Confirm this is a reusable pattern, not a one-off fix
 
-7. Save to the determined location
+   ### 5b. Holistic verdict
 
-## Output Format for Step 5 (scores table)
+   Synthesize the checklist results and draft quality, then choose **one** of the following:
 
-| Dimension | Score | Rationale |
-|-----------|-------|-----------|
-| Specificity | N/5 | ... |
-| Actionability | N/5 | ... |
-| Scope Fit | N/5 | ... |
-| Non-redundancy | N/5 | ... |
-| Coverage | N/5 | ... |
-| **Total** | **N/25** | |
+   | Verdict | Meaning | Next Action |
+   |---------|---------|-------------|
+   | **Save** | Unique, specific, well-scoped | Proceed to Step 6 |
+   | **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) |
+   | **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 |
+   | **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop |
+
+   **Guideline dimensions** (informing the verdict, not scored):
+
+   - **Specificity & Actionability**: Contains code examples or commands that are immediately usable
+   - **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern
+   - **Uniqueness**: Provides value not covered by existing skills (informed by checklist results)
+   - **Reusability**: Realistic trigger scenarios exist in future sessions
+
+6. **Verdict-specific confirmation flow**
+
+   - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+   - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+   - **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb into the determined location
+
+## Output Format for Step 5
+
+```
+### Checklist
+- [x] skills/ grep: no overlap (or: overlap found → details)
+- [x] MEMORY.md: no overlap (or: overlap found → details)
+- [x] Existing skill append: new file appropriate (or: should append to [X])
+- [x] Reusability: confirmed (or: one-off → Drop)
+
+### Verdict: Save / Improve then Save / Absorb into [X] / Drop
+
+**Rationale:** (1-2 sentences explaining the verdict)
+```
+
+## Design Rationale
+
+This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped.
 
 ## Notes
@@ -88,4 +112,4 @@ origin: auto-extracted
 - Don't extract one-time issues (specific API outages, etc.)
 - Focus on patterns that will save time in future sessions
 - Keep skills focused — one pattern per skill
-- If Coverage score is low, add related variants before saving
+- When the verdict is Absorb, append to the existing skill rather than creating a new file
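The overlap grep added in checklist step 5a can be sketched as a small shell check. This is a sketch only: the `SKILLS_DIR` default, the `KEYWORD` value, and the MEMORY.md paths are illustrative assumptions, not anything `/learn-eval` prescribes; adapt them per session.

```shell
#!/bin/sh
# Sketch of checklist step 5a: search existing skills and MEMORY.md for overlap.
# SKILLS_DIR and KEYWORD are hypothetical placeholders, adjust per session.
SKILLS_DIR="${SKILLS_DIR:-$HOME/.claude/skills}"
KEYWORD="${KEYWORD:-retry-backoff}"

# -r recurse, -i case-insensitive, -l print matching file names only
if grep -ril -- "$KEYWORD" "$SKILLS_DIR" 2>/dev/null; then
  echo "overlap found: consider an Absorb verdict"
else
  echo "no overlap: a new skill file may be appropriate"
fi

# Run the same keyword check against project and global MEMORY.md, if present
for f in ./MEMORY.md "$HOME/.claude/MEMORY.md"; do
  [ -f "$f" ] && grep -in -- "$KEYWORD" "$f"
done
```

A hit here does not force an Absorb verdict by itself; it is one input to the holistic judgment in step 5b.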