Skip to content
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 52 additions & 27 deletions commands/learn-eval.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
---
description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project).
description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)."
---

# /learn-eval - Extract, Evaluate, then Save

Extends `/learn` with a quality gate and save-location decision before writing any skill file.
Extends `/learn` with a quality gate, save-location decision, and knowledge-placement awareness before writing any skill file.

## What to Extract

Expand Down Expand Up @@ -51,41 +51,66 @@ origin: auto-extracted
[Trigger conditions]
```

5. **Self-evaluate before saving** using this rubric:
5. **Quality gate — Checklist + Holistic verdict**

| Dimension | 1 | 3 | 5 |
|-----------|---|---|---|
| Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns |
| Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered |
| Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned |
| Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value |
| Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered |
### 5a. Required checklist (verify by actually reading files)

- Score each dimension 1–5
- If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
- Show the user the scores table and the final draft
Execute **all** of the following before evaluating the draft:

6. Ask user to confirm:
- Show: proposed save path + scores table + final draft
- Wait for explicit confirmation before writing
- [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap
- [ ] Check MEMORY.md (both project and global) for overlap
- [ ] Consider whether appending to an existing skill would suffice
- [ ] Confirm this is a reusable pattern, not a one-off fix

7. Save to the determined location
### 5b. Holistic verdict

## Output Format for Step 5 (scores table)
Synthesize the checklist results and draft quality, then choose **one** of the following:

| Dimension | Score | Rationale |
|-----------|-------|-----------|
| Specificity | N/5 | ... |
| Actionability | N/5 | ... |
| Scope Fit | N/5 | ... |
| Non-redundancy | N/5 | ... |
| Coverage | N/5 | ... |
| **Total** | **N/25** | |
| Verdict | Meaning | Next Action |
|---------|---------|-------------|
| **Save** | Unique, specific, well-scoped | Proceed to Step 6 |
| **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) |
| **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 |
| **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop |

**Guideline dimensions** (informing the verdict, not scored):

- **Specificity & Actionability**: Contains code examples or commands that are immediately usable
- **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern
- **Uniqueness**: Provides value not covered by existing skills (informed by checklist results)
- **Reusability**: Realistic trigger scenarios exist in future sessions

6. **Verdict-specific confirmation flow**

- **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
- **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
- **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
- **Drop**: Show checklist results + reasoning only (no confirmation needed)

7. Save / Absorb to the determined location

Comment on lines +90 to +91
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Step 7 doesn't account for all verdicts.

"Save / Absorb to the determined location" is incomplete and misleading:

  • Drop verdict performs no save or absorb action
  • Absorb verdict appends to an existing skill file (not the location determined in Step 3)
  • Only Save and Improve then Save use the Step 3 location

Reword to clarify the conditional nature:

📝 Suggested rewording
-7. Save / Absorb to the determined location
+7. **Execute the verdict action**
+   - **Save** / **Improve then Save**: Save new skill file to the location determined in Step 3
+   - **Absorb into [X]**: Append additions to the existing target skill file
+   - **Drop**: No file action required
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/learn-eval.md` around lines 90 - 91, Reword Step 7 to clearly
describe conditional actions based on the verdict: state that for verdict "Drop"
no save/absorb occurs; "Absorb" appends content to an existing skill file (not
the Step 3 location); and only "Save" and "Improve then Save" write to the
location determined in Step 3; use the terms used in the doc (Step 3 location,
verdict names Drop/Absorb/Save/Improve then Save) so readers understand which
verdict triggers which storage behavior.

## Output Format for Step 5

```
### Checklist
- [x] skills/ grep: no overlap (or: overlap found → details)
- [x] MEMORY.md: no overlap (or: overlap found → details)
- [x] Existing skill append: new file appropriate (or: should append to [X])
- [x] Reusability: confirmed (or: one-off → Drop)

### Verdict: Save / Improve then Save / Absorb into [X] / Drop

**Rationale:** (1-2 sentences explaining the verdict)
```

## Design Rationale

This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped.

## Notes

- Don't extract trivial fixes (typos, simple syntax errors)
- Don't extract one-time issues (specific API outages, etc.)
- Focus on patterns that will save time in future sessions
- Keep skills focused — one pattern per skill
- If Coverage score is low, add related variants before saving
- When the verdict is Absorb, append to the existing skill rather than creating a new file
Loading