feat(commands): improve learn-eval with checklist-based holistic verdict #360
Merged
affaan-m
merged 4 commits into
affaan-m:main
from
shimo4228:feat/commands/learn-eval-v2
Mar 11, 2026
Changes from 3 commits
4 commits
08db389 feat(commands): improve learn-eval with checklist-based holistic verdict (shimo4228)
ae94edc fix: resolve markdownlint MD001 heading level violation (shimo4228)
1d4ab47 docs: clarify learn-eval verdict flow (affaan-m)
712ae16 Merge remote-tracking branch 'origin/main' into feat/commands/learn-e… (affaan-m)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,10 @@ | ||
| --- | ||
| description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project). | ||
| description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)." | ||
| --- | ||
|
|
||
| # /learn-eval - Extract, Evaluate, then Save | ||
|
|
||
| Extends `/learn` with a quality gate and save-location decision before writing any skill file. | ||
| Extends `/learn` with a quality gate, save-location decision, and knowledge-placement awareness before writing any skill file. | ||
|
|
||
| ## What to Extract | ||
|
|
||
|
|
@@ -51,41 +51,66 @@ origin: auto-extracted | |
| [Trigger conditions] | ||
| ``` | ||
|
|
||
| 5. **Self-evaluate before saving** using this rubric: | ||
| 5. **Quality gate — Checklist + Holistic verdict** | ||
|
|
||
| | Dimension | 1 | 3 | 5 | | ||
| |-----------|---|---|---| | ||
| | Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns | | ||
| | Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered | | ||
| | Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned | | ||
| | Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value | | ||
| | Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered | | ||
| ### 5a. Required checklist (verify by actually reading files) | ||
|
|
||
| - Score each dimension 1–5 | ||
| - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3 | ||
| - Show the user the scores table and the final draft | ||
| Execute **all** of the following before evaluating the draft: | ||
|
|
||
| 6. Ask user to confirm: | ||
| - Show: proposed save path + scores table + final draft | ||
| - Wait for explicit confirmation before writing | ||
| - [ ] Grep `~/.claude/skills/` and relevant project `.claude/skills/` files by keyword to check for content overlap | ||
| - [ ] Check MEMORY.md (both project and global) for overlap | ||
| - [ ] Consider whether appending to an existing skill would suffice | ||
| - [ ] Confirm this is a reusable pattern, not a one-off fix | ||
|
|
||
| 7. Save to the determined location | ||
| ### 5b. Holistic verdict | ||
|
|
||
| ## Output Format for Step 5 (scores table) | ||
| Synthesize the checklist results and draft quality, then choose **one** of the following: | ||
|
|
||
| | Dimension | Score | Rationale | | ||
| |-----------|-------|-----------| | ||
| | Specificity | N/5 | ... | | ||
| | Actionability | N/5 | ... | | ||
| | Scope Fit | N/5 | ... | | ||
| | Non-redundancy | N/5 | ... | | ||
| | Coverage | N/5 | ... | | ||
| | **Total** | **N/25** | | | ||
| | Verdict | Meaning | Next Action | | ||
| |---------|---------|-------------| | ||
| | **Save** | Unique, specific, well-scoped | Proceed to Step 6 | | ||
| | **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) | | ||
| | **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 | | ||
| | **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop | | ||
|
|
||
| **Guideline dimensions** (informing the verdict, not scored): | ||
|
|
||
| - **Specificity & Actionability**: Contains code examples or commands that are immediately usable | ||
| - **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern | ||
| - **Uniqueness**: Provides value not covered by existing skills (informed by checklist results) | ||
| - **Reusability**: Realistic trigger scenarios exist in future sessions | ||
|
|
||
| 6. **Verdict-specific confirmation flow** | ||
|
|
||
| - **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict | ||
| - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation | ||
| - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation | ||
| - **Drop**: Show checklist results + reasoning only (no confirmation needed) | ||
|
|
||
| 7. Save / Absorb to the determined location | ||
|
|
||
|
Comment on lines +90 to +91
Contributor
Step 7 doesn't account for all verdicts. "Save / Absorb to the determined location" is incomplete and misleading:
Reword to clarify the conditional nature:
Suggested rewording:
-7. Save / Absorb to the determined location
+7. **Execute the verdict action**
+ - **Save** / **Improve then Save**: Save new skill file to the location determined in Step 3
+ - **Absorb into [X]**: Append additions to the existing target skill file
+ - **Drop**: No file action required
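For reviewers who want to see the conditional Step 7 spelled out, here is a minimal Python sketch of the suggested wording. It is illustrative only: `Verdict`, `execute_verdict`, `save_path`, and `absorb_target` are hypothetical names that do not appear in the PR. The sketch assumes that an **Improve then Save** verdict has already gone through its single re-evaluation and still ends in a save.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class Verdict(Enum):
    # The four verdicts from the Step 5b table
    SAVE = "Save"
    IMPROVE_THEN_SAVE = "Improve then Save"
    ABSORB = "Absorb into [X]"
    DROP = "Drop"


def execute_verdict(
    verdict: Verdict,
    draft: str,
    save_path: Path,
    absorb_target: Optional[Path] = None,
) -> Optional[Path]:
    """Return the path that was written, or None when no file action is required."""
    if verdict is Verdict.DROP:
        # Drop: no file action required
        return None
    if verdict is Verdict.ABSORB:
        # Absorb: append the additions to the existing target skill file
        if absorb_target is None:
            raise ValueError("Absorb verdict needs a target skill file")
        with absorb_target.open("a", encoding="utf-8") as fh:
            fh.write("\n" + draft)
        return absorb_target
    # Save / Improve then Save: write a new skill file at the Step 3 location
    save_path.parent.mkdir(parents=True, exist_ok=True)
    save_path.write_text(draft, encoding="utf-8")
    return save_path
```

Returning `None` for **Drop** makes the "no file action" case explicit to the caller, which is the gap the comment points out in the original one-line Step 7.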
||
| ## Output Format for Step 5 | ||
|
|
||
| ``` | ||
| ### Checklist | ||
| - [x] skills/ grep: no overlap (or: overlap found → details) | ||
| - [x] MEMORY.md: no overlap (or: overlap found → details) | ||
| - [x] Existing skill append: new file appropriate (or: should append to [X]) | ||
| - [x] Reusability: confirmed (or: one-off → Drop) | ||
|
|
||
| ### Verdict: Save / Improve then Save / Absorb into [X] / Drop | ||
|
|
||
| **Rationale:** (1-2 sentences explaining the verdict) | ||
| ``` | ||
|
|
||
| ## Design Rationale | ||
|
|
||
| This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped. | ||
|
|
||
| ## Notes | ||
|
|
||
| - Don't extract trivial fixes (typos, simple syntax errors) | ||
| - Don't extract one-time issues (specific API outages, etc.) | ||
| - Focus on patterns that will save time in future sessions | ||
| - Keep skills focused — one pattern per skill | ||
| - If Coverage score is low, add related variants before saving | ||
| - When the verdict is Absorb, append to the existing skill rather than creating a new file | ||
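The first checklist item in Step 5a (grep `~/.claude/skills/` and the project `.claude/skills/` by keyword) could be approximated as follows. This is a rough sketch, not part of the command: `find_overlap` is a hypothetical helper, and it assumes skill files are Markdown (`*.md`) and that a case-insensitive substring match is a good enough proxy for "content overlap".

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_overlap(keywords: Iterable[str], roots: Iterable[Path]) -> Dict[Path, List[str]]:
    """Map each Markdown file under the roots to the keywords it mentions."""
    lowered = [k.lower() for k in keywords]
    hits: Dict[Path, List[str]] = {}
    for root in roots:
        if not root.is_dir():
            # A missing skills directory simply means nothing to overlap with
            continue
        for path in sorted(root.rglob("*.md")):
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            found = [k for k in lowered if k in text]
            if found:
                hits[path] = found
    return hits


# Example roots matching the checklist: global skills, then project skills
DEFAULT_ROOTS = [Path.home() / ".claude" / "skills", Path(".claude") / "skills"]
```

An empty result corresponds to the "no overlap" line in the Step 5 output format. Any hit still needs to be read by hand before choosing between **Absorb into [X]** and **Drop**.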