feat(commands): improve learn-eval with checklist-based holistic verdict #360

Open: shimo4228 wants to merge 2 commits into affaan-m:main from shimo4228:feat/commands/learn-eval-v2

````diff
@@ -1,10 +1,10 @@
 ---
-description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project).
+description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)."
 ---
 
 # /learn-eval - Extract, Evaluate, then Save
 
-Extends `/learn` with a quality gate and save-location decision before writing any skill file.
+Extends `/learn` with a quality gate, save-location decision, and knowledge-placement awareness before writing any skill file.
 
 ## What to Extract
 
````

````diff
@@ -51,41 +51,65 @@ origin: auto-extracted
 [Trigger conditions]
 ```
 
-5. **Self-evaluate before saving** using this rubric:
+5. **Quality gate — Checklist + Holistic verdict**
 
-| Dimension | 1 | 3 | 5 |
-|-----------|---|---|---|
-| Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns |
-| Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered |
-| Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned |
-| Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value |
-| Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered |
+### 5a. Required checklist (verify by actually reading files)
 
-- Score each dimension 1–5
-- If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-- Show the user the scores table and the final draft
+Execute **all** of the following before evaluating the draft:
 
-6. Ask user to confirm:
-- Show: proposed save path + scores table + final draft
-- Wait for explicit confirmation before writing
+- [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+- [ ] Check MEMORY.md (both project and global) for overlap
+- [ ] Consider whether appending to an existing skill would suffice
+- [ ] Confirm this is a reusable pattern, not a one-off fix
 
-7. Save to the determined location
+### 5b. Holistic verdict
 
-## Output Format for Step 5 (scores table)
+Synthesize the checklist results and draft quality, then choose **one** of the following:
 
-| Dimension | Score | Rationale |
-|-----------|-------|-----------|
-| Specificity | N/5 | ... |
-| Actionability | N/5 | ... |
-| Scope Fit | N/5 | ... |
-| Non-redundancy | N/5 | ... |
-| Coverage | N/5 | ... |
-| **Total** | **N/25** | |
+| Verdict | Meaning | Next Action |
+|---------|---------|-------------|
+| **Save** | Unique, specific, well-scoped | Proceed to Step 6 |
+| **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) |
+| **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 |
+| **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop |
 
+**Guideline dimensions** (informing the verdict, not scored):
+
+- **Specificity & Actionability**: Contains code examples or commands that are immediately usable
+- **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern
+- **Uniqueness**: Provides value not covered by existing skills (informed by checklist results)
+- **Reusability**: Realistic trigger scenarios exist in future sessions
+
+6. **Verdict-specific confirmation flow**
+
+- **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+- **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+- **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb to the determined location
````

Contributor review comment on lines +83 to +89:

Add the

Step 5b introduces four verdicts, but Step 6 only specifies confirmation behavior for three. Without an explicit post-revision path, the command can end up ambiguous after the one allowed re-evaluation.

Suggested wording:

```diff
 6. **Verdict-specific confirmation flow**
+- **Improve then Save**: Present the required improvements + revised draft + updated checklist/verdict after one re-evaluation; if the revised verdict is **Save**, save after user confirmation, otherwise follow the new verdict
 - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
 - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
 - **Drop**: Show checklist results + reasoning only (no confirmation needed)
```
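
The review comment above asks Step 6 to define what happens after the single "Improve then Save" re-evaluation. Below is a minimal Python sketch of that dispatch, not code from the PR. It assumes hypothetical `evaluate` and `revise` callables standing in for the model's judgment; the action names, the `absorb_target` field, and the handling of a second "Improve then Save" result are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Verdict(Enum):
    SAVE = "Save"
    IMPROVE_THEN_SAVE = "Improve then Save"
    ABSORB = "Absorb into [X]"
    DROP = "Drop"


@dataclass
class Evaluation:
    verdict: Verdict
    rationale: str                       # the 1-2 sentence rationale from Step 5
    absorb_target: Optional[str] = None  # hypothetical field: the [X] skill for Absorb


# Step 6 behavior for each terminal verdict (action names are illustrative only).
ACTIONS: Dict[Verdict, str] = {
    Verdict.SAVE: "confirm_then_write_new_file",
    Verdict.ABSORB: "confirm_then_append_to_target",
    Verdict.DROP: "report_and_stop",  # Drop needs no confirmation
}


def resolve(
    draft: str,
    evaluate: Callable[[str], Evaluation],
    revise: Callable[[str, Evaluation], str],
) -> Tuple[str, Evaluation, str]:
    """Return (action, final evaluation, final draft), re-evaluating at most once."""
    result = evaluate(draft)
    if result.verdict is Verdict.IMPROVE_THEN_SAVE:
        draft = revise(draft, result)  # apply the listed improvements
        result = evaluate(draft)       # the single allowed re-evaluation
        if result.verdict is Verdict.IMPROVE_THEN_SAVE:
            # Assumption: neither the PR nor the review covers a second
            # "Improve then Save", so hand the decision back to the user.
            return "ask_user", result, draft
    return ACTIONS[result.verdict], result, draft


if __name__ == "__main__":
    # Scripted verdicts standing in for the model's judgment.
    scripted = iter([
        Evaluation(Verdict.IMPROVE_THEN_SAVE, "Needs a concrete code example."),
        Evaluation(Verdict.SAVE, "Unique, specific, well-scoped."),
    ])
    action, final, draft = resolve(
        "draft v1",
        evaluate=lambda _draft: next(scripted),
        revise=lambda d, _ev: d + " + example",
    )
    print(action, "|", draft)  # confirm_then_write_new_file | draft v1 + example
```

Capping the loop at one revision mirrors the "re-evaluate (once)" rule in the verdict table; any non-improve verdict after that revision simply falls through to its own Step 6 action, as the suggested wording proposes.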

````diff
+
+## Output Format for Step 5
+
+```
+### Checklist
+- [x] skills/ grep: no overlap (or: overlap found → details)
+- [x] MEMORY.md: no overlap (or: overlap found → details)
+- [x] Existing skill append: new file appropriate (or: should append to [X])
+- [x] Reusability: confirmed (or: one-off → Drop)
+
+### Verdict: Save / Improve then Save / Absorb into [X] / Drop
+
+**Rationale:** (1-2 sentences explaining the verdict)
+```
+
+## Design Rationale
+
+This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped.
+
 ## Notes
 
 - Don't extract trivial fixes (typos, simple syntax errors)
 - Don't extract one-time issues (specific API outages, etc.)
 - Focus on patterns that will save time in future sessions
 - Keep skills focused — one pattern per skill
-- If Coverage score is low, add related variants before saving
+- When the verdict is Absorb, append to the existing skill rather than creating a new file
````
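
The Step 5 output template in the diff can be produced mechanically once the checks have run. A small Python sketch follows, assuming checklist results arrive as hypothetical `(label, finding)` pairs; `render_step5` is an illustrative name, not part of the command. Every item is printed as `[x]` because the template marks a check as performed regardless of what it found.

```python
from typing import List, Tuple

# Verdicts with fixed wording; "Absorb into <target>" is checked by prefix.
FIXED_VERDICTS = ("Save", "Improve then Save", "Drop")


def render_step5(checks: List[Tuple[str, str]], verdict: str, rationale: str) -> str:
    """Render the Step 5 report in the template's layout."""
    if verdict not in FIXED_VERDICTS and not verdict.startswith("Absorb into "):
        raise ValueError(f"unknown verdict: {verdict!r}")
    lines = ["### Checklist"]
    # [x] means "this check was performed"; the finding carries the result.
    lines += [f"- [x] {label}: {finding}" for label, finding in checks]
    lines += ["", f"### Verdict: {verdict}", "", f"**Rationale:** {rationale}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_step5(
        [("skills/ grep", "no overlap"),
         ("MEMORY.md", "no overlap"),
         ("Existing skill append", "new file appropriate"),
         ("Reusability", "confirmed")],
        "Save",
        "Unique, specific, well-scoped.",
    ))
```

Rejecting unknown verdicts keeps the report limited to the four options defined in Step 5b.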

Review comment:

P2: Checklist grep only checks global skills (`~/.claude/skills/`), missing project-level skills (`.claude/skills/`). Since step 3 defines both as valid save locations, duplicates in project skills would go undetected. The MEMORY.md item already says "both project and global"; apply the same pattern here.
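
The P2 comment amounts to searching both skill roots instead of only the global one. A rough Python equivalent of that grep is sketched below, assuming skills are stored as `*.md` files beneath each root and that the command runs from the project root; `find_overlaps` and `SKILL_ROOTS` are hypothetical names, and plain case-insensitive substring matching stands in for whatever keyword search the command actually performs.

```python
from pathlib import Path
from typing import Iterable, List


def find_overlaps(keywords: Iterable[str], roots: Iterable[Path]) -> List[Path]:
    """Return *.md files under any root whose text contains a keyword (case-insensitive)."""
    terms = [k.lower() for k in keywords]
    # Resolve and de-duplicate roots so running from $HOME does not scan twice.
    unique_roots = list(dict.fromkeys(Path(r).expanduser().resolve() for r in roots))
    hits: List[Path] = []
    for root in unique_roots:
        if not root.is_dir():
            continue  # e.g. a project that has no .claude/skills/ yet
        for path in sorted(root.rglob("*.md")):
            if not path.is_file():
                continue  # skip directories whose names happen to end in .md
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if any(term in text for term in terms):
                hits.append(path)
    return hits


# Assumed layout: global skills under ~/.claude/skills, project skills under
# .claude/skills relative to the current working directory.
SKILL_ROOTS = [Path("~/.claude/skills"), Path(".claude/skills")]

if __name__ == "__main__":
    for hit in find_overlaps(["retry", "backoff"], SKILL_ROOTS):
        print("possible overlap:", hit)
```

A missing project directory is skipped rather than treated as an error, since a project may not have saved any skills yet.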