
feat: add instruction compliance measurement and skip-rate dashboard#606

Merged
notque merged 6 commits into main from feat/instruction-skip-rate
May 8, 2026

Conversation

Owner

@notque notque commented May 8, 2026

Summary

  • Adds hooks/instruction-compliance.py — PostToolUse hook recording M01-M06 compliance to learning.db
  • Adds dedicated instruction_compliance table (INSERT per observation, not UPSERT)
  • Adds skip-rate command to learning-db.py — formatted dashboard with >20% threshold flagging
  • Bug found during integration testing: original record_learning UPSERT overwrote observations — fixed with per-row table
  • ADR: adr/instruction-skip-rate-measurement.md

Test plan

  • 30 unit tests (detection, recording, accumulation, integration, dashboard)
  • Integration: real hook invocation, verified 40% skip-rate calculation (3 compliant + 2 non-compliant)
  • Collect data over 50+ sessions to identify high-skip instructions (post-merge)
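
The 40% figure in the integration check follows directly from the counts: 2 non-compliant out of 5 observations. A minimal sketch of that arithmetic and of threshold flagging is below. The function names are hypothetical, and reading the "30-obs gate" as a minimum sample size before flagging is an assumption, not something taken from the hook's code.

```python
SKIP_RATE_THRESHOLD = 0.20  # flag when the skip rate is above 20%
MIN_OBSERVATIONS = 30       # assumed: do not flag until 30 observations exist


def skip_rate(compliant: int, non_compliant: int) -> float:
    """Fraction of observations in which the instruction was skipped."""
    total = compliant + non_compliant
    if total == 0:
        return 0.0  # no data yet, so nothing to report
    return non_compliant / total


def is_flagged(
    compliant: int,
    non_compliant: int,
    threshold: float = SKIP_RATE_THRESHOLD,
    min_observations: int = MIN_OBSERVATIONS,
) -> bool:
    """True when there is enough data and the skip rate exceeds the threshold."""
    total = compliant + non_compliant
    if total < min_observations:
        return False
    return skip_rate(compliant, non_compliant) > threshold


# The integration case from the test plan: 3 compliant + 2 non-compliant.
integration_rate = skip_rate(3, 2)
print(f"{integration_rate:.0%}")
```

Under these assumptions the integration case has a 40% skip rate but would not be flagged yet, because 5 observations is below the assumed gate.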

notque added 6 commits May 8, 2026 14:03
User no longer uses Opus 4.7. Removes model-specific override framing
while preserving the underlying instructions (verification-means-execution,
mandatory delegation, parallel dispatch, task specification).

PostToolUse hook checks M01-M06 instruction compliance via string-presence
patterns and records observations to learning.db. New `skip-rate` command
on learning-db.py CLI displays compliance dashboard with 20%/30-obs gate
threshold. 28 tests cover detection, recording, integration, and CLI output.

… overwrites

record_learning() UPSERTs on (topic, key), so each compliance observation
overwrote the previous one — skip-rate dashboard showed incorrect data.

Added instruction_compliance table (schema v4) with INSERT-only semantics.
Hook now writes individual observations that accumulate instead of replacing.
Skip-rate command queries the new table with proper GROUP BY aggregation.
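
The difference between the two write paths can be reproduced with an in-memory SQLite database. This is a sketch only: the `learnings` table and all column names are hypothetical stand-ins, and the real schema v4 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical key/value table with a uniqueness constraint on (topic, key).
    CREATE TABLE learnings (
        topic TEXT NOT NULL,
        key   TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (topic, key)
    );
    -- Hypothetical per-observation table: one row per hook invocation.
    CREATE TABLE instruction_compliance (
        id             INTEGER PRIMARY KEY AUTOINCREMENT,
        instruction_id TEXT NOT NULL,
        compliant      INTEGER NOT NULL
    );
    """
)

# Five observations for one instruction: 3 compliant, 2 non-compliant.
observations = [("M01", 1), ("M01", 1), ("M01", 0), ("M01", 1), ("M01", 0)]

for instruction_id, compliant in observations:
    # UPSERT: every write hits the same (topic, key) and replaces the value.
    conn.execute(
        "INSERT INTO learnings (topic, key, value) VALUES ('compliance', ?, ?) "
        "ON CONFLICT(topic, key) DO UPDATE SET value = excluded.value",
        (instruction_id, str(compliant)),
    )
    # INSERT-only: every write adds a new row.
    conn.execute(
        "INSERT INTO instruction_compliance (instruction_id, compliant) "
        "VALUES (?, ?)",
        (instruction_id, compliant),
    )

upsert_rows = conn.execute("SELECT COUNT(*) FROM learnings").fetchone()[0]

rates = conn.execute(
    """
    SELECT instruction_id,
           COUNT(*)                             AS total,
           SUM(compliant = 0)                   AS skipped,
           1.0 * SUM(compliant = 0) / COUNT(*)  AS skip_rate
    FROM instruction_compliance
    GROUP BY instruction_id
    """
).fetchall()

print(upsert_rows)  # only the last observation survives in the UPSERT table
print(rates)
```

With the UPSERT path only the final observation remains, so no rate can be computed from it. The INSERT-only table keeps all five rows, which is what the GROUP BY aggregation needs.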

- Anchor M03 `===` pattern to standalone lines (^={3,}\s*$) to prevent
  false positives on JS strict equality and inline markdown separators
- Anchor ROUTING: pattern to require start-of-line or whitespace prefix
- Add M04 patterns: "Before starting work", "Load EVERY reference file"
- Add M05 patterns: partial prefix matches for prompt-injected markers
- Batch all 5 compliance inserts into one transaction via executemany
  (record_instruction_compliance_batch in learning_db_v2)
- Add tests: M03 false positive guards, M04 new variants, batch recording
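
The effect of the anchoring can be checked in isolation. In the sketch below the M03 pattern is the one quoted in the commit message, compiled with `re.MULTILINE` so that `^` and `$` match at line boundaries (an assumption about how the hook applies it). The `ROUTING:` pattern is a guess at one way to express "start-of-line or whitespace prefix" and may not match the hook's actual regex.

```python
import re

# Pattern quoted in the commit message; MULTILINE is assumed.
SEPARATOR_LINE = re.compile(r"^={3,}\s*$", re.MULTILINE)

# Hypothetical rendering of the ROUTING: anchoring described above.
ROUTING_MARKER = re.compile(r"(?:^|\s)ROUTING:", re.MULTILINE)

samples = {
    "js_strict_equality": "if (a === b) { return; }",
    "inline_separator": "left === right",
    "standalone_line": "Section\n=====\nbody text",
}

# Only a line made up entirely of '=' characters should count as a separator.
separator_hits = {
    name: bool(SEPARATOR_LINE.search(text)) for name, text in samples.items()
}

routing_hits = {
    "line_start": bool(ROUTING_MARKER.search("ROUTING: some-agent")),
    "after_space": bool(ROUTING_MARKER.search("note ROUTING: some-agent")),
    "embedded": bool(ROUTING_MARKER.search("NOROUTING: some-agent")),
}

print(separator_hits)
print(routing_hits)
```

Only the standalone `=====` line matches the separator pattern, and `ROUTING:` embedded inside a longer token is rejected.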

The test_batch_recording test failed in CI because the hook's imported
learning_db_v2 module retained _initialized=True from other tests, skipping
table creation. Reset across all sys.modules references.

The test relied on record_instruction_compliance_batch calling init_db()
internally, but in CI the module's _initialized flag was stale from a
different connection context. Call init_db() explicitly before the test.
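
The stale-flag failure mode is easy to reproduce outside the project. The sketch below uses a hypothetical stand-in for a module that caches initialization in a module-level flag. It is not the real `learning_db_v2` code, and the table columns are invented for illustration.

```python
import sqlite3
import types

# Hypothetical stand-in for a module with a module-level `_initialized` flag.
fake_db = types.SimpleNamespace(_initialized=False)


def init_db(conn: sqlite3.Connection) -> None:
    """Create the table once per process, guarded by the cached flag."""
    if fake_db._initialized:
        return  # skipped even when `conn` points at a brand-new database
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instruction_compliance "
        "(instruction_id TEXT, compliant INTEGER)"
    )
    fake_db._initialized = True


def table_exists(conn: sqlite3.Connection) -> bool:
    row = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name = 'instruction_compliance'"
    ).fetchone()
    return row is not None


# An earlier test initializes its own database and leaves the flag set.
first = sqlite3.connect(":memory:")
init_db(first)

# A later test opens a fresh database; the stale flag skips table creation.
second = sqlite3.connect(":memory:")
init_db(second)
stale_result = table_exists(second)

# Resetting the flag before initializing restores the expected behavior.
fake_db._initialized = False
init_db(second)
fixed_result = table_exists(second)

print(stale_result, fixed_result)
```

In this simplified model the table is missing until the flag is reset, which is consistent with the reset-then-initialize approach the two fix commits describe.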
@notque notque merged commit 5582ee6 into main May 8, 2026
5 checks passed
@notque notque deleted the feat/instruction-skip-rate branch May 8, 2026 21:46