feat: add instruction compliance measurement and skip-rate dashboard #606
Merged
User no longer uses Opus 4.7. Removes model-specific override framing while preserving the underlying instructions (verification-means-execution, mandatory delegation, parallel dispatch, task specification).
PostToolUse hook checks M01-M06 instruction compliance via string-presence patterns and records observations to learning.db. New `skip-rate` command on learning-db.py CLI displays compliance dashboard with 20%/30-obs gate threshold. 28 tests cover detection, recording, integration, and CLI output.
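A minimal sketch of how a "20%/30-obs" gate could be evaluated. It assumes the gate means an instruction is flagged only once it has at least 30 observations and its skip rate exceeds 20%. The names `ComplianceStats`, `is_flagged`, and both constants are hypothetical and are not taken from learning-db.py.

```python
from dataclasses import dataclass

# Assumed reading of the "20%/30-obs" gate; the real constants may differ.
SKIP_RATE_THRESHOLD = 0.20
MIN_OBSERVATIONS = 30


@dataclass
class ComplianceStats:
    """Aggregated observations for one instruction (hypothetical shape)."""

    instruction_id: str
    observed: int  # total observations recorded
    skipped: int   # observations where the instruction was not followed

    @property
    def skip_rate(self) -> float:
        # Avoid dividing by zero when nothing has been recorded yet.
        return self.skipped / self.observed if self.observed else 0.0


def is_flagged(stats: ComplianceStats) -> bool:
    """Flag only when there is enough data and the rate is over the threshold."""
    return (
        stats.observed >= MIN_OBSERVATIONS
        and stats.skip_rate > SKIP_RATE_THRESHOLD
    )
```

Under these assumptions, an instruction with 10 skips in 40 observations (25%) is flagged, while one with 5 skips in 10 observations is not, because the sample is still too small to act on.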
… overwrites record_learning() UPSERTs on (topic, key), so each compliance observation overwrote the previous one — skip-rate dashboard showed incorrect data. Added instruction_compliance table (schema v4) with INSERT-only semantics. Hook now writes individual observations that accumulate instead of replacing. Skip-rate command queries the new table with proper GROUP BY aggregation.
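The difference between the two write paths can be reproduced with an in-memory SQLite database. This is a sketch under stated assumptions: the `learnings` table and the column names `instruction_id` and `complied` are hypothetical stand-ins, not the actual schema v4 definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical UPSERT-style table keyed on (topic, key).
    CREATE TABLE learnings (
        topic TEXT, key TEXT, value TEXT,
        PRIMARY KEY (topic, key)
    );
    -- Hypothetical INSERT-only table: one row per observation.
    CREATE TABLE instruction_compliance (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        instruction_id TEXT NOT NULL,
        complied INTEGER NOT NULL
    );
    """
)

observations = [("M01", 1), ("M01", 0), ("M01", 0), ("M02", 1)]

# UPSERT path: each write for the same (topic, key) replaces the previous one.
for instruction_id, complied in observations:
    conn.execute(
        "INSERT INTO learnings (topic, key, value) VALUES ('compliance', ?, ?) "
        "ON CONFLICT(topic, key) DO UPDATE SET value = excluded.value",
        (instruction_id, str(complied)),
    )

# INSERT-only path: every observation is kept.
conn.executemany(
    "INSERT INTO instruction_compliance (instruction_id, complied) VALUES (?, ?)",
    observations,
)

upsert_rows = conn.execute("SELECT COUNT(*) FROM learnings").fetchone()[0]
insert_rows = conn.execute(
    "SELECT COUNT(*) FROM instruction_compliance"
).fetchone()[0]

# Skip rate per instruction, aggregated over all accumulated observations.
skip_rates = dict(
    conn.execute(
        "SELECT instruction_id, 1.0 - AVG(complied) "
        "FROM instruction_compliance GROUP BY instruction_id"
    ).fetchall()
)
```

After four observations the UPSERT table holds only two rows, the latest per instruction, so no rate can be computed from it. The INSERT-only table holds all four, and the `GROUP BY` query can aggregate them.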
- Anchor M03 `===` pattern to standalone lines (^={3,}\s*$) to prevent
false positives on JS strict equality and inline markdown separators
- Anchor ROUTING: pattern to require start-of-line or whitespace prefix
- Add M04 patterns: "Before starting work", "Load EVERY reference file"
- Add M05 patterns: partial prefix matches for prompt-injected markers
- Batch all 5 compliance inserts into one transaction via executemany
(record_instruction_compliance_batch in learning_db_v2)
- Add tests: M03 false positive guards, M04 new variants, batch recording
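The effect of the anchoring can be checked with Python's `re` module. The M03 pattern is the one quoted in the list above. The unanchored pattern and the exact form of the ROUTING: pattern are assumptions made for illustration.

```python
import re

# M03 pattern from the commit message, applied per line via MULTILINE.
STANDALONE_RULE = re.compile(r"^={3,}\s*$", re.MULTILINE)

# Assumed shape of the earlier, unanchored check (for comparison only).
UNANCHORED_RULE = re.compile(r"={3,}")

# Assumed form of "start-of-line or whitespace prefix" for ROUTING:.
ROUTING = re.compile(r"(?:^|\s)ROUTING:", re.MULTILINE)

js_snippet = "if (a === b) { return; }"
inline_separator = "left === right"
real_separator = "intro\n=====  \nbody"

# The unanchored pattern fires on JS strict equality; the anchored one does not.
unanchored_hits_js = UNANCHORED_RULE.search(js_snippet) is not None
anchored_hits_js = STANDALONE_RULE.search(js_snippet) is not None
anchored_hits_inline = STANDALONE_RULE.search(inline_separator) is not None

# A line consisting only of '=' characters (plus trailing spaces) still matches.
anchored_hits_separator = STANDALONE_RULE.search(real_separator) is not None

routing_at_line_start = ROUTING.search("ROUTING: some-agent") is not None
routing_after_space = ROUTING.search("note ROUTING: some-agent") is not None
routing_inside_word = ROUTING.search("NOROUTING: some-agent") is not None
```

Matching per line with `re.MULTILINE` keeps the check usable on multi-line tool output while rejecting `===` that appears inside other text.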
The test_batch_recording test failed in CI because the hook's imported learning_db_v2 module retained _initialized=True from other tests, skipping table creation. Reset across all sys.modules references.
The test relied on record_instruction_compliance_batch calling init_db() internally, but in CI the module's _initialized flag was stale from a different connection context. Call init_db() explicitly before the test.
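The stale-flag failure and the two fixes above can be reproduced in isolation. This sketch uses a hypothetical stand-in module (`fake_learning_db`) built with `types.ModuleType`. It is not the real learning_db_v2, and `make_db_module` and `reset_init_flags` are illustrative helpers.

```python
import sqlite3
import sys
import types


def make_db_module(name: str) -> types.ModuleType:
    """Build a stand-in module that guards table creation behind a flag."""
    mod = types.ModuleType(name)
    mod._initialized = False

    def init_db(conn: sqlite3.Connection) -> None:
        if mod._initialized:
            return  # Skips creation even if `conn` is a brand-new database.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS instruction_compliance "
            "(instruction_id TEXT, complied INTEGER)"
        )
        mod._initialized = True

    mod.init_db = init_db
    sys.modules[name] = mod
    return mod


# Two separately loaded copies, as can happen when a hook imports by path.
test_copy = make_db_module("fake_learning_db")
hook_copy = make_db_module("hooks.fake_learning_db")

# An earlier test initializes the hook's copy against another database.
hook_copy.init_db(sqlite3.connect(":memory:"))

# A later test opens a fresh database; the stale flag skips table creation.
new_conn = sqlite3.connect(":memory:")
hook_copy.init_db(new_conn)
try:
    new_conn.execute("INSERT INTO instruction_compliance VALUES ('M01', 1)")
    stale_insert_failed = False
except sqlite3.OperationalError:
    stale_insert_failed = True


def reset_init_flags(suffix: str = "fake_learning_db") -> int:
    """Reset the flag on every loaded copy of the module; return the count."""
    reset = 0
    for name, module in list(sys.modules.items()):
        if module is not None and name.endswith(suffix):
            if hasattr(module, "_initialized"):
                module._initialized = False
                reset += 1
    return reset


reset_count = reset_init_flags()
hook_copy.init_db(new_conn)  # Explicit init before the test body runs.
new_conn.execute("INSERT INTO instruction_compliance VALUES ('M01', 1)")
row_count = new_conn.execute(
    "SELECT COUNT(*) FROM instruction_compliance"
).fetchone()[0]
```

Resetting by module-name suffix rather than touching every entry in `sys.modules` keeps the helper from clearing an unrelated `_initialized` attribute on some other package.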
Summary
- `hooks/instruction-compliance.py`: PostToolUse hook recording M01-M06 compliance to learning.db
- `instruction_compliance` table (INSERT per observation, not UPSERT)
- `skip-rate` command to learning-db.py: formatted dashboard with >20% threshold flagging
- `adr/instruction-skip-rate-measurement.md`

Test plan