fix(entity): stop validator over-rejecting real people, code tools, and event categories by jack-arturo · Pull Request #178 · verygoodplugins/automem

jack-arturo · 2026-06-10T21:15:38Z

Summary

Production Gate-3 review of the entity-tag repair rollout (the human review of rejected-tags.csv before executing repair_entity_tags.py against production) surfaced three over-rejection classes in the deployed validator. Of 7,677 planned tag removals, ~1,600 were legitimate entities.

What was wrong

Real people rejected by the context-hint branch. _looks_tool_or_org_like rejected any person whose memory content contained generic words (data, project, platform, tool...). In an engineering corpus that's nearly every memory — 725 distinct multi-token person names were condemned, including the corpus owner's own canonical entity.
code as a primary fragment token rejected real tool entities such as claude-code (80 occurrences) and vs-code.
The events and opportunities categories were missing from _CATEGORY_ALIASES, so every such tag was dropped as unknown_category.
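
To make the first over-rejection class concrete, here is a minimal sketch of a context-hint check of the kind described. The hint set and function names are hypothetical stand-ins: the real contents of _TOOL_OR_ORG_CONTEXT_HINTS and the body of _looks_tool_or_org_like are not shown in this PR, and only the four hint words are taken from the description above.

```python
import re

# Assumed hint words, taken from the examples in the PR description.
# The real hint list in the validator is likely longer.
CONTEXT_HINTS = {"data", "project", "platform", "tool"}


def context_has_hint(context: str) -> bool:
    """Return True if any generic hint word appears in the memory content."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return bool(words & CONTEXT_HINTS)


def old_rejects_person(slug: str, context: str) -> bool:
    """Pre-fix behaviour as described: the slug is ignored and any hint
    word in the surrounding content is enough to condemn the person."""
    return context_has_hint(context)


# Synthetic memories about a synthetic person.
memories = [
    "Mara Quinn reviewed the data pipeline for the project.",
    "Met Mara Quinn for coffee.",
    "Mara Quinn asked about the platform migration.",
]
rejected = [m for m in memories if old_rejects_person("mara-quinn", m)]
print(f"{len(rejected)} of {len(memories)} memories would reject the person tag")
```

In a corpus where most memories mention data, projects, or tooling, a check shaped like this rejects a real person in most of the places they appear, which matches the 725 condemned names reported above.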

The fix

Person-shaped multi-token slugs skip the context-hint branch. CamelCase and tool/org suffix signals still apply (growthmath-style names are still rejected); single-token brand-like people (automem, claude) are still rejected contextually.
code demoted to _MARKDOWN_OR_CODE_SECONDARY_TOKENS (needs a second code-ish token to reject). People slugs containing code and path/markdown fragments remain rejected via existing checks.
events/opportunities (+ singular aliases) added to the category map.
Backstop for noise the context branch used to catch: _NON_PERSON_COMMON_TOKENS (bottom-line / deck-today / email-highlights / claude-desktop class) and pipeline added to _NON_PERSON_TECH_TOKENS.
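
The demotion of code from a primary to a secondary token can be sketched as follows. This is an illustration under stated assumptions, not the validator's actual code: the token sets below are hypothetical apart from the word code itself, and the function name is invented for the example.

```python
# Hypothetical token sets. Only "code" and its move to the secondary set
# come from the PR description; every other entry is an assumption.
PRIMARY_TOKENS = {"readme", "src"}
SECONDARY_TOKENS = {"code", "snippet", "block"}


def looks_like_code_fragment(slug: str) -> bool:
    """Reject on any single primary token, or on two or more secondary tokens."""
    parts = slug.split("-")
    if any(p in PRIMARY_TOKENS for p in parts):
        return True
    secondary_hits = sum(p in SECONDARY_TOKENS for p in parts)
    return secondary_hits >= 2


for slug in ("claude-code", "vs-code", "code-block", "src-code"):
    print(slug, looks_like_code_fragment(slug))
```

With code alone in the slug, a tool name such as claude-code passes, while a slug that pairs code with a second code-ish token is still treated as a fragment.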

Empirical validation on a production clone (10,061 memories)

| | deployed validator | this PR |
|---|---|---|
| planned rejections | 7,677 | 6,106 |
| freed (all person/tool/event entities) | n/a | 1,603 across 743 distinct tags |
| newly caught (all generated noise) | n/a | 32 |

Every freed tag inspected by category: person names, claude-code/vs-code, entity:events:*, entity:opportunities:*. Every newly-rejected tag is generated noise (good-plugins, chrome, claude-desktop, stream-deck-as-person).
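
The three reported counts are mutually consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported above.

```python
# Counts reported for the production clone (10,061 memories).
deployed_rejections = 7_677
freed = 1_603          # rejections under the deployed validator that this PR no longer makes
newly_caught = 32      # rejections this PR adds

# Rejections under this PR = old rejections, minus freed, plus newly caught.
this_pr_rejections = deployed_rejections - freed + newly_caught
assert this_pr_rejections == 6_106

# Share of the original plan that was freed as legitimate.
freed_share = freed / deployed_rejections
print(f"{this_pr_rejections} rejections, {freed_share:.1%} of the original plan freed")
```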

This also stops enrichment-time over-stripping: the same validator gates _validated_entities(), so new memories were silently losing these entities on every store since #176 deployed.

Test Plan

pytest tests/ → 487 passed, 12 skipped (env-dependent)
make lint → clean
New tests: person names in technical context, code-suffixed tools, event/opportunity categories, common-word-pair people noise — all using synthetic names per the no-real-fixtures rule

Refs #72. Part of the staged production entity repair (Gate 3 of the rollout runbook).

🤖 Generated with Claude Code

…nd event categories

Production Gate-3 review of the entity-tag repair plan surfaced three over-rejection classes in the deployed validator (~1,600 of 7,677 planned removals were legitimate entities):

- The _TOOL_OR_ORG_CONTEXT_HINTS branch rejected any person whose memory mentioned generic words like data/project/platform/tooling. In an engineering corpus that is nearly every memory, so 725 distinct multi-token person names were condemned. Person-shaped multi-token slugs now skip the context branch; camelCase and tool/org suffix signals still apply, and single-token brand-like people are still rejected contextually.
- "code" as a primary markdown/code-fragment token rejected real tool names (claude-code, vs-code, code-server). It is now a secondary signal; people slugs containing "code" and path/markdown fragments remain rejected.
- events and opportunities categories were missing from _CATEGORY_ALIASES, so every such tag was dropped as unknown_category.

Backstop: a small common-word token set (bottom-line, deck-today, email-highlights, claude-desktop class) plus "pipeline" in the tech tokens keeps the generated-noise people slugs out now that the context branch no longer catches them.

Empirical diff on a production clone (10,061 memories): rejections 7,677 -> 6,106; all 1,603 freed rejections are person/tool/event entities; all 32 newly-caught rejections are generated noise.

Refs #72.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adjusts the entity-quality validator to reduce false rejections discovered during the production “Gate-3” review of the entity-tag repair rollout, specifically for real people names in technical context, tool names containing code, and missing events / opportunities categories.

Changes:

Add event(s) and opportunity/opportunities to the entity category alias map so they validate instead of being rejected as unknown_category.
Refine the “markdown/code fragment” heuristic by demoting code to a secondary token (so *-code tools aren’t rejected solely due to code).
Reduce context-hint over-rejection for multi-token person names and add a backstop token set for common non-name word pairs; add tests covering these cases.
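
A category alias map of the kind referred to in the first change could look roughly like this. It is a sketch, not the contents of the real _CATEGORY_ALIASES: the people entries, the helper names, and the malformed_tag reason are assumptions, while the entity:category:slug tag shape and the unknown_category reason come from this PR.

```python
from typing import Optional, Tuple

# Hypothetical alias map: singular and plural forms resolve to one
# canonical (plural) category name.
CATEGORY_ALIASES = {
    "person": "people",
    "people": "people",
    "event": "events",
    "events": "events",
    "opportunity": "opportunities",
    "opportunities": "opportunities",
}


def normalize_category(raw: str) -> Optional[str]:
    """Map a raw category string to its canonical name, or None if unknown."""
    return CATEGORY_ALIASES.get(raw.strip().lower())


def check_tag_category(tag: str) -> Tuple[bool, Optional[str]]:
    """Validate only the category part of an entity:<category>:<slug> tag."""
    parts = tag.split(":", 2)
    if len(parts) != 3 or parts[0] != "entity":
        return False, "malformed_tag"  # assumed reason code
    if normalize_category(parts[1]) is None:
        return False, "unknown_category"
    return True, None


print(check_tag_category("entity:events:launch-week"))
print(check_tag_category("entity:gadgets:widget"))
```

Before the fix, events and opportunities would have behaved like the gadgets example here: absent from the map, so rejected regardless of the slug.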

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`automem/utils/entity_quality.py`	Updates category normalization and validation heuristics (people/tool/code/category handling).
`tests/test_entity_quality.py`	Adds regression tests for multi-token people-in-tech-context, code-suffixed tools, new categories, and common-word-pair noise.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

…th (#179)

## Why

PR #178's applied review suggestion (commit e6876b0, "Potential fix for pull request finding") gated the person-shape exemption on the display value containing a space:

```python
if " " in (value or "").strip() and len(parts) >= 2 and _has_person_name_shape(parts):
```

Stored entity tags only retain the slug (`entity:people:jack-arturo`), so `validate_entity_tag(context=...)`, the path `scripts/lab/repair_entity_tags.py` uses, never satisfies the guard, and the context-hint branch re-rejects every real person mentioned alongside data/projects/tooling.

**Empirical impact (prod dry-run, read-only):** 7,494 planned rejections with the guard vs ~6,106 expected with the exemption: ~1,390 legitimate person tags (jack-arturo ×51, zack-katz ×27, jason-coleman ×25, katie-keith ×14, ...) wrongly planned for removal.

CI stayed green because every existing test exercised the spaced-value path (`validate_entity_value("people", "Mara Quinn", ...)`), never the slug path with context.

## What

- Restore the plain person-shape exemption (no space guard) in `_looks_tool_or_org_like`.
- Address the original Copilot concern (`entity:people:data-dog`) deterministically: add `"data"` to `_NON_PERSON_TECH_TOKENS`, so brand-like person-shaped pairs are rejected by the per-token vocabulary check on **every** path (with or without context), which is strictly stronger than the context-hint rejection the guard tried to preserve.
- Regression tests for the slug path: real-person tags survive technical context via `validate_entity_tag`; `data-dog` is rejected with `low_signal_people_slug` without needing context.

## Verification

- `pytest tests/`: 490 passed, 12 skipped
- Spot checks: `jack-arturo`/`zack-katz`/`jason-coleman`/`katie-keith` accepted with technical context on the tag path; `data-dog`, `growthmath`, `claude-code`-as-people still rejected

Part of the issue #72 production repair rollout (follow-up to #178).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
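
The difference between the spaced-value path and the slug path can be reproduced in isolation. The sketch below is a simplified model under stated assumptions: the shape check, hint set, token set, and the tool_or_org_context reason are hypothetical stand-ins, and only the guard expression, the `"data"` and `"pipeline"` tokens, and the `low_signal_people_slug` reason are taken from the text above.

```python
from typing import List, Optional, Tuple

CONTEXT_HINTS = {"data", "project", "platform", "tool"}   # assumed hint words
NON_PERSON_TECH_TOKENS = {"data", "pipeline"}             # assumed subset of the real set


def has_person_name_shape(parts: List[str]) -> bool:
    """Stand-in for _has_person_name_shape: all tokens alphabetic."""
    return all(p.isalpha() for p in parts)


def exempt_with_space_guard(value: str, parts: List[str]) -> bool:
    """Exemption as gated in commit e6876b0: needs a space in the display value."""
    return " " in (value or "").strip() and len(parts) >= 2 and has_person_name_shape(parts)


def exempt_plain(parts: List[str]) -> bool:
    """Restored exemption: depends on the token shape only."""
    return len(parts) >= 2 and has_person_name_shape(parts)


def validate_people_slug(slug: str, context: str = "") -> Tuple[bool, Optional[str]]:
    parts = slug.split("-")
    # Per-token vocabulary check runs on every path, with or without context.
    if any(p in NON_PERSON_TECH_TOKENS for p in parts):
        return False, "low_signal_people_slug"
    # Context-hint branch is skipped for person-shaped multi-token slugs.
    hinted = bool(set(context.lower().split()) & CONTEXT_HINTS)
    if hinted and not exempt_plain(parts):
        return False, "tool_or_org_context"  # assumed reason code
    return True, None


parts = ["mara", "quinn"]
print(exempt_with_space_guard("Mara Quinn", parts))   # spaced display value
print(exempt_with_space_guard("mara-quinn", parts))   # slug-only value from a stored tag
print(validate_people_slug("mara-quinn", "notes on the data platform"))
print(validate_people_slug("data-dog"))
```

A test suite that only ever passes the spaced display value exercises the first call and never the second, which is how the guard could ship with CI green.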

Copilot AI review requested due to automatic review settings June 10, 2026 21:15

Copilot started reviewing on behalf of jack-arturo June 10, 2026 21:15

Copilot AI reviewed Jun 10, 2026

Comment thread automem/utils/entity_quality.py

Potential fix for pull request finding

e6876b0

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

jack-arturo merged commit 193b730 into main Jun 10, 2026
7 checks passed

jack-arturo deleted the fix/entity-validator-overrejection branch June 10, 2026 21:24

This was referenced Jun 10, 2026

chore(main): release 0.16.0 #154 (Open)

fix(entity): restore person-shape exemption on the slug validation path #179 (Merged)

jack-arturo merged 2 commits into main from fix/entity-validator-overrejection
