Context
During the #72 production repair rollout (2026-06-10), migrate_entity_nodes.py --dry-run against the repaired corpus accepted 1,591 entities: people 1,020, organizations 400, tools 118, projects 26, concepts 22, opportunities 5 (5,535 held back by the existing migration-review heuristics; 3,026 edges).
Migration was deferred at the rollout's final gate because the people category is still noisy:
- 734 of 1,020 accepted people have exactly one reference, and a random sample of that bucket is ~80% word-pair noise with person shape: anker-power, cache-performance, tcp-proxy, framer-motion, keep-stripe, server-vad, mac-catalyst, …
- Even the refs≥2 bucket leaks: google-calendar (8), sri-lanka (9), raspberry-pi (7), mile-method (7), loading-suredash (7), jack-s (8).
Proposed work
- Migration gates: add --min-references N and --categories a,b,c to scripts/migrate_entity_nodes.py (it currently has only --dry-run) so a clean slice (non-people + high-reference people) can migrate first.
- Word-pair people validation pass: structural detection of common-noun pairs that pass _has_person_name_shape, or the cheap-LLM review path for ambiguous people slugs (per the no-corpus-specific-fixtures direction).
- Verb-person fragments survive repair as tags (needs-jack, jack-mentions, demonstrates-steve class) because their leading verbs are missing from _ACTION_PREFIXES; the same pass should handle them.
- 8 noise canonical targets consolidated by canonicalize-safe (solana-agent, locomo-benchmark, rag-implementation, decap-cms, loading-suredash, mile-method, watapana-tattoo, +1) are now single slugs, so they are easy to remove once classified.
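A rough sketch of how the two proposed gates could be wired up, to make the first item concrete. Only the flag names (--dry-run, --min-references, --categories) and the script name come from this issue. The helper names build_parser and passes_gates, the defaults, and the entity dict shape ("category", "references") are assumptions for illustration, not the script's actual internals.

```python
import argparse
from typing import Optional


def parse_categories(raw: str) -> list[str]:
    """Split a comma-separated value such as 'a,b,c' into clean names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real script currently exposes only --dry-run.
    parser = argparse.ArgumentParser(prog="migrate_entity_nodes.py")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would migrate without writing")
    parser.add_argument("--min-references", type=int, default=1, metavar="N",
                        help="skip entities with fewer than N references")
    parser.add_argument("--categories", type=parse_categories, default=None,
                        metavar="a,b,c",
                        help="only migrate these categories (default: all)")
    return parser


def passes_gates(entity: dict, min_references: int,
                 categories: Optional[list[str]]) -> bool:
    """Return True if an entity clears both gates.

    Assumes each entity is a dict with 'category' and 'references' keys;
    the real data model may differ.
    """
    if categories is not None and entity["category"] not in categories:
        return False
    return entity["references"] >= min_references


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With a single global threshold like this, the clean slice would take two runs: one with --categories limited to the non-people categories, and one with --categories people plus a --min-references value. The refs≥2 leaks listed above suggest the threshold alone will not make the people slice clean, so the validation pass is still needed.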
Full dry-run log: automem-evals data/sweep_runs/prod-rollout-20260610/migration-dry-run.log.
Refs #72, #124, #176, #178, #179.
🤖 Generated with Claude Code