docs: fix drift in Hybrid Search #165
Entity expansion step 2 referenced `extract_entities()` which does not exist. The actual function called by the recall pipeline is `_extract_entities_from_results()` (automem/api/recall.py), which reads existing `entity:*` tags from seed-result metadata rather than running a fresh NER pass; step 3 description updated accordingly. Verified against: automem@ed36b98e3e1569dde71aa430417b6549520f7068
Pull request overview
Updates the Hybrid Search documentation to accurately describe how entity expansion works in the recall pipeline, aligning the narrative with the implementation details referenced in the PR description.
Changes:
- Replaces the non-existent `extract_entities()` reference with `_extract_entities_from_results()` and clarifies it reads existing `entity:*` tags.
- Clarifies that extracted entity names are converted into `entity:<type>:<slug>` search tags (not newly created entity tags).
2. For each seed result, extract entity names from existing `entity:*` tags using `_extract_entities_from_results()`
3. Convert extracted entity names to `entity:<type>:<slug>` search tags
extract_entities() does exist — it's defined in automem/utils/entity_extraction.py and is used by the enrichment pipeline to run spaCy NLP on newly stored memory content. The references in overview.md (rows/diagram for "Enrichment Pipeline") and architecture/enrichment.md (enrichment worker sequence) describe that flow correctly and need no change.
This PR fixes a different path: entity expansion during recall. When a user calls /recall?expand_entities=true, the recall pipeline reads back entity tags already stored on memories (created earlier by the enrichment step) via _extract_entities_from_results() — it does not call the NLP extractor again. The old extract_entities() reference in hybrid-search.md was conflating these two operations.
Generated by Claude Code
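To make the recall-side behaviour described in the comment above concrete, here is a minimal sketch of reading entity tags that are already stored on seed results and turning them into `entity:<type>:<slug>` search tags. It is not the code in `automem/api/recall.py`: the function names `slugify` and `entity_search_tags`, the result shape (a dict with a `tags` list), and the slug rule are all assumptions made for illustration.

```python
import re
from typing import Iterable


def slugify(name: str) -> str:
    """Lowercase and hyphenate a name (assumed slug rule, not taken from AutoMem)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def entity_search_tags(seed_results: Iterable[dict]) -> list[str]:
    """Build entity:<type>:<slug> search tags from tags already on seed results.

    Hypothetical helper: no NLP/NER runs here, it only reads existing tags.
    """
    seen: set[str] = set()
    ordered: list[str] = []
    for result in seed_results:
        # Assumed shape: each result is a dict with a "tags" list of strings.
        for tag in result.get("tags", []):
            parts = tag.split(":", 2)
            if len(parts) != 3 or parts[0] != "entity":
                continue  # skip anything that is not entity:<type>:<name>
            _, entity_type, name = parts
            search_tag = f"entity:{slugify(entity_type)}:{slugify(name)}"
            if search_tag not in seen:  # de-duplicate while keeping first-seen order
                seen.add(search_tag)
                ordered.append(search_tag)
    return ordered


seeds = [
    {"id": "m1", "tags": ["entity:person:Sarah", "project:automem"]},
    {"id": "m2", "tags": ["entity:person:sarah", "entity:tool:spaCy"]},
]
print(entity_search_tags(seeds))
```

The point of the sketch is the separation the comment draws: the enrichment path creates entity tags from content, while the recall path only reads tags back and normalizes them for a follow-up tag search.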
Deploying automem-website with Cloudflare Pages

| | |
|---|---|
| Latest commit: | 6425111 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://962fe5cb.automem-website.pages.dev |
| Branch Preview URL: | https://docs-audit-hybrid-search-202-sgyh.automem-website.pages.dev |
Claim → current state → fix

| Claim | Current state | Fix |
|---|---|---|
| "… `extract_entities()`" | `extract_entities()` does not exist anywhere in the codebase. The recall pipeline calls `_extract_entities_from_results()` (automem/api/recall.py), which reads existing `entity:*` tags from seed-result metadata; it does not run a fresh NER pass | Replaced `extract_entities()` with `_extract_entities_from_results()` and clarified that it reads existing entity tags |
| "… `entity:<type>:<slug>` tags" | … `entity:*` tags; the function extracts the names from those tags and then converts them to search tag patterns (e.g. `"Sarah"` → `entity:person:sarah`); the old wording implied fresh tag creation | "… `entity:<type>:<slug>` search tags" |

Verified against: automem@ed36b98e3e1569dde71aa430417b6549520f7068 (`automem/api/recall.py`: `_extract_entities_from_results()`, `_expand_entity_memories()`)

Generated by Claude Code