docs: fix drift in Hybrid Search #165

Merged
jack-arturo merged 1 commit into main from docs/audit-hybrid-search-20260608 on Jun 8, 2026

@jack-arturo

Member

Claim → current state → fix

| # | Claim in docs | Current state in code | Fix |
|---|---------------|-----------------------|-----|
| 1 | Entity expansion step 2: "extract entities using `extract_entities()`" | `extract_entities()` does not exist anywhere in the codebase. The recall pipeline calls `_extract_entities_from_results()` (`automem/api/recall.py`), which reads existing `entity:*` tags from seed-result metadata; it does not run a fresh NER pass | Replaced `extract_entities()` with `_extract_entities_from_results()` and clarified that it reads existing entity tags |
| 2 | Entity expansion step 3: "Convert extracted entities to `entity:<type>:<slug>` tags" | Entity names are already stored as `entity:*` tags; the function extracts the names from those tags and then converts them to search tag patterns (e.g. "Sarah" → `entity:person:sarah`); the old wording implied fresh tag creation | Updated to "Convert extracted entity names to `entity:<type>:<slug>` search tags" |
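
For illustration, the name-to-search-tag conversion described in row 2 might look like the sketch below. The helper name `to_entity_search_tag` and the slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) are assumptions made for this example; they are not taken from `automem/api/recall.py`.

```python
import re


def to_entity_search_tag(entity_type: str, name: str) -> str:
    """Hypothetical helper: build an entity:<type>:<slug> search tag."""
    # Assumed slug rule: lowercase the name, collapse every run of
    # non-alphanumeric characters into a single hyphen, trim the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"entity:{entity_type}:{slug}"


print(to_entity_search_tag("person", "Sarah"))  # entity:person:sarah
```

The point of the example is only that the output is a pattern used for lookup, not a new tag written to storage.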

Verified against: automem@ed36b98e3e1569dde71aa430417b6549520f7068 (automem/api/recall.py: _extract_entities_from_results(), _expand_entity_memories())


Generated by Claude Code

Entity expansion step 2 referenced `extract_entities()` which does not
exist. The actual function called by the recall pipeline is
`_extract_entities_from_results()` (automem/api/recall.py), which reads
existing `entity:*` tags from seed-result metadata rather than running
a fresh NER pass; step 3 description updated accordingly.

Verified against: automem@ed36b98e3e1569dde71aa430417b6549520f7068
Copilot AI review requested due to automatic review settings June 8, 2026 23:15
@jack-arturo jack-arturo enabled auto-merge June 8, 2026 23:15

Copilot AI left a comment

Contributor

Pull request overview

Updates the Hybrid Search documentation to accurately describe how entity expansion works in the recall pipeline, aligning the narrative with the implementation details referenced in the PR description.

Changes:

  • Replaces the non-existent extract_entities() reference with _extract_entities_from_results() and clarifies it reads existing entity:* tags.
  • Clarifies that extracted entity names are converted into entity:<type>:<slug> search tags (not newly created entity tags).

Comment on lines +327 to +328
2. For each seed result, extract entity names from existing `entity:*` tags using `_extract_entities_from_results()`
3. Convert extracted entity names to `entity:<type>:<slug>` search tags

Member Author

extract_entities() does exist — it's defined in automem/utils/entity_extraction.py and is used by the enrichment pipeline to run spaCy NLP on newly stored memory content. The references in overview.md (rows/diagram for "Enrichment Pipeline") and architecture/enrichment.md (enrichment worker sequence) describe that flow correctly and need no change.

This PR fixes a different path: entity expansion during recall. When a user calls /recall?expand_entities=true, the recall pipeline reads back entity tags already stored on memories (created earlier by the enrichment step) via _extract_entities_from_results() — it does not call the NLP extractor again. The old extract_entities() reference in hybrid-search.md was conflating these two operations.
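
A rough sketch of the recall-side behavior described above, for contrast with the enrichment-side NLP pass. The function name `collect_entity_tags` and the result shape (a dict with a `tags` list) are hypothetical stand-ins chosen for this example; the actual signature of `_extract_entities_from_results()` is not shown in this PR.

```python
from typing import Iterable


def collect_entity_tags(results: Iterable[dict]) -> set[str]:
    """Hypothetical stand-in: gather entity:* tags already stored on seed results."""
    found: set[str] = set()
    for result in results:
        # Assumed shape: each seed result carries its stored tags in a list.
        for tag in result.get("tags", []):
            parts = tag.split(":", 2)
            # Keep only well-formed entity:<type>:<slug> tags; no NLP is run here.
            if len(parts) == 3 and parts[0] == "entity" and parts[1] and parts[2]:
                found.add(tag)
    return found


seed_results = [
    {"id": "m1", "tags": ["entity:person:sarah", "project"]},
    {"id": "m2", "tags": ["entity:tool:redis", "entity:person:sarah"]},
]
print(sorted(collect_entity_tags(seed_results)))
# ['entity:person:sarah', 'entity:tool:redis']
```

Nothing in this step inspects memory content; it only reads back tags that an earlier enrichment run is assumed to have written.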


Generated by Claude Code

@jack-arturo jack-arturo added this pull request to the merge queue Jun 8, 2026
@cloudflare-workers-and-pages

Deploying automem-website with Cloudflare Pages

Latest commit: 6425111
Status: ✅  Deploy successful!
Preview URL: https://962fe5cb.automem-website.pages.dev
Branch Preview URL: https://docs-audit-hybrid-search-202-sgyh.automem-website.pages.dev

Merged via the queue into main with commit 24024d5 Jun 8, 2026
5 checks passed
@jack-arturo jack-arturo deleted the docs/audit-hybrid-search-20260608 branch June 8, 2026 23:25