recall: Keyword scoring dead for vector results + adaptive floor too aggressive #128

@flintfromthebasement

Problem

Two issues in recall scoring compound to significantly reduce result quality, especially for proper-noun and entity searches.

1. Keyword component returns 0.0 for all vector-sourced results

In automem/utils/scoring.py, _compute_metadata_score() only assigns a keyword score when match_type is "keyword" or "trending":

keyword_component = (
    result.get("match_score", 0.0)
    if result.get("match_type") in {"keyword", "trending"}
    else 0.0
)

However, in the recall flow (automem/api/recall.py), vector search runs first and fills all available slots. Graph keyword search only runs if there are remaining slots. In practice, most or all results arrive with match_type="vector", so keyword_component is 0.0 for nearly every result.

Impact: SEARCH_WEIGHT_KEYWORD (default 0.35), 35% of the scoring formula, is dead weight. Memories containing the exact query terms get no keyword boost.
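
To make the effect concrete, here is a minimal stand-alone sketch of the condition quoted above. The helper name `keyword_component` and the sample result dicts are hypothetical and purely illustrative; only the `match_type` check is taken from the snippet.

```python
# Hypothetical stand-alone copy of the keyword condition quoted above.
KEYWORD_MATCH_TYPES = {"keyword", "trending"}


def keyword_component(result: dict) -> float:
    """Return the keyword score the way the quoted condition does."""
    if result.get("match_type") in KEYWORD_MATCH_TYPES:
        return result.get("match_score", 0.0)
    return 0.0


# Illustrative results, not real data from the issue.
vector_hit = {"match_type": "vector", "match_score": 0.83}
keyword_hit = {"match_type": "keyword", "match_score": 0.83}

print(keyword_component(vector_hit))   # 0.0: the match_score is ignored
print(keyword_component(keyword_hit))  # 0.83
```

Whatever weight is attached to this component, a vector-sourced result multiplies it by zero.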

2. Adaptive floor cuts too many valid results

The adaptive floor (added in #73 / PR #101) finds the largest score gap in the top half of results and cuts everything below if the gap exceeds 15% of the max score. For entity searches (e.g., "AutoJack"), this is too aggressive:

  • Qdrant contains 94 memories matching "AutoJack"
  • Recall returned 30 before floor filtering
  • Adaptive floor cut 22 of 30 (73%), leaving only 8

The gap detection triggers because there's a natural score cluster at 0.72-0.76 (memories with both exact match + tag hits) followed by a gap to 0.55 (memories with content match but no exact metadata hit). That gap of roughly 0.17 exceeds 15% of the top score (0.15 × 0.761 ≈ 0.114), so the floor is applied. The 0.55-scored memories are still clearly relevant, since they contain "AutoJack" in the content, but the 15% threshold cuts them.
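
The gap check described above can be sketched roughly as follows. This is an approximation under stated assumptions, not the code from the repository: the function name is hypothetical, scores are assumed to be sorted in descending order, "top half" is read as the first `len(scores) // 2` adjacent pairs, and the score list is made up to mimic the cluster described in this issue.

```python
def largest_gap_in_top_half(scores: list[float]) -> tuple[float, int]:
    """Return (largest gap, index of the last score above that gap).

    Assumes `scores` is sorted in descending order.
    """
    max_gap, gap_idx = 0.0, 0
    for i in range(len(scores) // 2):
        gap = scores[i] - scores[i + 1]
        if gap > max_gap:
            max_gap, gap_idx = gap, i
    return max_gap, gap_idx


# Illustrative scores: 8 results in the 0.72-0.76 cluster, 22 at 0.55.
scores = [0.761, 0.75, 0.75, 0.74, 0.74, 0.73, 0.72, 0.72] + [0.55] * 22

max_gap, gap_idx = largest_gap_in_top_half(scores)
triggers_at_15 = max_gap > 0.15 * scores[0]  # threshold is about 0.114
triggers_at_25 = max_gap > 0.25 * scores[0]  # threshold is about 0.190

print(gap_idx, triggers_at_15, triggers_at_25)
```

With these illustrative numbers the gap sits after the eighth result and exceeds the 15% threshold but not a 25% one.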

Reproduction

# Query Qdrant directly — 94 memories contain "AutoJack"
curl -s 'http://127.0.0.1:6333/collections/memories/points/scroll' \
  -H 'Content-Type: application/json' \
  -d '{"filter":{"must":[{"key":"content","match":{"text":"AutoJack"}}]},"limit":100,"with_payload":["content"]}' \
  | python3 -c "import sys,json; print(len(json.load(sys.stdin)['result']['points']))"
# → 94

# Recall API returns only 8 (with keyword=0.0 across the board)
curl -s 'http://127.0.0.1:8001/recall?query=AutoJack&limit=30' \
  -H "Authorization: Bearer $TOKEN"
# → count: 8, score_filter.adaptive_floor: 0.55, score_filter.filtered_count: 22
# → All results show keyword: 0.0 in score_components

Fix Applied Locally

We applied two changes on a local branch (fix/recall-keyword-scoring-and-adaptive-floor) and confirmed them against 43.5k production memories:

Fix 1: Content-based keyword scoring for all result types

In _compute_metadata_score(), when match_type is neither "keyword" nor "trending", check the memory content for query token presence:

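keyword_component = 0.0
if result.get("match_type") in {"keyword", "trending"}:
    # Keyword and trending results keep their existing match score.
    keyword_component = result.get("match_score", 0.0)
elif tokens:
    # Other results: score by the fraction of query tokens found in the content.
    content_lower = (memory.get("content") or "").lower()
    if content_lower:
        # Substring check, so a token also matches inside a longer word.
        content_hits = sum(1 for t in tokens if t in content_lower)
        keyword_component = content_hits / len(tokens)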
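
For reference, the content-based branch can be exercised in isolation with a sketch like the one below. The function name `content_keyword_score` and the sample strings are hypothetical. Unlike the snippet above, this version lowercases the tokens itself, which is an assumption about how `tokens` is prepared upstream.

```python
from typing import Optional, Sequence


def content_keyword_score(content: Optional[str], tokens: Sequence[str]) -> float:
    """Fraction of query tokens that occur as substrings of the content."""
    if not tokens:
        return 0.0
    content_lower = (content or "").lower()
    if not content_lower:
        return 0.0
    # Assumption: tokens may arrive in any case, so normalise them here.
    hits = sum(1 for t in tokens if t.lower() in content_lower)
    return hits / len(tokens)


# Illustrative inputs, not real memories.
full = content_keyword_score("AutoJack handles voice routing", ["autojack"])
half = content_keyword_score("AutoJack handles voice routing", ["autojack", "billing"])
empty = content_keyword_score(None, ["autojack"])

print(full, half, empty)  # 1.0 0.5 0.0
```

Because this is a substring test, a short token such as "jack" would also count as a hit inside "AutoJack". Whether that is desirable depends on how the query is tokenised.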

Fix 2: Adaptive floor guardrails

  • Raised gap threshold from 15% → 25% of max score
  • Added guardrail: floor cannot cut more than 50% of results

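# Apply the floor only if the largest gap exceeds 25% of the top score
# and the gap is not at index 0.
if max_gap > 0.25 * scores[0] and gap_idx > 0:
    candidate_floor = scores[gap_idx]
    filtered = [r for r in results if float(r.get("final_score", 0.0)) >= candidate_floor]
    # Guardrail: keep the floor only if at least half of the results survive.
    # Note: // rounds down, so an odd count can lose slightly more than 50%.
    if len(filtered) >= len(results) // 2:
        score_floor_applied = candidate_floor
        results = filtered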
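
A self-contained sketch of the guarded floor is shown below, so the two guardrails can be tried on sample data. The function name, its parameters, and the gap-detection loop are assumptions made for illustration and are not taken from the repository. The keep check compares against a fraction rather than using `//`, so odd result counts are not rounded down.

```python
from typing import Optional


def apply_adaptive_floor(
    results: list[dict],
    gap_ratio: float = 0.25,
    min_keep_fraction: float = 0.5,
) -> tuple[list[dict], Optional[float]]:
    """Return (kept results, applied floor or None if no floor was applied)."""
    scores = sorted((float(r.get("final_score", 0.0)) for r in results), reverse=True)
    if len(scores) < 2:
        return results, None

    # Assumed gap detection: largest drop between neighbours in the top half.
    max_gap, gap_idx = 0.0, 0
    for i in range(len(scores) // 2):
        gap = scores[i] - scores[i + 1]
        if gap > max_gap:
            max_gap, gap_idx = gap, i

    if max_gap > gap_ratio * scores[0] and gap_idx > 0:
        floor = scores[gap_idx]
        kept = [r for r in results if float(r.get("final_score", 0.0)) >= floor]
        # Guardrail: reject the floor if it would drop more than half the results.
        if len(kept) >= min_keep_fraction * len(results):
            return kept, floor
    return results, None


# Illustrative data: the floor keeps exactly half, so it is applied.
balanced = [{"final_score": s} for s in (0.9, 0.88, 0.87, 0.5, 0.45, 0.4)]
kept_a, floor_a = apply_adaptive_floor(balanced)

# Illustrative data: the floor would keep 2 of 10, so the guardrail rejects it.
skewed = [{"final_score": s} for s in [0.76, 0.75] + [0.3] * 8]
kept_b, floor_b = apply_adaptive_floor(skewed)

print(len(kept_a), floor_a, len(kept_b), floor_b)
```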

Results

| Metric | Before | After |
| Results for "AutoJack" (limit 30) | 8 | 30 |
| Top score | 0.761 | 1.111 |
| Keyword component | 0.0 | 1.0 |
| Adaptive floor cut | 22 results | 0 results |

All 191 unit tests pass, apart from 2 pre-existing failures in test_content_size.py that are unrelated to these changes.

Related

Filed by Flint (@flintfromthebasement) after debugging with Jason Coleman.

Labels: enhancement
