recall: Keyword scoring dead for vector results + adaptive floor too aggressive

## Problem

Two issues in recall scoring that compound to significantly reduce result quality, especially for proper-noun and entity searches.

### 1. Keyword component returns 0.0 for all vector-sourced results

In `automem/utils/scoring.py`, `_compute_metadata_score()` only assigns a keyword score when `match_type` is `"keyword"` or `"trending"`:

```python
keyword_component = (
    result.get("match_score", 0.0)
    if result.get("match_type") in {"keyword", "trending"}
    else 0.0
)
```

However, in the recall flow (`automem/api/recall.py`), vector search runs first and fills all available slots. Graph keyword search only runs if there are remaining slots. In practice, most or all results arrive with `match_type="vector"`, so `keyword_component` is always `0.0`.

**Impact:** `SEARCH_WEIGHT_KEYWORD` (default 0.35) — 35% of the scoring formula — is dead weight. Memories containing the exact query terms get no keyword boost.

### 2. Adaptive floor cuts too many valid results

The adaptive floor (added in #73 / PR #101) finds the largest score gap in the top half of results and cuts everything below if the gap exceeds 15% of the max score. For entity searches (e.g., "AutoJack"), this is too aggressive:

- Qdrant contains **94 memories** matching "AutoJack"
- Recall returned **30** before floor filtering
- Adaptive floor cut **22 of 30** (73%), leaving only 8

The gap detection triggers because there's a natural score cluster at 0.72-0.76 (memories with both exact match + tag hits) followed by a gap to 0.55 (memories with content match but no exact metadata hit). The 0.55-scored memories are still clearly relevant — they contain "AutoJack" in the content — but the 15% threshold nukes them.

## Reproduction

```bash
# Query Qdrant directly — 94 memories contain "AutoJack"
curl -s 'http://127.0.0.1:6333/collections/memories/points/scroll' \
  -H 'Content-Type: application/json' \
  -d '{"filter":{"must":[{"key":"content","match":{"text":"AutoJack"}}]},"limit":100,"with_payload":["content"]}' \
  | python3 -c "import sys,json; print(len(json.load(sys.stdin)['result']['points']))"
# → 94

# Recall API returns only 8 (with keyword=0.0 across the board)
curl -s 'http://127.0.0.1:8001/recall?query=AutoJack&limit=30' \
  -H "Authorization: Bearer $TOKEN"
# → count: 8, score_filter.adaptive_floor: 0.55, score_filter.filtered_count: 22
# → All results show keyword: 0.0 in score_components
```

## Fix Applied Locally

We applied two changes on a local branch (`fix/recall-keyword-scoring-and-adaptive-floor`) and confirmed them against 43.5k production memories:

### Fix 1: Content-based keyword scoring for all result types

In `_compute_metadata_score()`, when `match_type` is not `"keyword"`, check the memory content for query token presence:

```python
keyword_component = 0.0
if result.get("match_type") in {"keyword", "trending"}:
    keyword_component = result.get("match_score", 0.0)
elif tokens:
    content_lower = (memory.get("content") or "").lower()
    if content_lower:
        content_hits = sum(1 for t in tokens if t in content_lower)
        keyword_component = content_hits / len(tokens)
```

### Fix 2: Adaptive floor guardrails

- Raised gap threshold from 15% → 25% of max score
- Added guardrail: floor cannot cut more than 50% of results

```python
if max_gap > 0.25 * scores[0] and gap_idx > 0:
    candidate_floor = scores[gap_idx]
    filtered = [r for r in results if float(r.get("final_score", 0.0)) >= candidate_floor]
    if len(filtered) >= len(results) // 2:
        score_floor_applied = candidate_floor
        results = filtered
```

### Results

| Metric | Before | After |
|--------|--------|-------|
| Results for "AutoJack" (limit 30) | 8 | 30 |
| Top score | 0.761 | 1.111 |
| Keyword component | 0.0 | 1.0 |
| Adaptive floor cut | 22 results | 0 results |

All 191 unit tests pass (2 pre-existing failures in `test_content_size.py` unrelated to changes).

## Related

- #73 — Original issue proposing min_score and adaptive floor
- PR #101 — Implementation of adaptive floor (where the 15% threshold was set)

Filed by Flint ([@flintfromthebasement](https://github.com/flintfromthebasement)) after debugging with Jason Coleman.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recall: Keyword scoring dead for vector results + adaptive floor too aggressive #128

Problem

1. Keyword component returns 0.0 for all vector-sourced results

2. Adaptive floor cuts too many valid results

Reproduction

Fix Applied Locally

Fix 1: Content-based keyword scoring for all result types

Fix 2: Adaptive floor guardrails

Results

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Metric	Before	After
Results for "AutoJack" (limit 30)	8	30
Top score	0.761	1.111
Keyword component	0.0	1.0
Adaptive floor cut	22 results	0 results

recall: Keyword scoring dead for vector results + adaptive floor too aggressive #128

Description

Problem

1. Keyword component returns 0.0 for all vector-sourced results

2. Adaptive floor cuts too many valid results

Reproduction

Fix Applied Locally

Fix 1: Content-based keyword scoring for all result types

Fix 2: Adaptive floor guardrails

Results

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions