Template refresh for Opus 4.7 / 1M context + 2 server bugs surfaced during validation (#97)

## TL;DR

While validating AutoMem recall behavior for Opus 4.7 / 1M context across ~15 experiments against production (`https://automem.up.railway.app`), three items surfaced:

1. **Bug:** A `store_memory` call silently failed to persist — memory never appeared in any subsequent recall (content search, tag exact match, tag prefix, vector similarity).
2. **Bug:** `context_tags` + small `limit` drops higher-scoring boosted results in favor of lower-raw-score vector matches.
3. **Recommendation:** Template updates for 1M-context sessions, with measurements showing which knobs actually move the needle.

There is also a docs inconsistency worth flagging (item C below).

---

## Context

Validation was done end-to-end via the MCP client and via direct HTTP `GET /recall` against production. Corpus size during testing: 9,401 memories, with both Qdrant and FalkorDB reporting healthy and synced. All tests were read-only.

---

## A. Silent store failure — `pref/test` memory never persisted

**Severity:** Medium. Data integrity. Store calls returning success while not persisting is the worst failure mode.

**What happened:**
A `store_memory` call was made yesterday with:
```
content: "Test pref: multi-phase recall runs on restart"
type:    "Preference"
tags:    ["pref/test", "project/mcp-automem", "source/manual"]
importance: 0.5
```

The MCP call returned success, but the memory cannot be found through any recall path:

```bash
# Expected at least 1 result. Got 0:
curl -G "$ENDPOINT/recall" -H "X-Api-Key: $KEY" \
  --data-urlencode 'tags=pref/test' \
  --data-urlencode 'tag_match=exact' \
  --data-urlencode 'limit=10'
# → {"count": 0, ...}

# Content search — exact phrase from the memory content:
curl -G "$ENDPOINT/recall" -H "X-Api-Key: $KEY" \
  --data-urlencode 'query=multi-phase recall runs on restart' \
  --data-urlencode 'limit=15'
# → returns 15 results, none are the stored memory
# (top results are all unrelated "restart" voice-conversation extractions)

# Confirmed zero pref/* memories of any kind:
curl -G "$ENDPOINT/recall" -H "X-Api-Key: $KEY" \
  --data-urlencode 'tags=pref' \
  --data-urlencode 'tag_match=prefix' \
  --data-urlencode 'limit=50'
# → {"count": 0, ...}
```

**Action needed:** Check server logs for store operations on 2026-04-16 / 2026-04-17 with that content. Determine whether the store failed at validation, embedding, or persistence, and whether the client was informed.
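
Until the root cause is known, a client-side read-after-write check would at least make this failure visible. The sketch below is a minimal illustration and not AutoMem's API: `store` and `recall` are hypothetical injected callables standing in for the `store_memory` call and an exact-match tag recall, and the attempt count and delay are arbitrary choices.

```python
import time
from typing import Callable, Iterable


def store_and_verify(
    store: Callable[[dict], None],
    recall: Callable[[str], Iterable[dict]],
    memory: dict,
    probe_tag: str,
    attempts: int = 3,
    delay_s: float = 0.0,
) -> bool:
    """Store a memory, then confirm it is recallable by an exact tag.

    `store` and `recall` are stand-ins (assumptions), e.g. thin wrappers
    around the MCP store call and an exact-match tag recall.
    Returns True once a recalled item has the same content, else False.
    """
    store(memory)
    for _ in range(attempts):
        if any(hit.get("content") == memory["content"] for hit in recall(probe_tag)):
            return True
        time.sleep(delay_s)  # leave room for asynchronous indexing, if any
    return False


# Demo with in-memory fakes: one backend persists, the other drops silently.
saved: list[dict] = []
memory = {
    "content": "Test pref: multi-phase recall runs on restart",
    "tags": ["pref/test", "project/mcp-automem", "source/manual"],
}
recall_saved = lambda tag: [m for m in saved if tag in m["tags"]]

ok_persisting = store_and_verify(saved.append, recall_saved, memory, "pref/test")
ok_dropping = store_and_verify(lambda m: None, lambda tag: [], memory, "pref/test")
```

With a check like this, the store in this report would have surfaced as a `False` return at write time rather than as a missing memory a day later.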

---

## B. `context_tags` + low `limit` drops boosted results

**Severity:** Medium. Silent ranking corruption.

Same semantic query + `context_tags` boost, varying `limit`:

| `limit` | Count | Returned scores | Returned contents (abbreviated) |
|---|---|---|---|
| 5 | 3 | 1.057, 1.007, **0.465** | Wan 2.2 q4, Wan 2.2 perf, **M5 Max Ollama specs (no project/video tag)** |
| 10 | 4 | 1.057, 1.007, **0.909**, 0.515 | Wan 2.2 q4, Wan 2.2 perf, **Wan 2.2 setup (project/video tag)**, Video gen plan |

At `limit=5`, the server returned the M5 Max Ollama memory (raw score 0.465, no boost) instead of the Wan 2.2 setup memory (raw score 0.459, boosted to 0.909). The boost does not appear to have been applied before the top-N cut; it looks like tag-boosted and vector-only result streams are being merged or truncated out of order.

**Repro:**
```bash
Q="wan 2.2 benchmark M5 Max"

# Missing a boosted result:
curl -G "$ENDPOINT/recall" -H "X-Api-Key: $KEY" \
  --data-urlencode "query=$Q" \
  --data-urlencode 'context_tags=project/video' \
  --data-urlencode 'limit=5'

# Complete result:
curl -G "$ENDPOINT/recall" -H "X-Api-Key: $KEY" \
  --data-urlencode "query=$Q" \
  --data-urlencode 'context_tags=project/video' \
  --data-urlencode 'limit=10'
```

**Workaround for clients:** always use `limit >= 20` when combining `context_tags` with a semantic query. But the real fix belongs in the server's merge/sort logic.
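
One hypothesis consistent with the observation is that candidates are truncated on raw score before the boost is added. The sketch below contrasts that ordering with boost-then-truncate, using only the two memories whose raw scores are reported above. The additive boost of 0.450 is inferred from 0.909 − 0.459 and is an assumption about how the server combines scores; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    raw: float    # vector similarity before any tag boost
    boost: float  # assumed additive context_tags boost (0.0 if no tag match)

    @property
    def final(self) -> float:
        return self.raw + self.boost


def truncate_then_boost(cands: list[Candidate], limit: int) -> list[Candidate]:
    """Suspected buggy order: cut on raw score, then rank by final score."""
    top = sorted(cands, key=lambda c: c.raw, reverse=True)[:limit]
    return sorted(top, key=lambda c: c.final, reverse=True)


def boost_then_truncate(cands: list[Candidate], limit: int) -> list[Candidate]:
    """Expected order: rank everything by final score, then cut."""
    return sorted(cands, key=lambda c: c.final, reverse=True)[:limit]


candidates = [
    Candidate("Wan 2.2 setup", raw=0.459, boost=0.450),
    Candidate("M5 Max Ollama specs", raw=0.465, boost=0.0),
]

buggy = [c.label for c in truncate_then_boost(candidates, limit=1)]
fixed = [c.label for c in boost_then_truncate(candidates, limit=1)]
```

With one slot left, the first ordering keeps the unboosted M5 Max memory and the second keeps the boosted Wan 2.2 setup memory, which mirrors the `limit=5` vs `limit=10` difference in the table.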

---

## C. `context_tags` literal-match vs prefix-index asymmetry

Not technically a bug, but an internal inconsistency that surprised us during the prefix/no-prefix tag-scheme evaluation.

- `tags + tag_match=prefix` matches against the `tag_prefixes` index. Both `/` and `:` separators get split into namespace segments (e.g., `project/foo` indexes as `project`, `project:foo`).
- `context_tags` does a literal string compare against the raw `tags` array. `context_tags=["project:foo"]` will NOT match a memory tagged `project/foo` even though they're equivalent via the prefix index.

**Measured:**
```bash
# Same query, same data:
curl ... --data-urlencode 'context_tags=project/mcp-automem'
# → 3 results, top final_score = 1.001 (boost applied)

curl ... --data-urlencode 'context_tags=project:mcp-automem'  
# → 24 results, top final_score = 0.602 (NO boost applied)
```

Result sets are disjoint. One fix option: have `context_tags` consult the prefix index too, so `/` and `:` are interchangeable in both filter paths.

At minimum: document that `tag_prefixes` stores with `:` separator while input uses either `/` or `:`.
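
To make the asymmetry concrete, here is a minimal sketch of the two matching behaviors described above. It assumes the prefix index splits on `/` and `:` and joins segments with `:`, as in the `project/foo` example; the function names are hypothetical and do not reflect the server's actual code.

```python
import re

_SEPARATORS = re.compile(r"[/:]")


def tag_prefixes(tag: str) -> list[str]:
    """Namespace prefixes for a tag, joined with ':' (assumed index format)."""
    parts = [p for p in _SEPARATORS.split(tag.strip()) if p]
    return [":".join(parts[: i + 1]) for i in range(len(parts))]


def canonical(tag: str) -> str:
    """Full tag in canonical ':' form, or '' for an empty tag."""
    prefixes = tag_prefixes(tag)
    return prefixes[-1] if prefixes else ""


def literal_match(memory_tags: list[str], context_tags: list[str]) -> bool:
    """Behavior observed for context_tags: raw string comparison."""
    return any(t in memory_tags for t in context_tags)


def normalized_match(memory_tags: list[str], context_tags: list[str]) -> bool:
    """Possible fix: compare canonical forms so '/' and ':' are equivalent."""
    stored = {canonical(t) for t in memory_tags}
    return any(canonical(t) in stored for t in context_tags)


stored_tags = ["project/foo"]
literal = literal_match(stored_tags, ["project:foo"])
normalized = normalized_match(stored_tags, ["project:foo"])
```

Under these assumptions the literal comparison misses the memory while the normalized comparison finds it, which is the gap between the two `curl` results above.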

---

## D. Template refresh for Opus 4.7 / 1M context

### Context

Tested the current `templates/CLAUDE_MD_MEMORY_RULES.md` two-phase pattern against a proposed 4-phase namespace-prefixed scheme. **Two-phase (production) wins** for Opus 4.7, with three small parameter changes validated by measurement.

### Validated: the existing "bare tag" convention is the right one

Namespace-prefixed tags (`project/<slug>`, `pref/<scope>`, `source/<kind>`, `lang/<lang>`) were explored but are NOT recommended. Reasons, in order of weight:

1. The existing 9,401-memory corpus uses bare tags exclusively. Introducing a prefix scheme creates a bifurcated corpus where old memories are invisible to new exact-match queries.
2. `context_tags` literal-match asymmetry (item C above) means the migration costs real ranking quality during the transition.
3. The prefix doesn't solve the "hard-gate bite" — a memory that lacks tag `project/video` also typically lacks bare tag `video`. The fix is tagging discipline, not syntactic namespaces.

**Only risk with bare tags:** slug collision with topic words. Empirically:

| Gate | Behavior |
|---|---|
| `tags:["streamdeck-mcp"]` | clean — slug is unique |
| `tags:["mcp-automem"]` | clean — slug is unique |
| `tags:["video"]` | bad — pulls unrelated "video content strategy" memories |

**Guidance to add to template:** "Use a project slug that doesn't collide with common topic words (`streamdeck-mcp` ✓, `video` ✗). For short/generic names, either prefix at store time (`video-gen-project`) or omit the tag gate on that project."
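
A client could apply this guidance mechanically. The sketch below is only an illustration: `TOPIC_WORDS` is a hypothetical list seeded with the one colliding word observed above, and a real list would need to come from the corpus itself.

```python
# Hypothetical collision list; "video" is the only collision measured above.
TOPIC_WORDS = {"video"}


def gate_tags(slug: str, topic_words: set[str] = TOPIC_WORDS) -> list[str]:
    """Return the tag gate for a project, or [] when the slug is too generic.

    An empty list means "omit the tag gate" so a colliding slug does not
    pull in unrelated memories that happen to share the word.
    """
    return [] if slug.lower() in topic_words else [slug]


unique_gate = gate_tags("streamdeck-mcp")
generic_gate = gate_tags("video")
```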

### Parameter bumps for 1M context

All measured on the production Phase 2 pattern over the `mcp-automem` slug:

| Config | Count | Results scoring ≥0.4 |
|---|---|---|
| Current default: `limit=10, time_query="last 30 days"` | 2 | 2 |
| `limit=30, time_query="last 30 days"` | 3 | 3 |
| `limit=10, time_query="last 90 days"` | 2 | 2 |
| **`limit=30, time_query="last 90 days"` (proposed)** | **5** | **5** |

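The two bumps compound: neither alone reaches 5 results, but together they return 2.5× the useful results of the current default (5 vs 2) with no loss in score quality. All 5 returned results scored ≥1.27.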
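
For re-measuring later, the grid above can be reproduced with a small sweep. This is a sketch under stated assumptions: `recall` is a hypothetical callable that returns the scores for one `(limit, window)` pair, windows are expressed as day counts and not as `time_query` strings, and the demo corpus is synthetic and unrelated to the measurements in the table.

```python
from itertools import product
from typing import Callable


def sweep(
    recall: Callable[[int, int], list[float]],
    limits: list[int],
    windows_days: list[int],
    threshold: float = 0.4,
) -> dict[tuple[int, int], tuple[int, int]]:
    """Map (limit, window) to (result count, count scoring >= threshold)."""
    table = {}
    for limit, window in product(limits, windows_days):
        scores = recall(limit, window)
        table[(limit, window)] = (len(scores), sum(s >= threshold for s in scores))
    return table


# Synthetic corpus of (age_days, score) pairs, for demonstration only.
FAKE_CORPUS = [(5, 0.9), (20, 0.7), (45, 0.6), (80, 0.3)]


def fake_recall(limit: int, window_days: int) -> list[float]:
    """Stand-in for a real recall: filter by age, rank by score, truncate."""
    in_window = [score for age, score in FAKE_CORPUS if age <= window_days]
    return sorted(in_window, reverse=True)[:limit]


results = sweep(fake_recall, limits=[1, 4], windows_days=[30, 90])
```

Swapping `fake_recall` for a wrapper around the real endpoint would regenerate both table columns in one pass.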

Phase 1 (`tags:["preference"], limit:20`) validated with 14 high-signal results from the corpus (Jack's PR-merge preference, no-markdown-memory preference, etc.). No dilution.

### Validated: `auto_decompose=true` is safe but low-impact

Same query with `auto_decompose=true` vs `false`: 16 results either way, 1 add and 1 drop. Keep it on; it may help for richer multi-topic queries but doesn't move the needle for focused ones.
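
The add/drop comparison is a set difference over result IDs. A minimal sketch, with a hypothetical helper name and placeholder IDs in place of real memory IDs:

```python
def diff_results(baseline_ids: list[str], variant_ids: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, dropped) IDs when moving from baseline to variant."""
    base, var = set(baseline_ids), set(variant_ids)
    return sorted(var - base), sorted(base - var)


# Placeholder IDs, not real memories.
without_decompose = ["m1", "m2", "m3"]
with_decompose = ["m1", "m2", "m4"]
added, dropped = diff_results(without_decompose, with_decompose)
```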

### Validated: `expand_relations` is a no-op right now

Expansion with the current limits (`relation_limit:8, expansion_limit:60`) and with the proposed limits (`20/150`) both added zero memories on the test query. The corpus is sparse in associations (few explicit `associate_memories` calls). Keep expansion on, but don't bother bumping the limits until association discipline improves. Re-measure in ~30 days.

### Proposed template replacement

```
# Phase 1 — Preferences (tag-only, no time filter, no semantic query)
recall_memory({ tags: ["preference"], limit: 20 })

# Phase 2 — Task context (semantic + time-limited + project-gated)
recall_memory({
  queries: [<task topic>, "user corrections", "recent decisions"],
  tags: ["<project-slug>"],
  auto_decompose: true,
  time_query: "last 90 days",
  limit: 30
})

# Optional Phase 3 (on-demand, NOT at every session start) — debugging
recall_memory({
  query: "<error symptom>",
  tags: ["bugfix", "solution"],
  limit: 20
})
```

Deltas from current template: `limit` 10 → 20/30, `time_query` 30 days → 90 days, explicit guidance that Phase 3 is on-demand.

---

## Appendix: Measurement methodology

- Direct HTTP `GET /recall` on production endpoint with `X-Api-Key` auth
- MCP client (`mcp__memory__recall_memory`) cross-checked for parity. Parity held in all tests, including item B above, which reproduces on both paths
- Corpus state at test time: 9,401 memories, 9,401 vectors, FalkorDB + Qdrant both `connected` and `synced`
- All HTTP calls read-only; no stores or modifications during validation

