
refactor(memory): append-only writes + retrieval-driven consolidation (fixes #34) #41

Open
DeerGoat wants to merge 4 commits into main from refactor/memory-architecture-append-only

@DeerGoat (Collaborator) commented May 29, 2026

Summary

Closes #34. Supersedes #36.

Replaces the cosine-similarity dedup (_deduplicate_and_store_facts, fixed 0.85 threshold) that silently dropped factual updates with an append-only write path plus an autonomous "dream" consolidation agent that tidies the daily logs into a notebook wiki. Cosine measures topical proximity, not factual identity — so no threshold can fix it; the fix removes the write-time dedup entirely and moves consolidation off the critical path.
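To illustrate why topical proximity is not factual identity, here is a minimal sketch. It uses a toy bag-of-words vector as a stand-in for the real embedding model, and the `>= 0.85` comparison is an assumption about how the old threshold was applied; neither is taken from the project's code.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (illustrative only, not the project's embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


old_fact = "user works at acme as a senior engineer"
new_fact = "user works at acme as a staff engineer"  # a genuine update

similarity = cosine(toy_embed(old_fact), toy_embed(new_fact))

# 7 of the 8 tokens are shared, so the similarity is 7/8 = 0.875.
# Under a fixed 0.85 cut-off the update looks like a duplicate.
would_be_dropped = similarity >= 0.85
```

Raising the threshold only shifts the problem: a longer sentence with a one-word change scores even closer to 1.0.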

Full design + the multi-pass audit trail: docs/03-developing/memory-consolidation-plan.md.

What changed (3 phases)

Phase 1 — Append-only writes (the #34 fix)

  • Delete _deduplicate_and_store_facts + the 0.85 threshold (and the legacy process_message_for_memories / MarkdownIndexer). Facts are appended to the daily log and indexed per fact, with no write-time dedup — updates can no longer be dropped.
  • Platform guard in chat_processor: forked/system turns (platform="dream", sub-agents) never feed memory extraction or transcript indexing (fixes a latent sub-agent recursion too).

Phase 2 — Always-on notebook vault + watermark-aware indexing

  • Bootstrap the vault unconditionally at CONFIG.notebook_dir (zones + schema/index/log); the /mnt/notebook mount becomes an optional redirect.
  • CoreMemoryFileIndexer is now the sole LanceDB writer: indexes notebook pages, drops consolidated archives by a log.md watermark= token, skips tombstoned facts, constant importance (ranking = relevance + recency).
  • Mutation invariant: nothing mutates LanceDB directly. Deletion = edit file + reindex (or an indexer-consulted tombstone for immutable logs), so nothing resurrects.

Phase 3 — DreamRunner (autonomous wiki keeper)

  • A gated BaseBrain background agent (forked, file-tool-whitelisted) consolidates daily logs into the vault, resolving correction / state-change (keeps history) / duplicate / genuine-conflict (escalates) cases. It owns the watermark (runner-written, proof-of-work-gated), runs catch-up sprints for backlogs with retry-then-skip, regenerates MEMORY.md, and pauses the file watcher while it works.
  • POST /memory/consolidate triggers it on demand; POST /memory/reindex {clear_existing:true} rebuilds the index from files.

Verification

  • All changed/new modules compile and import cleanly in the project venv (no circular deps); suzent.server route wiring imports.
  • Memory test suite green (14 passed); MemoryExtractionResult test updated for the slimmed model.
  • Smoke-tested the pure logic: per-fact archive parsing (category + tags), watermark round-trip, tombstone normalize/match, recall log, vault bootstrap (zones + nav), notebook page listing.
  • Grep-verified no dangling references to any deleted symbol across src/.

Migration

After deploy, run POST /memory/reindex {"clear_existing": true} once to rebuild the index from files (memory + notebook). Raw daily logs are the immutable source of truth, so nothing is at risk.

Notes for review

  • The full design and every audit finding (C1–C5, M1–M5, NEW-1…11, §A/§B) are documented inline in the plan doc with the resolution for each.
  • pytest-asyncio is required to run the async memory tests locally (already a project dev need).

🤖 Generated with Claude Code

DeerGoat and others added 4 commits May 30, 2026 00:54
…onsolidation architecture

This is a planning commit for discussion. No code changes yet.

Closes #34
…fixes #34)

Replace the cosine-threshold dedup that silently dropped factual updates with an
append-only write path plus an autonomous "dream" consolidation agent that tidies
the daily logs into a notebook wiki.

Phase 1 - append-only writes (the #34 fix):
- Remove _deduplicate_and_store_facts and the 0.85 similarity threshold; facts are
  appended to the daily log and indexed per-fact, with no write-time dedup.
- Platform guard: forked/system turns (dream, sub-agents) never feed memory
  extraction or transcript indexing.

Phase 2 - always-on notebook vault + watermark-aware indexing:
- Bootstrap the vault unconditionally at CONFIG.notebook_dir (zones + nav files).
- CoreMemoryFileIndexer is the sole LanceDB writer: indexes notebook pages, drops
  consolidated archives by a log.md watermark, skips tombstoned facts, constant
  importance. Delete the broken MarkdownIndexer.
- Deletion = edit file + reindex / tombstone (never a direct LanceDB mutation).

Phase 3 - DreamRunner (autonomous wiki keeper):
- Gated background agent (time+volume / catch-up + retry-skip + proof-of-work) that
  consolidates daily logs into the vault, owns the watermark, and regenerates
  MEMORY.md. POST /memory/consolidate triggers it on demand.

Design + audit trail: docs/03-developing/memory-consolidation-plan.md

Closes #34

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@DeerGoat marked this pull request as ready for review June 9, 2026 16:27

@DeerGoat (Collaborator, Author) commented Jun 9, 2026

Implementation walkthrough (review guide)

This documents the exact runtime behavior of the change so reviewers can follow the code. The conceptual design is in docs/03-developing/memory-consolidation-plan.md; this is the "what the code actually does" companion.

TL;DR data flow

chat turn ──> extract facts (LLM) ──> append to archive/YYYY-MM-DD.md ──> reindex that day's log
                                          (source of truth)                (LanceDB, per fact)

dream (gated, background) ──> forked agent reads archive logs ──> writes/updates vault pages
                                                              ──> runner writes watermark to log.md
                                                              ──> reindex vault + drop consolidated logs
                                                              ──> regenerate MEMORY.md

LanceDB is a disposable projection — every row is (re)written only by CoreMemoryFileIndexer. Nothing else mutates it.


1. Where things live on disk

| Path | What | Written by |
| --- | --- | --- |
| <sandbox>/shared/memory/archive/YYYY-MM-DD.md | append-only daily fact log (source of truth) | markdown_store.append_daily_log |
| <sandbox>/shared/memory/{persona,user,MEMORY}.md | always-visible core blocks | agent / promote_memory_md |
| ~/.suzent/notebook/ (CONFIG.notebook_dir) | the wiki vault: schema.md, index.md, log.md, zones 0_Inbox … 5_Archives | WikiManager (bootstrap) + the dream agent |
| ~/.suzent/notebook/.state/recall_log.jsonl | retrieval events (usage signal) | markdown_store.append_recall |
| ~/.suzent/notebook/.state/tombstones.jsonl | user-deleted facts (normalized content) | markdown_store.append_tombstone |
| ~/.suzent/notebook/log.md | journal; holds the watermark=YYYY-MM-DD token | DreamRunner (and manual /ingest) |
| <sandbox>/shared/memory/.index_state.json | per-file mtime state for the indexer | CoreMemoryFileIndexer |

/mnt/notebook is a default mount injected by config.get_effective_volumes() pointing at CONFIG.notebook_dir (a user-supplied /mnt/notebook mapping overrides it). The vault is always bootstrapped now (the old if notebook_host_path: gate in lifecycle.py is removed).

2. Write path (per turn) — manager.process_conversation_turn_for_memories

  1. _extract_facts_llm(turn) → list of ExtractedFact (unchanged extraction).
  2. _write_facts_to_markdown appends "- [category] content `tags`" lines under a "## HH:MM — chatid" header to archive/<date>.md; returns the date. (category or "general" fixes the None bug.)
  3. _core_indexer.reindex_file_now(label="archive", filename="<date>.md"):
    • delete_memories_by_source_date(date) then re-parse the file into one row per fact line and embed each, then record the file's mtime.
    • Delete-then-add → idempotent and race-free with the 300 s watcher (no duplicate rows even if both run).
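The write path above can be sketched as follows. This is a minimal illustration, not the project's code: `append_facts`, `parse_facts`, and `reindex_day` are hypothetical names, a plain dict stands in for LanceDB, embedding is omitted, and the fact-line pattern ignores the tags suffix because its exact format is not shown here.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Assumed fact-line shape: "- [category] content"
FACT_LINE = re.compile(r"^- \[(?P<category>[^\]]+)\] (?P<content>.+)$")


def append_facts(log_path: Path, header: str, facts: list[tuple[str | None, str]]) -> None:
    """Append fact lines under a header; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"\n## {header}\n")
        for category, content in facts:
            # A missing category falls back to "general".
            fh.write(f"- [{category or 'general'}] {content}\n")


def parse_facts(log_path: Path) -> list[dict]:
    """Return one row per fact line in the daily log."""
    rows = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        match = FACT_LINE.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def reindex_day(index: dict[str, list[dict]], log_path: Path) -> None:
    """Delete-then-add: drop every row for this day, then insert a fresh parse."""
    day = log_path.stem
    index.pop(day, None)
    index[day] = parse_facts(log_path)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "2026-06-09.md"
    append_facts(log, "10:15", [("career", "works at Acme"), (None, "prefers tea")])
    index: dict[str, list[dict]] = {}
    reindex_day(index, log)
    reindex_day(index, log)  # a second pass (e.g. the watcher) adds no duplicates
    rows = index["2026-06-09"]
```

Because the day's rows are replaced wholesale, running the reindex twice yields the same result as running it once, which is what makes the per-turn path and the watcher safe to overlap.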

⚠️ Cost to scrutinize: because we re-index the whole current-day log on every turn (not just the new line), a busy day re-embeds today's growing log repeatedly (~O(n²) embedding calls per day). This was a deliberate simplification over per-line delta-indexing, which had a duplicate-row race with the watcher. The plan (§A/C4/NEW-3) describes the delta optimization as a clean follow-up. Embeddings use the cheap/local role model and run in the background post-turn job, but this is the main perf trade-off to weigh.
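For a sense of scale, the quadratic growth can be worked out directly. The sketch below assumes each turn appends a fixed number of facts and that nothing caches embeddings between reindexes; both are simplifications made for the arithmetic.

```python
def embeds_full_reindex(turns: int, facts_per_turn: int = 1) -> int:
    """Embedding calls in one day when the whole log is re-embedded after every turn."""
    # After turn t the log holds t * facts_per_turn facts, all of which are re-embedded.
    return sum(t * facts_per_turn for t in range(1, turns + 1))


def embeds_delta(turns: int, facts_per_turn: int = 1) -> int:
    """Embedding calls if only the newly appended lines were embedded."""
    return turns * facts_per_turn


full = embeds_full_reindex(100)  # 100 * 101 / 2 = 5050
delta = embeds_delta(100)        # 100
```

At 100 fact-producing turns in a day that is roughly a 50x difference, which is small in absolute terms for a local model but grows with usage.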

3. Indexing behavior — CoreMemoryFileIndexer

  • Sole LanceDB writer. Shared single instance (manager._core_indexer), used by the per-turn path, the background watcher, and the dream. An asyncio.Lock serializes all mutations.
  • Granularity: diary logs → one row per fact (_parse_archive_facts); notebook pages + core files → one row per paragraph chunk (_chunk_by_paragraphs).
  • Watermark-aware archives: reads the watermark= token from log.md. A log with date <= watermark is dropped from the index once (and skipped thereafter) — its facts now live in vault pages. Logs > watermark index normally.
  • Tombstone-skip: diary facts whose normalized content is tombstoned are not indexed (so a deleted log fact can't resurrect, even on a full rebuild).
  • Importance is a constant 0.5 for every row (ranking is relevance + recency; importance is no longer a tuned lever).
  • clear_and_full_reindex(...) wipes the user's rows and rebuilds from files (core + notebook + post-watermark archives). Backs POST /memory/reindex {clear_existing:true}.
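The watermark rule in the list above can be sketched like this. The function names and the journal line layout are hypothetical; only the `watermark=YYYY-MM-DD` token comes from the description. Taking the latest date when several tokens are present is also an assumption.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import Path

WATERMARK = re.compile(r"watermark=(\d{4}-\d{2}-\d{2})")


def latest_watermark(log_md: str) -> date | None:
    """Return the most recent watermark date found in the journal text, if any."""
    found = WATERMARK.findall(log_md)
    return max(date.fromisoformat(d) for d in found) if found else None


def archives_to_index(filenames: list[str], watermark: date | None) -> list[str]:
    """Keep only daily logs dated strictly after the watermark."""
    kept = []
    for name in filenames:
        day = date.fromisoformat(Path(name).stem)
        if watermark is None or day > watermark:
            kept.append(name)
    return sorted(kept)


# Hypothetical journal contents; only the watermark= token matters here.
journal = (
    "- dream run, watermark=2026-06-01\n"
    "- dream run, watermark=2026-06-05\n"
)
watermark = latest_watermark(journal)
kept = archives_to_index(
    ["2026-06-04.md", "2026-06-05.md", "2026-06-06.md", "2026-06-07.md"],
    watermark,
)
```

Logs at or before the watermark are excluded because their facts are expected to live in vault pages by then.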

LanceDB row metadata schema (per source):

// diary fact            // notebook page chunk           // core file chunk
{ source_type:"archive_log",   { source_type:"notebook",        { source_type:"core_file",
  source_file:"2026-06-09.md",   source_file:"3_Personal/x.md",   source_file:"MEMORY.md",
  category:"career",             chunk_index, label:"notebook",   chunk_index, label:"facts",
  tags:["work"] }                category, tags }                 category, tags }

4. Retrieval

Unchanged in shape (retrieve_relevant_memories FTS hook per turn; search_memories hybrid for the agent tool), but the index surface is now core files + notebook pages + post-watermark daily logs. Both paths call _log_recalls(...), appending retrieved snippets to recall_log.jsonl (the promotion signal).

5. Consolidation — DreamRunner (core/dream_runner.py)

A BaseBrain started/stopped in lifecycle.init_memory_system / shutdown_memory_system (only when markdown + embedding are configured and memory_consolidation_enabled).

Gate (_tick, every interval_seconds, default 1800):

  • pending = archive dates > watermark and strictly before today UTC (never touches the in-progress day).
  • behind = len(pending) > max_days → catch-up sprint (ignores the daily/volume gate, one batch per tick).
  • else steady state: require now - last_attempt_at >= min_hours (in-memory attempt clock — not the watermark) and fact_lines(pending) >= min_facts.
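The gate logic above might look roughly like the following. This is a sketch under assumptions: `GateConfig`, `pending_days`, and `should_run` are illustrative names, and treating a missing `last_attempt_at` as "enough time has passed" is a guess about first-run behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class GateConfig:
    min_hours: float = 24.0
    min_facts: int = 20
    max_days: int = 14


def pending_days(archive_days: list[date], watermark: date | None, today: date) -> list[date]:
    """Days after the watermark and strictly before today (today is still in progress)."""
    return sorted(
        d for d in archive_days if (watermark is None or d > watermark) and d < today
    )


def should_run(
    pending: list[date],
    fact_count: int,
    last_attempt_at: datetime | None,
    now: datetime,
    cfg: GateConfig,
) -> bool:
    if not pending:
        return False
    if len(pending) > cfg.max_days:
        return True  # behind: catch-up sprint ignores the time/volume gate
    # Steady state: both the attempt clock and the volume gate must pass.
    waited = last_attempt_at is None or now - last_attempt_at >= timedelta(hours=cfg.min_hours)
    return waited and fact_count >= cfg.min_facts


cfg = GateConfig()
today = date(2026, 6, 10)
now = datetime(2026, 6, 10, 12, 0)
days = [date(2026, 6, d) for d in (7, 8, 9, 10)]
pending = pending_days(days, date(2026, 6, 7), today)

steady_ok = should_run(pending, 25, None, now, cfg)
too_few = should_run(pending, 5, None, now, cfg)
too_soon = should_run(pending, 25, now - timedelta(hours=2), now, cfg)
backlog = [date(2026, 5, d) for d in range(1, 16)]  # 15 days > max_days
catch_up = should_run(backlog, 0, now, now, cfg)
```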

Run (_run_dream, under a lock):

  1. batch = pending[:max_days], w_new = batch[-1]. If this batch has failed >= max_retries times → skip it (advance watermark past it; its facts stay in the immutable log) so one bad day can't wedge the backlog.
  2. Snapshot content-page mtimes; pause the watcher (a lifecycle.core_watcher_gate Event the watcher awaits).
  3. Reset the persistent hidden system-dream chat (agent_state=b"", messages=[] — note: None means "no change" in update_chat, hence b"") and run a forked agent via ChatProcessor.process_turn_text with config {platform:"dream", memory_enabled:False, auto_approve_tools:True, tools:CONFIG.memory_dream_tools, static_instructions:DREAM_SYSTEM_PROMPT}, message = DREAM_INSTRUCTIONS.format(start=watermark, end=w_new), wrapped in asyncio.wait_for(timeout).
  4. Resume the watcher.
  5. Proof of work: if no content page changed → don't advance; bump the failure counter; back off. Otherwise: write the watermark=w_new entry to log.md, run promote_memory_md, then check_and_update (indexes new pages + drops archives <= w_new).
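The proof-of-work check and the retry-then-skip decision in the steps above can be sketched as below. The helper names are hypothetical, and excluding the root-level nav files (schema.md, index.md, log.md) from "content pages" is an assumption about what the snapshot covers.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

NAV_FILES = {"schema.md", "index.md", "log.md"}  # assumed not to count as content pages


def snapshot_content_mtimes(vault: Path) -> dict[str, int]:
    """Map each content page (relative path) to its modification time in nanoseconds."""
    snapshot = {}
    for page in vault.rglob("*.md"):
        if page.parent == vault and page.name in NAV_FILES:
            continue
        snapshot[str(page.relative_to(vault))] = page.stat().st_mtime_ns
    return snapshot


def batch_action(failures: int, max_retries: int = 3) -> str:
    """Decide at the start of a run whether to attempt the batch or skip past it."""
    return "skip" if failures >= max_retries else "run"


def record_result(before: dict[str, int], after: dict[str, int], failures: int) -> tuple[bool, int]:
    """Return (advance_watermark, new_failure_count) for a finished run."""
    if before != after:  # a page was added, removed, or modified
        return True, failures
    return False, failures + 1


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "3_Personal").mkdir()
    (vault / "log.md").write_text("journal\n", encoding="utf-8")
    before = snapshot_content_mtimes(vault)

    # A run that only touches the journal does not count as work.
    (vault / "log.md").write_text("journal\nmore\n", encoding="utf-8")
    noop_result = record_result(before, snapshot_content_mtimes(vault), failures=0)

    # A run that writes a page does.
    (vault / "3_Personal" / "career.md").write_text("Works at Acme.\n", encoding="utf-8")
    real_result = record_result(before, snapshot_content_mtimes(vault), failures=0)
```

Comparing whole snapshots rather than a single timestamp means a newly created page counts as work even when the filesystem's mtime resolution is coarse.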

Why it's confined: memory_dream_tools is file tools + MemorySearchTool only (AgentTool is never granted → no recursion); PathResolver limits writes to /shared + /mnt/notebook; memory_enabled:False means get_or_create_agent injects no memory context (clean prompt) and the _is_system_chat guard skips extraction/transcript for platform=="dream". The agent is bound to a local var per turn, so a concurrent user turn can't swap its toolset.

promote_memory_md: reads 3_Personal/*.md + recent recall snippets → one LLM call → writes MEMORY.md (capped at memory_max_lines). This replaces the old refresh_core_memory_facts once the dream is active (the old method is kept so MEMORY.md stays fresh between deploy and the first dream run).

POST /memory/consolidate → DreamRunner.force_run() (bypasses the time/volume gate, not the lock).

6. Deletion — delete_archival_memory

Never mutates LanceDB ad-hoc. Resolves the row via store.get_memory(id), then:

  • archive fact → append_tombstone(content) + reindex_file_now(archive) → the fact is gone from the index and can't resurrect.
  • notebook/core row → append_tombstone + best-effort delete_memory(id). ⚠️ Known limitation: a fact baked into a notebook page can reappear on a full reindex until the page itself is edited (queued for a dream/lint pass). Worth a reviewer's eye.
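A minimal sketch of the tombstone mechanism for archive facts follows. The normalization rule (collapse whitespace, lowercase), the JSONL field name, and the helper names are all assumptions made for illustration; the actual helpers in markdown_store may differ.

```python
from __future__ import annotations

import json
import re
import tempfile
from pathlib import Path


def normalize(content: str) -> str:
    """Assumed normalization: collapse whitespace, trim, lowercase."""
    return re.sub(r"\s+", " ", content).strip().lower()


def record_tombstone(path: Path, content: str) -> None:
    """Append one normalized tombstone per line; the file is never rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"content": normalize(content)}) + "\n")


def load_tombstones(path: Path) -> set[str]:
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {json.loads(line)["content"] for line in lines if line.strip()}


def indexable(facts: list[str], tombstones: set[str]) -> list[str]:
    """Facts the indexer may embed: everything that is not tombstoned."""
    return [fact for fact in facts if normalize(fact) not in tombstones]


with tempfile.TemporaryDirectory() as tmp:
    tombstone_file = Path(tmp) / "tombstones.jsonl"
    record_tombstone(tombstone_file, "  Works at   ACME ")
    stones = load_tombstones(tombstone_file)
    kept = indexable(["works at Acme", "prefers tea"], stones)
```

Since the indexer consults the tombstones on every parse, the raw log line can stay in place while the fact stays out of the index, including after a full rebuild.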

7. The platform guard — chat_processor._is_system_chat

Reads the chat's config.platform; for {"dream","subagent","subagent_wakeup"} it skips both the transcript write (B1) and memory extraction (B2). This fixes the dream-recursion and also a pre-existing latent bug where sub-agents (which set memory_enabled:False) were still extracted because B2 gated only on global CONFIG.memory_enabled.
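The guard can be pictured as a small predicate applied before both post-turn jobs. This sketch assumes the chat config is a plain dict; `post_turn_jobs` and the job labels are hypothetical and only show where the guard sits relative to the global memory switch.

```python
from __future__ import annotations

SYSTEM_PLATFORMS = frozenset({"dream", "subagent", "subagent_wakeup"})


def is_system_chat(chat_config: dict | None) -> bool:
    """True for forked/system turns that must never feed memory."""
    return bool(chat_config) and chat_config.get("platform") in SYSTEM_PLATFORMS


def post_turn_jobs(chat_config: dict | None, memory_enabled_globally: bool) -> list[str]:
    """Which background jobs a finished turn should schedule."""
    if is_system_chat(chat_config):
        return []  # skip both transcript write (B1) and extraction (B2)
    jobs = ["transcript"]
    if memory_enabled_globally:
        jobs.append("extract")
    return jobs


user_jobs = post_turn_jobs({"platform": "web"}, memory_enabled_globally=True)
dream_jobs = post_turn_jobs({"platform": "dream"}, memory_enabled_globally=True)
subagent_jobs = post_turn_jobs(
    {"platform": "subagent", "memory_enabled": False}, memory_enabled_globally=True
)
```

Checking the platform first is what closes the sub-agent gap: the per-chat memory_enabled flag no longer needs to be consulted for those turns.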

8. Config knobs (all in config/__init__.py)

| Knob | Default | Meaning |
| --- | --- | --- |
| notebook_dir | ~/.suzent/notebook | vault location |
| memory_consolidation_enabled | True | master switch for the dream |
| memory_consolidation_min_hours | 24.0 | steady-state cadence (per attempt) |
| memory_consolidation_min_facts | 20 | steady-state volume gate |
| memory_consolidation_interval_seconds | 1800 | how often the gate is checked |
| memory_consolidation_timeout_seconds | 600 | per-run agent timeout |
| memory_consolidation_max_days | 14 | batch size; backlog beyond this triggers catch-up |
| memory_consolidation_max_retries | 3 | no-op batches before skip |
| memory_consolidation_memory_max_lines | 200 | MEMORY.md cap |
| memory_consolidation_model | None | optional model override for the dream |
| memory_dream_tools | file tools + MemorySearchTool | dream whitelist |

9. File-by-file review map

| File | What to look at |
| --- | --- |
| memory/manager.py | append-only process_conversation_turn_for_memories; deletions; promote_memory_md; _log_recalls; refresh_core_memory_facts retained |
| memory/indexer.py | MarkdownIndexer removed; lock + reindex_file_now + clear_and_full_reindex; per-fact _parse_archive_facts; watermark + tombstone logic in _check_and_update_impl/_reindex_file |
| memory/markdown_store.py | vault/page helpers, read_watermark/write_watermark_entry, recall + tombstone helpers |
| memory/wiki_manager.py | always-on bootstrap + zones |
| memory/lifecycle.py | always-create vault; shared indexer; core_watcher_gate; start/stop DreamRunner |
| core/dream_runner.py | new (gate, catch-up, retry-skip, proof-of-work, forked agent) |
| core/chat_processor.py | _is_system_chat guard on B1+B2 |
| memory/lancedb_store.py | get_memory(id) helper |
| routes/{memory,session}_routes.py | /memory/consolidate; file-edit delete; reindex delegates to the indexer |
| memory/models.py | MemoryExtractionResult slimmed (no memories_created/updated) |
| config/__init__.py | knobs + /mnt/notebook default mount |

10. Known limitations / things to weigh

  • Per-turn re-embed cost (§2 above) — the main perf trade; delta-indexing is the documented follow-up.
  • Transient duplicate window — a fresh fact lives in both the raw log and (after consolidation) a page until the watermark advances; strictly better than the old silent drop.
  • Notebook-page deletion is best-effort (§6).
  • recall_log.jsonl isn't truncated yet — it grows until a future cleanup (open question in the plan).
  • Single-user — one vault under CONFIG.user_id (pre-existing assumption).
  • Phase 3 is runtime-exercised only — imports, DB/agent signatures, and the chat-reset are verified; the full forked-agent run happens at runtime (gated, or via POST /memory/consolidate).

11. How to exercise manually

  1. Deploy, then POST /memory/reindex {"clear_existing": true} (rebuild index from files).
  2. Chat a few turns → check archive/<today>.md grows and memory_search returns the new facts (no drops).
  3. POST /memory/consolidate → check the dream writes pages under ~/.suzent/notebook/, appends a watermark= line to log.md, and refreshes MEMORY.md.
  4. Async memory tests need pytest-asyncio (uv pip install pytest-asyncio); tests/memory is green.
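Steps 1 and 3 can be scripted with the standard library. In this sketch the base URL is an assumption (adjust it to wherever the server listens), and sending an empty JSON body to /memory/consolidate is a guess, since the endpoint's request schema is not described here. The requests are only built, not sent.

```python
from __future__ import annotations

import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: local dev server address


def post_json(path: str, payload: dict | None = None) -> request.Request:
    """Build (but do not send) a JSON POST request for the given route."""
    body = json.dumps(payload or {}).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reindex_req = post_json("/memory/reindex", {"clear_existing": True})
consolidate_req = post_json("/memory/consolidate")

# To actually send one against a running server:
#     with request.urlopen(reindex_req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```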

Development

Successfully merging this pull request may close these issues.

[FEAT] Memory deduplication: fixed cosine threshold causes silent data loss