Lightweight semantic search over your AI agent's memory files. No vector database. No API calls. Runs on CPU.
A packaged Pro version is available on Gumroad (€29).
Main site: https://datis-agent.com
This repository is the free core version:
- `semantic_memory.py` library
- Base README + one basic example
- Local semantic retrieval with no vector DB
The Gumroad Pro pack includes everything above, plus:
- 3 ready-to-use integrations (Anthropic, OpenAI, Ollama)
- Reindex-on-change script
- Memory dedupe script
- Agent starter template
- Prompt recipes + troubleshooting guide
If you just want to learn or build a minimal setup, this free repo is enough. If you want to implement faster in real agents with less trial-and-error, use Pro.
Most AI agents treat memory as a file you append to and eventually load into context. This fails in two ways:
- Too much context — loading everything hits token limits and costs money
- Keyword search misses intent — searching for "payment setup" won't find "configured Stripe for billing"
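A toy illustration of the keyword-miss problem (the note strings here are hypothetical):

```python
notes = ["Configured Stripe for billing", "Wrote onboarding docs"]
query = "payment setup"

# naive keyword search: a note matches only if it shares a word with the query
hits = [n for n in notes if any(word in n.lower() for word in query.lower().split())]
print(hits)  # → [] (the Stripe note is relevant, but shares no keywords)
```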
Embed your memory files locally with all-MiniLM-L6-v2 (22MB, runs on CPU). At query time, encode the query and retrieve the most relevant chunks by cosine similarity. This captures meaning, not keywords: "remote access" matches "VPN tunneling" even though they share no words.
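The ranking step is plain linear algebra. A minimal sketch with NumPy, assuming chunk embeddings are already computed (the function name is illustrative, not the library's API):

```python
import numpy as np

def rank_chunks(query_vec, chunk_vecs, top_k=3):
    # normalize rows so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(scores)[::-1][:top_k]  # highest similarity first
    return order, scores[order]

# toy 3-dimensional "embeddings"
chunks = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
idx, scores = rank_chunks(np.array([1.0, 0.0, 0.0]), chunks, top_k=2)
print(idx)  # → [0 1]
```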
```python
from semantic_memory import SemanticMemory

mem = SemanticMemory("~/.agent/memory")
mem.index()
results = mem.query("Stripe payment integration", top_k=3)
# → [0.847] stripe_notes.md: Set up Stripe webhook handler. Use idempotency keys...
# → [0.731] payments.md: Stripe requires HTTPS in live mode...
# → [0.612] decisions.md: Chose Stripe over PayPal due to better API documentation...
```

```
pip install sentence-transformers numpy
```

Then copy `semantic_memory.py` into your project.
```python
from semantic_memory import SemanticMemory

mem = SemanticMemory("~/.agent/memory")
mem.index()
# → Indexing 205 chunks from 12 files...
# → Index built. 205 chunks ready.
```

Supports `.md`, `.txt`, and `.json` files. The index is cached to `.semantic_index.json` — no re-indexing unless files change.
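The cache-invalidation check can be as simple as comparing modification times. A sketch of that idea, assuming mtime-based invalidation (the library's actual mechanism may differ):

```python
from pathlib import Path

def needs_reindex(memory_dir, index_name=".semantic_index.json"):
    # rebuild if the cache is missing or any source file is newer than it
    memory_dir = Path(memory_dir).expanduser()
    index_file = memory_dir / index_name
    if not index_file.exists():
        return True
    cache_mtime = index_file.stat().st_mtime
    sources = (f for f in memory_dir.rglob("*")
               if f.suffix in {".md", ".txt", ".json"} and f.name != index_name)
    return any(f.stat().st_mtime > cache_mtime for f in sources)
```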
```python
results = mem.query("what did we decide about the database?", top_k=5)
for r in results:
    print(f"[{r.score:.3f}] {r.source}: {r.text[:100]}")
```

Inject formatted results into a model's context. With the Anthropic SDK:

```python
context = mem.query_and_format("current project priorities", top_k=4)

response = client.messages.create(
    model="claude-haiku-4-5",
    system=f"You are an assistant.\n\nMemory:\n{context}",
    messages=[{"role": "user", "content": user_message}]
)
```

With a local model via Ollama:

```python
import ollama
from semantic_memory import SemanticMemory

mem = SemanticMemory("~/.agent/memory")
mem.index()

def chat(message):
    context = mem.query_and_format(message, top_k=4)
    return ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": f"Memory:\n{context}"},
            {"role": "user", "content": message}
        ]
    )["message"]["content"]
```

Build the index from the command line:

```
python semantic_memory.py ~/.agent/memory --index
```
Query from the command line:

```
python semantic_memory.py ~/.agent/memory "what is the status of the API project?"
```

```
Memory files (.md / .txt / .json)
        │
        ▼
  ┌───────────┐
  │  Chunker  │  Split into ~400-char overlapping segments
  └─────┬─────┘
        │
        ▼
  ┌───────────┐
  │  Encoder  │  all-MiniLM-L6-v2 (22MB, CPU-only)
  └─────┬─────┘
        │
        ▼
  ┌──────────────────────┐
  │ .semantic_index.json │  Chunks + embeddings stored locally
  └─────┬────────────────┘
        │
  Query │  encode → cosine similarity → top-K
        ▼
  Ranked relevant chunks (with score + source file)
        │
        ▼
  Inject into model context
```
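The chunking step above can be sketched as a sliding window. The size and overlap values here are assumptions based on the ~400-char figure; the library's actual chunker may differ:

```python
def chunk_text(text, size=400, overlap=80):
    # slide a window of `size` chars forward by `size - overlap` each step,
    # so consecutive chunks share `overlap` chars of context
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.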
Measured on MacBook Air M4, CPU-only, no GPU:
| Operation | Result |
|---|---|
| Index build (2,000 chunks) | 13.2s |
| Avg query time | 42ms |
| Min query time | 8ms |
| Index cache load | ~0.4s |
| Model size | 22MB |
| RAM while loaded | ~180MB |
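To reproduce latency numbers like these on your own machine, a simple timing loop suffices (a sketch; `perf_counter` measures wall-clock time, so close other workloads first):

```python
import time

def measure(fn, runs=100):
    # returns (min_ms, avg_ms) over `runs` calls
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)
    return min(times), sum(times) / len(times)

# usage sketch (assumes an indexed SemanticMemory instance `mem`):
# mn, avg = measure(lambda: mem.query("database decisions", top_k=5))
```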
Use this when:
- Fewer than ~10,000 memory chunks
- Zero infrastructure — no server, no Docker, no account
- Privacy matters (all data stays local)
- Single-process agent
Use Chroma / Pinecone / pgvector when:
- More than 10,000 chunks
- Multiple processes need concurrent access
- Sub-10ms latency required at scale
```
sentence-transformers>=2.2.0
numpy>=1.21.0
```

Python 3.8+. No other dependencies.
MIT
Extended examples including multi-agent setups, custom indexing patterns, and OpenAI function calling integration: Gumroad