This project implements different memory strategies for LLMs using LangGraph and PostgreSQL with vector embeddings.
```
llm memory/
├── docker-compose.yml        # PostgreSQL with pgvector extension
├── .env                      # Environment variables (API keys, DB config)
├── .gitignore
├── short-term-memory/
│   ├── trimming/             # ✅ Strategy 1: Message Trimming
│   │   ├── main.py
│   │   ├── config.py
│   │   ├── database.py
│   │   └── requirements.txt
│   ├── summary/              # ✅ Strategy 2: Conversation Summary
│   │   ├── main.py
│   │   ├── config.py
│   │   ├── database.py
│   │   └── requirements.txt
│   └── user-progress/        # 🔄 Strategy 3: User Progress Tracking
│       └── (to be implemented)
└── long-term-memory/         # ✅ Long-term Memory with Semantic Search
    ├── main.py               # LangGraph implementation
    ├── config.py             # Configuration
    ├── database.py           # PostgreSQL + pgvector operations
    ├── embeddings.py         # Embedding generation (sentence-transformers)
    ├── memory_extractor.py   # LLM-based memory extraction
    ├── memory_manager.py     # Memory lifecycle management
    ├── context_builder.py    # Context window assembly
    └── requirements.txt
```
Edit the `.env` file and add your OpenRouter API key:

```
OPENROUTER_API_KEY=your_actual_api_key_here
MODEL_NAME=meta-llama/llama-3.1-8b-instruct
```

**IMPORTANT:** We now use the `ankane/pgvector` image for semantic search support!
```bash
# Navigate to project folder
cd "D:\llm memory"

# Stop old container if running
docker-compose down -v

# Start PostgreSQL container with pgvector
docker-compose up -d

# Check if container is running
docker ps

# View logs (optional)
docker logs llm_memory_postgres
```

Each strategy has its own `requirements.txt`:
```bash
# For long-term memory (recommended to start here)
cd "long-term-memory"
pip install -r requirements.txt

# For trimming strategy
cd "short-term-memory\trimming"
pip install -r requirements.txt

# For summary strategy
cd "short-term-memory\summary"
pip install -r requirements.txt
```

Note: Long-term memory requires additional packages:
- `sentence-transformers` - embedding generation (~400MB download the first time)
- `pgvector` - PostgreSQL vector extension support
- `numpy` - numerical operations
Recommended: use a virtual environment:

```bash
python -m venv venv
venv\Scripts\activate   # On Windows
pip install -r requirements.txt
```

How it works: a complete LTM system with semantic search, memory extraction, and intelligent retrieval.
```
User Input
    ↓
1. LTM Search (Semantic)  →  Retrieve relevant memories
    ↓
2. STM Retrieval          →  Get recent conversation
    ↓
3. Context Assembly       →  Combine LTM + STM + System Prompt
    ↓
4. LLM Generation         →  Generate response
    ↓
5. Memory Extraction      →  Store new important info in LTM
```
- Uses an LLM to analyze conversations
- Extracts only important information worth remembering
- Categorizes each memory: `personal_info`, `preference`, `fact`, `decision`, `goal`
- Assigns an importance score (1-10)
Example:

```
User: "My name is Sarah and I'm building a weather app"
    ↓
Extracted Memory:
{
  "content": "User's name is Sarah. Working on weather app project.",
  "memory_type": "personal_info",
  "importance": 8
}
```
- Converts memory text to 384-dimensional vector embedding
- Stores in PostgreSQL with pgvector extension
- Enables semantic search (not just keyword matching)
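To make "semantic search, not keyword matching" concrete, here is a minimal cosine-similarity sketch in plain Python. The toy 3-dimensional vectors are invented for illustration; in this project the real 384-dimensional embeddings come from sentence-transformers and the comparison is done inside PostgreSQL by pgvector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (the real ones are 384-dimensional):
weather_app = [0.9, 0.1, 0.2]
my_project  = [0.8, 0.2, 0.3]   # semantically close to weather_app
cooking     = [0.1, 0.9, 0.1]   # unrelated topic

print(round(cosine_similarity(weather_app, my_project), 2))  # 0.98
print(round(cosine_similarity(weather_app, cooking), 2))     # 0.24
```

Texts about the same topic end up pointing in similar directions in embedding space, so their similarity stays high even when they share no keywords.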
Database Schema:

```
ltm_memories
├── id (primary key)
├── user_id (indexed)
├── content (text)
├── memory_type (personal_info/preference/fact/decision/goal)
├── importance (1-10)
├── embedding (VECTOR(384)) -- semantic embedding
├── created_at
├── last_accessed
└── access_count
```

- Generates an embedding for the user's current query
- Uses cosine similarity to find relevant memories
- Returns top K memories above similarity threshold
Example:

```
Query: "What was I working on?"
    ↓
Embedding: [0.23, -0.45, 0.67, ...]
    ↓
Search LTM using cosine similarity
    ↓
Found: "User's name is Sarah. Working on weather app project."
Similarity: 0.89
```
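The retrieval step maps to a pgvector query along these lines. This is a sketch against the `ltm_memories` schema above, not necessarily the exact query in `database.py`; pgvector's `<=>` operator returns cosine *distance*, so similarity is `1 - distance`:

```sql
-- Sketch: top-5 memories for a user above a 0.7 similarity threshold.
SELECT content, memory_type, importance,
       1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM ltm_memories
WHERE user_id = %(user_id)s
  AND 1 - (embedding <=> %(query_vec)s::vector) >= 0.7
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT 5;
```

Ordering by the raw distance lets PostgreSQL use a vector index if one exists, while the `similarity` column is what gets fed into relevance weighting.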
- Combines similarity score + importance
- Formula: `relevance = (similarity × 0.7) + (importance / 10 × 0.3)`
- Updates access tracking (access_count, last_accessed)
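The weighting formula is small enough to show directly. This is a sketch mirroring the config weights; the function name is illustrative, not necessarily what `memory_manager.py` uses:

```python
def relevance_score(similarity, importance,
                    similarity_weight=0.7, importance_weight=0.3):
    """Blend semantic similarity (0-1) with stored importance (1-10)."""
    return similarity * similarity_weight + (importance / 10) * importance_weight

# Example values: similarity 0.94, importance 8/10
print(round(relevance_score(0.94, 8), 3))  # 0.898
```

The 0.7/0.3 split means a highly similar but trivial memory can still lose to a slightly less similar one that the extractor marked as very important.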
```bash
cd "long-term-memory"
python main.py
```

- Type your message to chat
- `memories` - view all stored memories
- `stats` - view memory statistics
- `clear` - clear all data (STM + LTM)
- `quit` - exit
```
You: Hi, my name is Alex and I love Python programming

🔍 Searching long-term memories...
   No relevant memories found

🧠 Extracting memories...
✅ Memory created: [personal_info] User's name is Alex. Enjoys Python programming.
   Importance: 8/10

🤖 Assistant: Hello Alex! It's great to meet you! Python is an excellent language...

---
(Later in conversation...)

You: What programming languages do I like?

🔍 Searching long-term memories...
✅ Found 1 relevant memory:
   1. [personal_info] User's name is Alex. Enjoys Python programming.
      Relevance: 0.91 (similarity: 0.94, importance: 8/10)

🤖 Assistant: Based on what you've told me, you love Python programming!
```
```python
# config.py - customize these settings
STM_LIMIT = 10                     # Recent messages to keep
TOP_K_MEMORIES = 5                 # Max memories to retrieve
MIN_SIMILARITY = 0.7               # Minimum similarity threshold
SIMILARITY_WEIGHT = 0.7            # Weight for semantic similarity
IMPORTANCE_WEIGHT = 0.3            # Weight for memory importance
HIGH_RELEVANCE_THRESHOLD = 0.85    # High emphasis threshold
MEDIUM_RELEVANCE_THRESHOLD = 0.70  # Medium emphasis threshold
```

- ✅ Semantic search - finds relevant memories by meaning, not keywords
- ✅ Smart extraction - the LLM decides what's worth remembering
- ✅ Relevance weighting - balances similarity and importance
- ✅ Access tracking - tracks which memories are most useful
- ✅ Multi-user support - each user has their own memories
- ✅ Persistent storage - memories survive across sessions
How it works:
- Keeps only the last N messages (default: 10)
- Automatically deletes older messages from PostgreSQL
- Simple sliding window approach
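The sliding window can be sketched in a few lines of in-memory Python (the real `database.py` issues a DELETE against PostgreSQL rather than slicing a list):

```python
def trim_messages(messages, limit=10):
    """Sliding window: keep only the most recent `limit` messages."""
    return messages[-limit:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
window = trim_messages(history)
print(len(window), window[0]["content"])  # 10 msg 15
```

Everything older than the window is simply gone, which is exactly the trade-off the comparison table below captures: low token usage, high information loss.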
Database Schema:

```
messages_trimming
├── id (primary key)
├── session_id (indexed)
├── role (user/assistant)
├── content (text)
└── timestamp
```
Run:

```bash
cd "short-term-memory\trimming"
python main.py
```

Best for: short, casual conversations where old context isn't needed
How it works:
- Keeps only recent N messages (default: 10) in raw form
- When limit exceeded, summarizes oldest K messages (default: 5)
- Stores summary separately and deletes summarized messages
- Summary evolves as conversation grows
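A minimal sketch of the rollover logic described above; `summarize` stands in for the LLM call, and the names are illustrative rather than taken from the actual `main.py`:

```python
def rollover(messages, summary, summarize, limit=10, chunk=5):
    """When the raw window exceeds `limit`, fold the oldest `chunk`
    messages into the running summary and delete them."""
    if len(messages) <= limit:
        return messages, summary
    oldest, recent = messages[:chunk], messages[chunk:]
    return recent, summarize(summary, oldest)  # LLM call in the real system

# Stub "LLM" that just records how many messages it absorbed:
def fake_summarize(prev, msgs):
    return f"{prev} (+{len(msgs)} msgs summarized)"

msgs = [f"m{i}" for i in range(12)]
msgs, summary = rollover(msgs, "start", fake_summarize)
print(len(msgs), summary)  # 7 start (+5 msgs summarized)
```

Because the new summary is produced from the previous summary plus the evicted messages, it keeps evolving as the conversation grows instead of being recomputed from scratch.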
Database Schema:

```
messages_summary
└── id, session_id, role, content, timestamp

conversation_summary
├── session_id (primary key)
├── summary (text)
└── updated_at
```
Run:

```bash
cd "short-term-memory\summary"
python main.py
```

Best for: long conversations where historical context matters
| Aspect | Trimming ✂️ | Summary 🧠 | Long-term Memory 💾 |
|---|---|---|---|
| Storage | PostgreSQL | PostgreSQL | PostgreSQL + Vectors |
| Old messages | Deleted | Summarized | Extracted as memories |
| Context retention | Lost | Condensed | Semantically searchable |
| Token usage | Low | Medium | Medium-High |
| Information loss | High | Low | Very Low |
| Setup complexity | Simple | Moderate | Complex |
| Cross-session memory | No | No | Yes |
| Semantic search | No | No | Yes |
| Best for | Casual chats | Long conversations | Personal assistants |
```bash
docker-compose up -d            # Start container
docker-compose down             # Stop container
docker-compose down -v          # Stop and remove volumes (wipes data)
docker logs llm_memory_postgres                                        # View logs
docker exec -it llm_memory_postgres psql -U llm_user -d llm_memory_db  # Open psql shell
```

```sql
-- Check pgvector extension
SELECT * FROM pg_extension WHERE extname = 'vector';

-- View long-term memories
SELECT id, user_id, memory_type, importance, content, access_count
FROM ltm_memories
ORDER BY created_at DESC;

-- View STM messages
SELECT * FROM stm_messages ORDER BY timestamp DESC LIMIT 20;

-- Count memories by type
SELECT memory_type, COUNT(*)
FROM ltm_memories
GROUP BY memory_type;

-- Most accessed memories
SELECT content, access_count, last_accessed
FROM ltm_memories
ORDER BY access_count DESC
LIMIT 10;
```

```
Python Script (main.py)
    ↓
Connects to: localhost:5432
    ↓
Docker Port Mapping (5432:5432)
    ↓
Container Port 5432
    ↓
PostgreSQL with pgvector
```

Your Python code connects to localhost:5432, and Docker transparently forwards it to the container!
```
1. User sends message
    ↓
2. Search LTM for relevant memories (semantic search)
    ↓
3. Retrieve recent STM messages
    ↓
4. Build context: System + LTM + STM
    ↓
5. Send to LLM
    ↓
6. Get response
    ↓
7. Extract new memories (if important info)
    ↓
8. Store in LTM with embeddings
    ↓
9. Store message in STM
```
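The nine steps above can be sketched as a single turn function. Everything here is an in-memory stand-in with invented names; the real `main.py` wires PostgreSQL-backed equivalents through LangGraph nodes:

```python
# In-memory stand-ins for the PostgreSQL-backed components (illustrative only).
ltm, stm = [], []
embed = lambda text: [float(len(text))]          # stand-in for sentence-transformers
llm = lambda ctx: f"(reply using {len(ctx['memories'])} memories)"
extract = lambda msg, reply: [{"content": msg}] if "name is" in msg else []

def handle_turn(message):
    memories = list(ltm)                          # 1-2. search LTM (stubbed)
    recent = stm[-10:]                            # 3. recent STM messages
    context = {"memories": memories,              # 4. assemble context
               "recent": recent, "message": message}
    reply = llm(context)                          # 5-6. generate response
    for mem in extract(message, reply):           # 7. extract new memories
        ltm.append((mem, embed(mem["content"])))  # 8. store with embedding
    stm.extend([("user", message),                # 9. persist the turn
                ("assistant", reply)])
    return reply

handle_turn("Hi, my name is Alex")
reply = handle_turn("What do you remember?")
print(reply)               # (reply using 1 memories)
print(len(ltm), len(stm))  # 1 4
```

Injecting the components like this keeps the flow itself testable without a database or API key, which is a useful pattern when experimenting with the pipeline.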
- Check Docker: `docker ps`
- Verify the pgvector image: `docker ps` should show `ankane/pgvector`
- Check port 5432: `netstat -ano | findstr :5432`
- Restart: `docker-compose down && docker-compose up -d`

- First run downloads `all-MiniLM-L6-v2` (~400MB)
- Requires an internet connection
- Downloads to `~/.cache/torch/sentence_transformers/`

- Make sure you're using the `ankane/pgvector` image (not `postgres:15-alpine`)
- Run: `docker-compose down -v && docker-compose up -d`
- Check extension: `docker exec -it llm_memory_postgres psql -U llm_user -d llm_memory_db -c "SELECT * FROM pg_extension WHERE extname = 'vector';"`

- Install all requirements: `pip install -r requirements.txt`
- Activate your virtual environment if using one
- For `pgvector` package issues, try: `pip install pgvector --upgrade`
By completing this project, you'll understand:
- ✅ Short-term vs long-term memory in LLMs
- ✅ Semantic search with vector embeddings
- ✅ PostgreSQL pgvector extension
- ✅ LLM-based information extraction
- ✅ Context window management
- ✅ LangGraph for stateful AI applications
- ✅ Docker for development databases
- ✅ Memory relevance weighting and retrieval

- ✅ Test trimming strategy
- ✅ Test summary strategy
- ✅ Test long-term memory (START HERE!)
- 🔄 Implement user-progress tracking
- 🔄 Add memory consolidation (merge similar memories)
- 🔄 Implement memory decay/forgetting
- 🔄 Add importance auto-adjustment based on access patterns
- LangGraph Documentation
- OpenRouter API
- pgvector GitHub
- sentence-transformers
- PostgreSQL Docker Image
This is a practice project for learning purposes. Feel free to modify and experiment!
Happy Learning! 🎉🧠