A Retrieval-Augmented Generation (RAG) system that lets you query documents using semantic search and generate answers with a local LLM via Ollama.
This project:
- Ingests text documents from the documents/ folder
- Chunks and embeds them using SentenceTransformers
- Stores embeddings in ChromaDB for semantic search
- Retrieves relevant context for queries
- Generates answers using Ollama (Mistral 7B by default)
This guide assumes you're starting from scratch with no Python installed.
- Visit python.org/downloads
- Download the latest Python 3.11+ installer
- Run the installer
- IMPORTANT: Check "Add Python to PATH" before clicking Install
- Verify installation:
python --version
# Using Homebrew (install Homebrew first from brew.sh if needed)
brew install python@3.11
# Using apt (Debian/Ubuntu)
sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip
pipx allows you to install Python applications in isolated environments.
python -m pip install --user pipx
python -m pipx ensurepath
After installation, close and reopen your terminal for the PATH changes to take effect.
Verify installation:
pipx --version
pipx install virtualenv
Verify installation:
virtualenv --version
Ollama is required to run the LLM locally.
- Visit ollama.com/download
- Download and install Ollama for your OS
- Verify installation:
ollama --version
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral:7b
This downloads the Mistral 7B model (approximately 4GB). You can change the model in config.py if you prefer a different one.
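Once the model is pulled, a request to the local Ollama server can be sketched as follows. This is a minimal illustration, not the project's own code: it assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the standard library's `urllib` instead of `requests` so it runs without extra packages. The helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "mistral:7b") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "mistral:7b", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text is in the "response" field.
    return body.get("response", "")
```

With the server running, calling `generate("Say hello in one sentence.")` should return the model's reply as a string. Nothing is called at import time, so the snippet is safe to load even when Ollama is not running.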
cd /path/to/project
virtualenv venv
This creates a venv/ folder containing an isolated Python environment.
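Activate it with the command for your shell:
- Windows PowerShell: .\venv\Scripts\Activate.ps1
- Windows Command Prompt: .\venv\Scripts\activate.bat
- Git Bash on Windows: source venv/Scripts/activate
- macOS/Linux: source venv/bin/activate
You should see a (venv) prefix in your terminal prompt when activated.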
pip install -r requirements.txt
This installs:
- chromadb - Vector database for storing document embeddings
- sentence-transformers - For generating text embeddings
- requests - For communicating with the Ollama API
Note: The first installation may take several minutes because it downloads large ML dependencies.
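To confirm that the install landed in the active virtual environment, a small check along these lines may help. It is a sketch only: the helper names are hypothetical, and the module names are the usual import names for the three packages listed above (`sentence-transformers` is imported as `sentence_transformers`).

```python
import importlib.util
import sys

# Import names for the packages listed in requirements.txt.
REQUIRED_MODULES = ["chromadb", "sentence_transformers", "requests"]


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    return sys.prefix != base or hasattr(sys, "real_prefix")


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("virtual environment active:", in_virtualenv())
print("missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "missing modules" list together with an active virtual environment suggests the dependencies are in place.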
Place your .txt files in the documents/ folder. The system will process all text files in this directory.
Example:
documents/
  your_document.txt
  another_document.txt
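How the folder might be scanned can be sketched as below. This is an assumption about the behavior rather than a copy of ingest.py: it looks only at .txt files directly inside the folder (no subfolders), and the function names `find_text_files` and `load_documents` are hypothetical.

```python
from pathlib import Path


def find_text_files(folder: str = "documents") -> list[Path]:
    """Return all .txt files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".txt"
    )


def load_documents(folder: str = "documents") -> dict[str, str]:
    """Map each file path (as a string) to its text content."""
    return {
        str(p): p.read_text(encoding="utf-8", errors="replace")
        for p in find_text_files(folder)
    }
```

Reading with `errors="replace"` keeps the load from failing on a stray non-UTF-8 byte, at the cost of substituting a placeholder character.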
This processes your documents and stores them in the vector database:
python ingest.py
You should see output like:
Loading documents...
Found 1 documents
Loading embedding model...
documents/meditations_by_marcus_aurelius.txt: 142 chunks
Generating embeddings for 142 chunks...
Storing in ChromaDB...
✓ Indexed 142 chunks from 1 documents
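The chunk counts in the output come from splitting each document before embedding. The exact strategy used by ingest.py is not shown here, so the following is only a sketch of one common approach: fixed-size character windows with a small overlap. The function name `chunk_text` and the default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters,
    each sharing `overlap` characters with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval.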
Start the interactive query interface:
python query.py
Example interaction:
RAG System ready (using mistral:7b)
Type 'quit' to exit
Ask a question: What does Marcus Aurelius say about anger?
Retrieving relevant context...
Found 3 relevant chunks
Generating answer...
Answer: [Generated answer based on your documents]
Sources:
- documents/meditations_by_marcus_aurelius.txt
Type quit, exit, or q to exit, or press Ctrl+C.
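The "Found 3 relevant chunks" step in the interaction above is a nearest-neighbor search over embeddings. ChromaDB performs this search internally, and its distance metric depends on how the collection is configured, so the snippet below is only a toy illustration of the idea using cosine similarity on plain Python lists. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the real system the vectors would come from the embedding model, and the text of the returned chunks would be passed to the LLM as context.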
Edit config.py to customize the configuration settings, such as the Ollama model.
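The actual contents of config.py are not reproduced in this guide. As a rough idea of what such settings could look like, here is a hypothetical layout built from values mentioned elsewhere in this document (the folders, the embedding model, the default Ollama model, and the three retrieved chunks). The field names are assumptions and may not match the real file, and `llama3` is just an example of another model name.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; the names in the real config.py may differ."""
    documents_dir: str = "documents"
    chroma_dir: str = "chroma_db"
    embedding_model: str = "all-MiniLM-L6-v2"
    ollama_model: str = "mistral:7b"
    top_k: int = 3


DEFAULTS = Settings()

# Example: switch to another model and retrieve more chunks per query.
custom = replace(DEFAULTS, ollama_model="llama3", top_k=5)
print(custom)
```

Any model named here must first be downloaded with ollama pull before the query script can use it.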
When you're done working, deactivate the virtual environment:
deactivate
If the python command is not found:
- Make sure Python is installed and added to PATH
- Try python3 instead of python
If the pipx command is not found:
- Close and reopen your terminal
- Ensure PATH was updated with python -m pipx ensurepath
If the virtualenv command is not found:
- Make sure pipx is working first
- Reinstall: pipx install virtualenv
If Ollama does not respond:
- Ensure Ollama is running: ollama serve
- Check if the model is pulled: ollama list
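The same two Ollama checks can be made from Python. The sketch below assumes the default local address and Ollama's model-listing endpoint (`/api/tags`). The function names are hypothetical, and the standard library's `urllib` is used so no extra packages are needed.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default local port.
TAGS_URL = "http://localhost:11434/api/tags"


def model_is_listed(tags: dict, model: str) -> bool:
    """Check whether `model` appears in a parsed /api/tags response."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def check_ollama(model: str = "mistral:7b", url: str = TAGS_URL, timeout: float = 5.0) -> str:
    """Return a short, human-readable status message about the Ollama server."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            tags = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError) as exc:
        return f"Ollama is not reachable at {url}: {exc}"
    if model_is_listed(tags, model):
        return f"Ollama is running and {model} is available"
    return f"Ollama is running, but {model} is not pulled"
```

Running `print(check_ollama())` reports which of the two conditions, if either, is the problem.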
If packages fail to install or import:
- Ensure the virtual environment is activated (you should see (venv) in the prompt)
- Try upgrading pip: pip install --upgrade pip
If the first run is slow:
- The first run downloads the all-MiniLM-L6-v2 model (80MB)
- Subsequent runs will be faster
rag_system/
├── documents/ # Place your .txt files here
├── chroma_db/ # Vector database (auto-generated)
├── venv/ # Virtual environment (auto-generated)
├── config.py # Configuration settings
├── ingest.py # Document ingestion script
├── query.py # Query interface
├── requirements.txt # Python dependencies
└── readme.md # This file