Note: By default, Archive-RAG uses local processing only (constitution-compliant). Remote processing is an opt-in feature that can be enabled via environment variables to reduce local memory usage.
Remote processing allows you to offload memory-intensive operations (embeddings and LLM inference) to remote API endpoints while keeping FAISS indexing and retrieval local. This reduces local memory usage significantly.
Default Mode: Local processing (constitution-compliant)
- ✅ Local embeddings (sentence-transformers)
- ✅ Local FAISS vector search
- ✅ No external API dependencies
- ✅ Offline-capable
Remote Mode (opt-in): Memory-efficient processing
- ✅ Remote embeddings (OpenAI, HuggingFace, or custom API)
- ✅ Remote LLM inference (OpenAI, HuggingFace, or custom API)
- ✅ Local FAISS vector search (still local for performance)
- ⚠️ Requires an internet connection
- ⚠️ Requires API keys
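The split described above can be sketched in a few lines: embeddings come from a remote call, while nearest-neighbour search stays in-process. This is an illustration only; `embed_remote` is stubbed with a deterministic fake instead of a real HTTP request, and `search_local` stands in for the FAISS lookup.

```python
import math

def embed_remote(text: str) -> list[float]:
    # In the real system this would POST to the configured embedding API;
    # stubbed here with a trivial deterministic vector for illustration.
    return [float(ord(c) % 7) for c in text[:8].ljust(8)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def search_local(query_vec: list[float], index: list) -> str:
    # Local stand-in for the FAISS search step: index is a list of
    # (doc_id, vector) pairs; return the id of the closest document.
    return max(index, key=lambda item: cosine(query_vec, item[1]))[0]

docs = ["budget meeting", "roadmap review"]
index = [(doc, embed_remote(doc)) for doc in docs]      # remote embed, local store
best = search_local(embed_remote("budget meeting"), index)  # remote embed, local search
```

Only the two `embed_remote` calls would leave the machine; the index and the search never do.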
Set environment variables to enable remote processing:
```bash
# Enable remote processing mode
export ARCHIVE_RAG_PROCESSING_MODE=remote

# Configure embedding service (choose one)
export ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
export ARCHIVE_RAG_EMBEDDING_API_URL="https://api.openai.com/v1"  # or HuggingFace URL
export ARCHIVE_RAG_EMBEDDING_API_KEY="your-api-key-here"

# Configure LLM service (choose one)
export ARCHIVE_RAG_REMOTE_LLM=true
export ARCHIVE_RAG_LLM_API_URL="https://api.openai.com/v1"  # or HuggingFace URL
export ARCHIVE_RAG_LLM_API_KEY="your-api-key-here"
export ARCHIVE_RAG_LLM_MODEL="gpt-3.5-turbo"  # or "gpt-4", a HuggingFace model, etc.
```

OpenAI example:

```bash
export ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
export ARCHIVE_RAG_EMBEDDING_API_URL="https://api.openai.com/v1"
export ARCHIVE_RAG_EMBEDDING_API_KEY="sk-..."
export ARCHIVE_RAG_EMBEDDING_MODEL="text-embedding-3-small"  # or "text-embedding-ada-002"

export ARCHIVE_RAG_REMOTE_LLM=true
export ARCHIVE_RAG_LLM_API_URL="https://api.openai.com/v1"
export ARCHIVE_RAG_LLM_API_KEY="sk-..."
export ARCHIVE_RAG_LLM_MODEL="gpt-3.5-turbo"
```

HuggingFace example. Note: many sentence-transformers models are configured for similarity tasks, not feature extraction, so use the BAAI/bge models for embeddings:
```bash
export ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
export ARCHIVE_RAG_EMBEDDING_API_URL="https://router.huggingface.co/hf-inference"
export ARCHIVE_RAG_EMBEDDING_API_KEY="hf_..."  # or use HUGGINGFACE_API_KEY
export ARCHIVE_RAG_EMBEDDING_MODEL="BAAI/bge-small-en-v1.5"  # recommended: supports feature extraction (384-dim)
# Alternative models that work:
# - BAAI/bge-base-en-v1.5 (768-dim)
# - BAAI/bge-large-en-v1.5 (1024-dim)

export ARCHIVE_RAG_REMOTE_LLM=true
export ARCHIVE_RAG_LLM_API_URL="https://router.huggingface.co/hf-inference"
export HUGGINGFACE_API_KEY="hf_..."
export ARCHIVE_RAG_LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
```

Custom API example:

```bash
export ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
export ARCHIVE_RAG_EMBEDDING_API_URL="https://your-custom-api.com/embeddings"
export ARCHIVE_RAG_EMBEDDING_API_KEY="your-api-key"

export ARCHIVE_RAG_REMOTE_LLM=true
export ARCHIVE_RAG_LLM_API_URL="https://your-custom-api.com/generate"
export ARCHIVE_RAG_LLM_API_KEY="your-api-key"
```

Once configured, use Archive-RAG normally:
```bash
# Remote embeddings will be used automatically if configured
archive-rag index data/meetings/ indexes/meetings.faiss

# Remote LLM will be used automatically if configured
archive-rag query indexes/meetings.faiss "What decisions were made?"
```

You can also enable remote processing for only certain components:
```bash
# Use remote embeddings but keep the LLM local (or template-based)
export ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
export ARCHIVE_RAG_REMOTE_LLM=false
```

Approximate memory usage in local mode:
- Embeddings: ~200-500 MB per model (sentence-transformers)
- LLM: ~2-8 GB for small models (GPT-2), 20+ GB for larger models
- FAISS index: ~50-200 MB for 10k documents
- Total: ~2.5-25 GB depending on models

Approximate memory usage in remote mode:
- Embeddings: 0 MB (API calls only)
- LLM: 0 MB (API calls only)
- FAISS index: ~50-200 MB (still local for performance)
- Total: ~50-200 MB (just the FAISS index)
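The FAISS figure above is easy to sanity-check: a flat float32 index stores 4 bytes per dimension per vector. The chunk count and 384-dim size below are assumptions (roughly matching bge-small-style models and ~10 passages per document), not measurements from Archive-RAG.

```python
def faiss_flat_index_bytes(num_vectors: int, dim: int) -> int:
    # A flat FAISS index stores raw float32 vectors: 4 bytes per dimension.
    return num_vectors * dim * 4

# 10k documents chunked into ~10 passages each -> ~100k vectors at 384 dims
size_bytes = faiss_flat_index_bytes(100_000, 384)
print(f"{size_bytes / 1e6:.0f} MB")  # ≈ 154 MB, inside the 50-200 MB range quoted above
```

Larger embedding models (768- or 1024-dim) roughly double or triple this, which is where the upper end of the range comes from.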
Important: Remote processing is opt-in and disabled by default. The system:
- ✅ Defaults to local processing (constitution-compliant)
- ✅ Requires explicit environment variable configuration
- ✅ Falls back to local processing if remote services unavailable
- ✅ Maintains audit logging and reproducibility where possible
- API Keys: Store API keys securely (use `.env` files, not code)
- Data Privacy: Remote APIs process your data; ensure compliance with privacy policies
- Rate Limiting: Be aware of API rate limits and costs
- Offline Mode: Keep local processing as fallback for offline scenarios
Create a .env file in the project root:
```bash
# Remote Processing Configuration (opt-in)
ARCHIVE_RAG_PROCESSING_MODE=remote

# Embedding Service
ARCHIVE_RAG_REMOTE_EMBEDDINGS=true
ARCHIVE_RAG_EMBEDDING_API_URL=https://api.openai.com/v1
ARCHIVE_RAG_EMBEDDING_API_KEY=sk-your-key-here

# LLM Service
ARCHIVE_RAG_REMOTE_LLM=true
ARCHIVE_RAG_LLM_API_URL=https://api.openai.com/v1
ARCHIVE_RAG_LLM_API_KEY=sk-your-key-here
ARCHIVE_RAG_LLM_MODEL=gpt-3.5-turbo

# HuggingFace (alternative)
# HUGGINGFACE_API_KEY=hf-your-key-here
```

Load it with python-dotenv:
```bash
pip install python-dotenv
```

Then, in your code or CLI wrapper:
```python
from dotenv import load_dotenv

load_dotenv()
```

If remote processing is not working:
- Check that the environment variables are set correctly
- Verify API keys are valid
- Check network connectivity
- Review logs for error messages
- System will fall back to local processing automatically
If the system keeps using local processing:
- Ensure `ARCHIVE_RAG_REMOTE_EMBEDDINGS=true` is set
- Ensure `ARCHIVE_RAG_REMOTE_LLM=true` is set
- Check that the API URLs are correct
- Verify API keys are working
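A quick way to run the first check is to print the variables as the process actually sees them. The variable names are taken from the examples above; API keys are deliberately left out so secrets never end up in logs.

```python
import os

def remote_config_report() -> dict[str, str]:
    # Only non-secret configuration variables; keys are omitted on purpose.
    names = [
        "ARCHIVE_RAG_PROCESSING_MODE",
        "ARCHIVE_RAG_REMOTE_EMBEDDINGS",
        "ARCHIVE_RAG_EMBEDDING_API_URL",
        "ARCHIVE_RAG_REMOTE_LLM",
        "ARCHIVE_RAG_LLM_API_URL",
    ]
    return {name: os.environ.get(name, "<unset>") for name in names}

for name, value in remote_config_report().items():
    print(f"{name}={value}")
```

Run this inside the same shell (or CLI wrapper) you use for `archive-rag`; a value showing `<unset>` means the export never reached the process.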
To disable remote processing, unset the variables:

```bash
unset ARCHIVE_RAG_PROCESSING_MODE
unset ARCHIVE_RAG_REMOTE_EMBEDDINGS
unset ARCHIVE_RAG_REMOTE_LLM
```

Or set the mode to local explicitly:

```bash
export ARCHIVE_RAG_PROCESSING_MODE=local
```

- Remote services are lazy-loaded only when explicitly enabled
- Automatic fallback to local processing if remote fails
- All existing code continues to work without changes
- Constitution compliance maintained by default (local-first)
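The lazy-loading point can be sketched with a cached factory: the heavy local model is only constructed the first time it is needed, so remote-only configurations never pay its memory cost. The strings returned here are stand-ins for real client/model objects; `functools.lru_cache` is just one simple way to get a once-only constructor.

```python
import os
from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedder():
    if os.environ.get("ARCHIVE_RAG_REMOTE_EMBEDDINGS", "").lower() == "true":
        return "remote-embedder"  # stand-in for a lightweight HTTP client
    # In a real implementation the heavy sentence-transformers import would
    # happen here, deferred until the first local use.
    return "local-embedder"

first = get_embedder()
second = get_embedder()  # cached: same object, no second construction
```

Because the decision is made at first use rather than import time, existing code paths keep working unchanged whichever mode is configured.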