DocuMentor is an offline AI tutor that helps students learn from their study materials by providing intelligent summarization, quiz generation, and interactive Q&A capabilities. Built entirely with open-source models and tools, it runs locally without requiring any API keys or cloud services.
- Frontend: React/Next.js (existing mock UI)
- Backend: FastAPI (Python)
- ML Models:
  - microsoft/Phi-3-mini-4k-instruct (3.8B params) - Text summarization
  - google/flan-t5-xl - MCQ generation
  - bge-small-en or e5-small - Text embeddings
- Vector Database: FAISS
- PDF Processing: pdfplumber / PyMuPDF
- Environment: Conda (environment name: documentor)
- GPU: RTX 4060 (CUDA support)
┌─────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ - File Upload UI │
│ - Chat Interface │
│ - Summary Display │
│ - Quiz Generator (future) │
└───────────────────────────┬─────────────────────────────────┘
│ HTTP/REST API
┌───────────────────────────▼─────────────────────────────────┐
│ Backend (FastAPI) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ API Endpoints │ │
│ │ - POST /upload_pdf │ │
│ │ - POST /summarize │ │
│ │ - POST /ask_question │ │
│ │ - POST /generate_quiz (future) │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Processing Pipeline │ │
│ │ 1. PDF Ingestion → Text Extraction │ │
│ │ 2. Chunking (150-300 words with overlap) │ │
│ │ 3. Embedding Generation │ │
│ │ 4. FAISS Indexing │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ML Models (Local GPU) │ │
│ │ - Phi-3 (Summarization) │ │
│ │ - Flan-T5-XL (MCQ Generation) │ │
│ │ - BGE/E5 (Embeddings) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
DocuMentor/
├── website/
│ ├── client/ # React frontend
│ │ ├── components/
│ │ │ ├── app/ # Main app components
│ │ │ └── ui/ # UI components
│ │ ├── app/ # Next.js app directory
│ │ └── ...
│ └── server/ # (if needed for Next.js SSR)
├── backend/
│ ├── main.py # FastAPI app entry point
│ ├── models/
│ │ ├── phi3_summarizer.py # Phi-3 model wrapper
│ │ ├── t5_quiz_generator.py # Flan-T5 wrapper
│ │ └── embeddings.py # Embedding model
│ ├── services/
│ │ ├── pdf_processor.py # PDF reading & chunking
│ │ ├── vector_store.py # FAISS operations
│ │ ├── rag_pipeline.py # RAG for Q&A
│ │ └── summarizer.py # Summarization logic
│ ├── api/
│ │ ├── routes.py # API endpoints
│ │ └── schemas.py # Pydantic models
│ ├── utils/
│ │ ├── chunker.py # Text chunking utilities
│ │ └── config.py # Configuration
│ ├── requirements.txt
│ └── environment.yml
├── data/ # Uploaded PDFs & processed data
│ ├── uploads/
│ ├── vectors/ # FAISS indices
│ └── processed/ # Chunked documents
├── models/ # Downloaded model weights (cached)
└── claude.md # This file
Goal: Working pipeline from PDF → Q&A using local LLM
- Read PDFs using pdfplumber / PyMuPDF
- Extract titles, headings, bullets (basic structure detection)
- Split into clean 150–300 word chunks (with slight overlap)
- Store as JSON / pickled dict
- Checkpoint: Upload any textbook and see a clean list of all chunks
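The chunking and storage steps above can be sketched as follows. This is a minimal illustration, not the project's actual chunker: the function names `chunk_words` and `chunks_to_json`, the default sizes, and the JSON record fields are all assumptions. It splits on whitespace only and ignores headings and bullets, and the final chunk may come out shorter than 150 words.

```python
import json


def chunk_words(text: str, max_words: int = 250, overlap: int = 30) -> list[str]:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def chunks_to_json(chunks: list[str], doc_id: str) -> str:
    """Serialize chunks as a JSON list (hypothetical record layout)."""
    records = [
        {"doc_id": doc_id, "chunk_id": i, "text": c, "n_words": len(c.split())}
        for i, c in enumerate(chunks)
    ]
    return json.dumps(records, indent=2)
```

Printing `chunks_to_json(chunk_words(extracted_text), "my-textbook")` gives the "clean list of all chunks" that the checkpoint asks for, where `extracted_text` stands for whatever pdfplumber or PyMuPDF returned.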
- Load open-source embedding model: bge-small-en, e5-small, or Instructor
- Embed all chunks
- Store embeddings in FAISS
- Write a retriever: query → embedding → top-k chunks
- Checkpoint: Enter "What is overfitting?" → get relevant chunks from doc
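The retriever step (query → embedding → top-k chunks) comes down to a nearest-neighbour search. The sketch below is a dependency-free stand-in using brute-force cosine similarity. It is not the FAISS code itself, although an exact inner-product index such as `faiss.IndexFlatIP` over normalized vectors should produce the same ranking. The helper names `normalize` and `top_k` are made up for illustration, and the vectors are assumed to come from the embedding model.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3):
    """Return (chunk_index, cosine_similarity) pairs for the k best chunks."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        score = sum(a * b for a, b in zip(q, v))  # dot product of unit vectors
        scored.append((score, idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]
```

The indices returned by `top_k` point back into the stored chunk list, so the matching chunk texts can be looked up and passed to the answer-generation step.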
- Load Phi-3 with transformers (4-bit quantization for efficiency)
- Pass top-k chunks + query into prompt
- Get generated answer
- Create FastAPI endpoints for Q&A
- Checkpoint: Local chatbot answers based only on uploaded study material
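Passing the top-k chunks and the query into a prompt might look like the sketch below. The wording of the instruction, the `max_chunks` limit, and the function name `build_rag_prompt` are assumptions. The model's own chat template, applied by the tokenizer, would normally wrap this text before generation.

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 4) -> str:
    """Assemble a grounded Q&A prompt from retrieved chunks (illustrative wording)."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk.strip()}" for i, chunk in enumerate(chunks[:max_chunks])
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbering the chunks makes it possible to ask the model to cite which passage it used. Capping the number of chunks is one way to stay within the 4K-token context.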
Goal: Add summarization and question generation
- Chunk-level summaries (2–3 sentences per section)
- Full-doc summary
- Prompt tuning: "bullet points", "exam-style", "in your own words"
- Connect to frontend summary display
- Checkpoint: App shows a quick summary of any chapter or section
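One way to combine chunk-level and full-document summaries is a two-pass approach: summarize each chunk, then summarize the summaries. The sketch below assumes a `generate` callable that wraps the local LLM. That callable, the `STYLE_INSTRUCTIONS` wording, and the return layout are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt wording for the three styles named in the plan.
STYLE_INSTRUCTIONS = {
    "bullet points": "Summarize the text as concise bullet points.",
    "exam-style": "Summarize the text as exam revision notes.",
    "in your own words": "Summarize the text in your own words.",
}


def summarize_document(
    chunks: list[str],
    generate: Callable[[str], str],
    style: str = "bullet points",
) -> dict:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    instruction = STYLE_INSTRUCTIONS[style]
    chunk_summaries = [
        generate(f"{instruction} Use 2-3 sentences.\n\n{chunk}") for chunk in chunks
    ]
    combined = "\n".join(chunk_summaries)
    full_summary = generate(f"{instruction}\n\n{combined}")
    return {"chunk_summaries": chunk_summaries, "full_summary": full_summary}
```

Because the second pass only sees the short chunk summaries, the full-document prompt stays small even for long textbooks.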
- Integrate Flan-T5-XL for MCQ generation
- Prompt LLM to generate 3–5 MCQs per section
- Generate open-ended questions
- Add answer options + explanations
- Save to JSON for frontend consumption
- Checkpoint: Auto-generated quiz for each topic
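The JSON handed to the frontend needs a fixed shape so that malformed model output is caught before it is saved. The sketch below shows one possible shape. The `MCQ` fields and the `quiz_to_json` helper are assumptions, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MCQ:
    question: str
    options: list[str]
    answer_index: int       # position of the correct option in `options`
    explanation: str = ""

    def validate(self) -> None:
        """Reject questions the frontend could not render or grade."""
        if len(self.options) < 2:
            raise ValueError("an MCQ needs at least two options")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index does not point at an option")


def quiz_to_json(section: str, questions: list[MCQ]) -> str:
    """Validate every question, then serialize the quiz for one section."""
    for q in questions:
        q.validate()
    return json.dumps(
        {"section": section, "questions": [asdict(q) for q in questions]},
        indent=2,
    )
```

Validating before saving matters here because generated questions can come back with a missing option or an out-of-range answer.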
- Turn Q&A pairs into flashcards
- Optionally use Anki deck format
- Add toggle: "Mark as Learned / Unseen"
- Frontend component for flashcard display
- Checkpoint: Revision tool built from your own notes
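A flashcard can be as small as a question, an answer, and a learned flag. The sketch below uses hypothetical names (`Flashcard`, `make_flashcards`, `to_anki_tsv`). The export writes tab-separated front/back lines, a plain-text layout that Anki can import. It does not produce a packaged Anki deck file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Flashcard:
    front: str
    back: str
    learned: bool = False

    def toggle(self) -> None:
        """Flip between 'Learned' and 'Unseen'."""
        self.learned = not self.learned


def make_flashcards(qa_pairs: list[tuple[str, str]]) -> list[Flashcard]:
    """Turn (question, answer) pairs into unseen flashcards."""
    return [Flashcard(front=q, back=a) for q, a in qa_pairs]


def to_anki_tsv(cards: list[Flashcard]) -> str:
    """Export cards as tab-separated text, one 'front<TAB>back' line per card."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for card in cards:
        writer.writerow([card.front, card.back])
    return buffer.getvalue()
```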
Goal: Make it usable, beautiful, and extensible
- Connect all frontend components to backend APIs
- File upload with progress indicator
- Chat interface for Q&A
- Tabs for: Summary, Quiz, Flashcards
- Option to download generated content
- Checkpoint: Working web UI that looks good and runs locally
- Multi-document support
- Search by topic, not just by text
- Save/load sessions
- Export quizzes and summaries (PDF/Markdown)
- Performance optimization
- Final Checkpoint: Polished, useful AI tutor built from scratch
- ✅ Create project documentation
- ⏳ Set up backend directory structure
- ⏳ Configure conda environment with dependencies
- ⏳ Implement PDF processor module
- ⏳ Create FastAPI skeleton with basic endpoints
- ⏳ Test PDF upload and chunking workflow
# Core Framework
fastapi==0.109.0
uvicorn[standard]==0.27.0
python-multipart==0.0.6
# ML & NLP
torch>=2.1.0
transformers>=4.36.0
sentence-transformers>=2.3.1
accelerate>=0.25.0
bitsandbytes>=0.41.0 # For 4-bit quantization
# Vector Store
faiss-cpu==1.7.4 # or faiss-gpu for GPU support
# PDF Processing
pdfplumber>=0.10.3
PyMuPDF>=1.23.8
# Utilities
numpy>=1.24.0
pandas>=2.1.0
pydantic>=2.5.0
python-dotenv>=1.0.0
- React
- Next.js
- TailwindCSS
- TypeScript
- Embedding Generation - Convert text chunks to dense vectors
- Vector Similarity Search - Retrieve relevant chunks using FAISS
- Retrieval-Augmented Generation (RAG) - Combine retrieval with generation
- Prompt Engineering - Craft effective prompts for summarization and quizzes
- Local LLM Inference - Run models efficiently on consumer GPU
- Smart Chunking - Segment documents intelligently with context preservation
- Quantization - Use 4-bit models to reduce memory usage
- 100% Offline: All processing happens locally
- No Data Leakage: Documents never leave your machine
- No API Keys: Zero dependency on external services
- Private Learning: Your study materials remain completely private
- Python 3.10+
- CUDA-capable GPU (RTX 4060 in this case)
- Conda installed
- Node.js 18+ (for frontend)
# 1. Activate conda environment
conda activate documentor
# 2. Install backend dependencies
cd backend
pip install -r requirements.txt
# 3. Start FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# 4. In another terminal, start frontend (already set up)
cd website/client
npm run dev
# Create conda environment
conda create -n documentor python=3.10
conda activate documentor
# Install PyTorch with CUDA support
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
# Install remaining dependencies
pip install -r requirements.txt
- Model: microsoft/Phi-3-mini-4k-instruct
- Size: 3.8B parameters
- Context: 4K tokens
- Use Case: Text summarization, Q&A
- Quantization: 4-bit (fits in ~3GB VRAM)
- License: MIT
- Model: google/flan-t5-xl
- Size: 3B parameters
- Use Case: MCQ generation, instruction following
- Quantization: 4-bit recommended
- License: Apache 2.0
- Model: bge-small-en or e5-small
- Size: 33M parameters
- Use Case: Text embeddings for retrieval
- Embedding Dim: 384
- License: MIT
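The memory figures above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below is a back-of-the-envelope estimate only. The helper name is made up, and the result excludes activations, the KV cache, and quantization overhead, which is why real usage (such as the ~3GB quoted for Phi-3) is higher than the raw weight size.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


phi3_4bit = weight_memory_gb(3.8e9, 4)       # about 1.9 GB of weights
phi3_fp16 = weight_memory_gb(3.8e9, 16)      # about 7.6 GB without quantization
flan_t5_xl_4bit = weight_memory_gb(3e9, 4)   # about 1.5 GB of weights
embedder_fp16 = weight_memory_gb(33e6, 16)   # about 0.07 GB
```

On this estimate the three models' weights take up about 3.5 GB together, which suggests 4-bit quantization is what makes it plausible to hold all of them on a single consumer GPU.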
- POST /api/upload - Upload PDF file
- GET /api/documents - List uploaded documents
- DELETE /api/documents/{doc_id} - Delete document
- POST /api/summarize - Summarize document or section
  - Body: { "doc_id": "...", "mode": "full|chunked" }
- POST /api/ask - Ask question about document
  - Body: { "doc_id": "...", "question": "..." }
- POST /api/generate_quiz - Generate MCQs
  - Body: { "doc_id": "...", "num_questions": 5 }
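From a client's point of view, calling one of these endpoints means sending a JSON body with a POST request. The sketch below builds such a request with the standard library, without sending it. The base URL assumes the server runs locally on port 8000, and `example-doc` is a placeholder document ID. The response format is not specified here, so it is not shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local server address


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the API endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


ask_request = build_post(
    "/api/ask",
    {"doc_id": "example-doc", "question": "What is overfitting?"},
)
# To send it once the backend is running:
# with urllib.request.urlopen(ask_request) as response:
#     answer = json.load(response)
```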
- Unit Tests: Test each module independently
- Integration Tests: Test PDF → Chunks → Embeddings → Retrieval
- End-to-End Tests: Test full user workflows
- Model Quality Tests: Evaluate summarization and MCQ quality
- Model Loading: Cache models in memory to avoid reloading
- Batch Processing: Process multiple chunks together
- Quantization: Use 4-bit models to reduce VRAM usage
- FAISS: Use GPU-accelerated FAISS if available
- Async Processing: Use FastAPI's async capabilities for I/O operations
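Two of these ideas, model caching and batch processing, fit in a few lines. In the sketch below `_load_weights` is a placeholder for the real and slow model load, and both helper names are hypothetical. The point is only that `functools.lru_cache` makes repeated requests reuse one in-memory instance.

```python
from functools import lru_cache

LOAD_COUNT = {"n": 0}  # counts how often the slow load actually runs


def _load_weights(model_name: str) -> dict:
    """Placeholder for an expensive load from disk to GPU."""
    LOAD_COUNT["n"] += 1
    return {"name": model_name}


@lru_cache(maxsize=None)
def get_model(model_name: str) -> dict:
    """Load a model once per process and return the cached instance afterwards."""
    return _load_weights(model_name)


def batched(items: list, batch_size: int):
    """Yield consecutive slices so chunks can be embedded several at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Calling `get_model("microsoft/Phi-3-mini-4k-instruct")` from every request handler then costs a dictionary lookup after the first call.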
- Multi-document chat
- Knowledge graph visualization
- Spaced repetition system
- Audio lecture transcription and summarization
- Handwritten notes OCR
- Collaborative study features
- Mobile app (React Native)
- All models run locally on RTX 4060 GPU
- Conda environment name: documentor
- Frontend is already built as a mock - needs backend connection
- Start with PDF summarization, then add MCQ generation later
- Chunked summarization approach for large documents
This is a personal learning project. Feel free to use this architecture for your own implementations!
Open source - Educational purposes
Last Updated: 2025-11-06
Status: Phase 1 - Backend Development in Progress