DocuMentor - AI-Powered Study Assistant

🎯 Project Overview

DocuMentor is an offline AI tutor that helps students learn from their study materials by providing intelligent summarization, quiz generation, and interactive Q&A capabilities. Built entirely with open-source models and tools, it runs locally without requiring any API keys or cloud services.

🏗️ Architecture

Tech Stack

Frontend: React/Next.js (existing mock UI)
Backend: FastAPI (Python)
ML Models:
- microsoft/Phi-3-mini-4k-instruct (3B params) - Text summarization
- google/flan-t5-xl - MCQ generation
- bge-small-en or e5-small - Text embeddings
Vector Database: FAISS
PDF Processing: pdfplumber / PyMuPDF
Environment: Conda (environment name: documentor)
GPU: RTX 4060 (CUDA support)

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                        │
│  - File Upload UI                                           │
│  - Chat Interface                                           │
│  - Summary Display                                          │
│  - Quiz Generator (future)                                  │
└───────────────────────────┬─────────────────────────────────┘
                            │ HTTP/REST API
┌───────────────────────────▼─────────────────────────────────┐
│                    Backend (FastAPI)                         │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  API Endpoints                                       │   │
│  │  - POST /upload_pdf                                 │   │
│  │  - POST /summarize                                  │   │
│  │  - POST /ask_question                               │   │
│  │  - POST /generate_quiz (future)                     │   │
│  └─────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Processing Pipeline                                 │   │
│  │  1. PDF Ingestion → Text Extraction                │   │
│  │  2. Chunking (150-300 words with overlap)          │   │
│  │  3. Embedding Generation                            │   │
│  │  4. FAISS Indexing                                  │   │
│  └─────────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  ML Models (Local GPU)                              │   │
│  │  - Phi-3 (Summarization)                           │   │
│  │  - Flan-T5-XL (MCQ Generation)                     │   │
│  │  - BGE/E5 (Embeddings)                             │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

📁 Project Structure

DocuMentor/
├── website/
│   ├── client/              # React frontend
│   │   ├── components/
│   │   │   ├── app/         # Main app components
│   │   │   └── ui/          # UI components
│   │   ├── app/             # Next.js app directory
│   │   └── ...
│   └── server/              # (if needed for Next.js SSR)
├── backend/
│   ├── main.py              # FastAPI app entry point
│   ├── models/
│   │   ├── phi3_summarizer.py      # Phi-3 model wrapper
│   │   ├── t5_quiz_generator.py    # Flan-T5 wrapper
│   │   └── embeddings.py           # Embedding model
│   ├── services/
│   │   ├── pdf_processor.py        # PDF reading & chunking
│   │   ├── vector_store.py         # FAISS operations
│   │   ├── rag_pipeline.py         # RAG for Q&A
│   │   └── summarizer.py           # Summarization logic
│   ├── api/
│   │   ├── routes.py               # API endpoints
│   │   └── schemas.py              # Pydantic models
│   ├── utils/
│   │   ├── chunker.py              # Text chunking utilities
│   │   └── config.py               # Configuration
│   ├── requirements.txt
│   └── environment.yml
├── data/                    # Uploaded PDFs & processed data
│   ├── uploads/
│   ├── vectors/            # FAISS indices
│   └── processed/          # Chunked documents
├── models/                  # Downloaded model weights (cached)
└── claude.md               # This file

🚀 3-Month Development Roadmap

📅 MONTH 1: Foundation – Build the Brain

Goal: Working pipeline from PDF → Q&A using local LLM

Week 1: PDF Ingestion + Chunking ✅

Read PDFs using pdfplumber / PyMuPDF
Extract titles, headings, bullets (basic structure detection)
Split into clean 150–300 word chunks (with slight overlap)
Store as JSON / pickled dict
Checkpoint: Upload any textbook and see a clean list of all chunks

Week 2: Embeddings + Vector Index (FAISS)

Load open-source embedding model: bge-small-en, e5-small, or Instructor
Embed all chunks
Store embeddings in FAISS
Write a retriever: query → embedding → top-k chunks
Checkpoint: Enter "What is overfitting?" → get relevant chunks from doc

Week 3–4: Local LLM + RAG Pipeline

Load Phi-3 with transformers (4-bit quantization for efficiency)
Pass top-k chunks + query into prompt
Get generated answer
Create FastAPI endpoints for Q&A
Checkpoint: Local chatbot answers based only on uploaded study material

📅 MONTH 2: Intelligence – Summary & Quizzes

Goal: Add summarization and question generation

Week 5–6: Summarizer

Chunk-level summaries (2–3 sentences per section)
Full-doc summary
Prompt tuning: "bullet points", "exam-style", "in your own words"
Connect to frontend summary display
Checkpoint: App shows a quick summary of any chapter or section

Week 7–8: Quiz Generator

Integrate Flan-T5-XL for MCQ generation
Prompt LLM to generate 3–5 MCQs per section
Generate open-ended questions
Add answer options + explanations
Save to JSON for frontend consumption
Checkpoint: Auto-generated quiz for each topic

Week 9: Flashcard / Revision View

Turn Q&A pairs into flashcards
Optionally use Anki deck format
Add toggle: "Mark as Learned / Unseen"
Frontend component for flashcard display
Checkpoint: Revision tool built from your own notes

📅 MONTH 3: User Interface + Extensions

Goal: Make it usable, beautiful, and extensible

Week 10–11: Polish UI

Connect all frontend components to backend APIs
File upload with progress indicator
Chat interface for Q&A
Tabs for: Summary, Quiz, Flashcards
Option to download generated content
Checkpoint: Working web UI that looks good and runs locally

Week 12: Extensions / Final Touch

Multi-document support
Search by topic, not just by text
Save/load sessions
Export quizzes and summaries (PDF/Markdown)
Performance optimization
Final Checkpoint: Polished, useful AI tutor built from scratch

🔧 Current Phase: Month 1, Week 1 - Backend Setup

Immediate Tasks (Current Session)

✅ Create project documentation
⏳ Set up backend directory structure
⏳ Configure conda environment with dependencies
⏳ Implement PDF processor module
⏳ Create FastAPI skeleton with basic endpoints
⏳ Test PDF upload and chunking workflow

📦 Dependencies

Backend Requirements

# Core Framework
fastapi==0.109.0
uvicorn[standard]==0.27.0
python-multipart==0.0.6

# ML & NLP
torch>=2.1.0
transformers>=4.36.0
sentence-transformers>=2.3.1
accelerate>=0.25.0
bitsandbytes>=0.41.0  # For 4-bit quantization

# Vector Store
faiss-cpu==1.7.4  # or faiss-gpu for GPU support

# PDF Processing
pdfplumber>=0.10.3
PyMuPDF>=1.23.8

# Utilities
numpy>=1.24.0
pandas>=2.1.0
pydantic>=2.5.0
python-dotenv>=1.0.0

Frontend Dependencies (Already set up)

React
Next.js
TailwindCSS
TypeScript

🎯 Core ML/NLP Concepts Implemented

Embedding Generation - Convert text chunks to dense vectors
Vector Similarity Search - Retrieve relevant chunks using FAISS
Retrieval-Augmented Generation (RAG) - Combine retrieval with generation
Prompt Engineering - Craft effective prompts for summarization and quizzes
Local LLM Inference - Run models efficiently on consumer GPU
Smart Chunking - Segment documents intelligently with context preservation
Quantization - Use 4-bit models to reduce memory usage

🔐 Security & Privacy

100% Offline: All processing happens locally
No Data Leakage: Documents never leave your machine
No API Keys: Zero dependency on external services
Private Learning: Your study materials remain completely private

🚦 Getting Started

Prerequisites

Python 3.10+
CUDA-capable GPU (RTX 4060 in this case)
Conda installed
Node.js 18+ (for frontend)

Quick Start

# 1. Activate conda environment
conda activate documentor

# 2. Install backend dependencies
cd backend
pip install -r requirements.txt

# 3. Start FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# 4. In another terminal, start frontend (already set up)
cd website/client
npm run dev

Environment Setup (To be created)

# Create conda environment
conda create -n documentor python=3.10
conda activate documentor

# Install PyTorch with CUDA support
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

# Install remaining dependencies
pip install -r requirements.txt

📊 Model Details

Phi-3-mini-4k-instruct

Size: 3.8B parameters
Context: 4K tokens
Use Case: Text summarization, Q&A
Quantization: 4-bit (fits in ~3GB VRAM)
License: MIT

Flan-T5-XL

Size: 3B parameters
Use Case: MCQ generation, instruction following
Quantization: 4-bit recommended
License: Apache 2.0

BGE-Small-EN

Size: 33M parameters
Use Case: Text embeddings for retrieval
Embedding Dim: 384
License: MIT

🎨 API Endpoints (To be implemented)

PDF Management

POST /api/upload - Upload PDF file
GET /api/documents - List uploaded documents
DELETE /api/documents/{doc_id} - Delete document

Summarization

POST /api/summarize - Summarize document or section
- Body: { "doc_id": "...", "mode": "full|chunked" }

Q&A

POST /api/ask - Ask question about document
- Body: { "doc_id": "...", "question": "..." }

Quiz Generation (Future)

POST /api/generate_quiz - Generate MCQs
- Body: { "doc_id": "...", "num_questions": 5 }

🧪 Testing Strategy

Unit Tests: Test each module independently
Integration Tests: Test PDF → Chunks → Embeddings → Retrieval
End-to-End Tests: Test full user workflows
Model Quality Tests: Evaluate summarization and MCQ quality

📈 Performance Considerations

Model Loading: Cache models in memory to avoid reloading
Batch Processing: Process multiple chunks together
Quantization: Use 4-bit models to reduce VRAM usage
FAISS: Use GPU-accelerated FAISS if available
Async Processing: Use FastAPI's async capabilities for I/O operations

🔄 Future Enhancements

📝 Notes

All models run locally on RTX 4060 GPU
Conda environment name: documentor
Frontend is already built as a mock - needs backend connection
Start with PDF summarization, then add MCQ generation later
Chunked summarization approach for large documents

🤝 Contributing

This is a personal learning project. Feel free to use this architecture for your own implementations!

📄 License

Open source - Educational purposes

Last Updated: 2025-11-06 Status: Phase 1 - Backend Development in Progress

FilesExpand file tree

Report.md

Latest commit

History