im-anishraj/Hybrid-Search-RAG-Engine


🧠 Hybrid-Search RAG Engine

Production-grade Retrieval-Augmented Generation API for long-document QA.


Combines FAISS dense search with BM25 keyword search, fuses the two rankings with Reciprocal Rank Fusion, and generates answers with GPT-4o.


Python · FastAPI · Docker · MIT · GSSoC 2026



Architecture · Tech Stack · Quickstart · API · Benchmarks




🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        INGESTION PIPELINE                               │
│                                                                         │
│  PDF/DOCX/TXT ──► loader.py ──► chunker.py ──► embedder.py              │
│                   (PageRecord)  (SemanticChunker  (OpenAI               │
│                                  95th-pct split)   text-emb-3-small)    │
│                                       │                                 │
│                          ┌────────────┴────────────┐                    │
│                          ▼                         ▼                    │
│                   vector_store.py           bm25_store.py               │
│                   (FAISS IndexFlatIP)        (BM25Okapi)                │
│                   data/vector_store/         data/bm25_store/           │
└─────────────────────────────────────────────────────────────────────────┘
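
The "95th-pct split" step in the diagram can be illustrated with a small pure-Python sketch. This is not the `SemanticChunker` implementation from langchain-experimental; it only shows the general idea of splitting where the embedding distance between neighbouring sentences exceeds the 95th percentile. The function names and the toy 2-D embeddings are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_on_distance_spikes(sentences, embeddings, pct=95.0):
    """Start a new chunk where the distance to the next sentence exceeds the pct-th percentile."""
    if not sentences:
        return []
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    if not distances:
        return [sentences[0]]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy 2-D "embeddings": two sentences about revenue, then two about hiring.
sentences = ["Revenue rose.", "Margins widened.", "Hiring slowed.", "Attrition fell."]
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
chunks = split_on_distance_spikes(sentences, embeddings)
```

In this toy input the only large distance sits between the second and third sentence, so the text is split into two chunks at that boundary.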

┌─────────────────────────────────────────────────────────────────────────┐
│                         QUERY PIPELINE                                  │
│                                                                         │
│  Question ──► get_query_embedding()                                     │
│                    │ (one API call)                                     │
│         ┌──────────┴──────────┐                                         │
│         ▼                     ▼                                         │
│   VectorStore             BM25Store                                     │
│   .search(k=10)           .search(k=10)                                 │
│   cosine ANN              token TF-IDF                                  │
│         └──────────┬──────────┘                                         │
│                    ▼                                                    │
│          reciprocal_rank_fusion()                                       │
│          score(d) = Σ 1/(rank_i(d) + 60)                                │
│                    │                                                    │
│                    ▼                                                    │
│            top-5 fused chunks                                           │
│                    │                                                    │
│                    ▼                                                    │
│           RAGGenerator.generate()                                       │
│           GPT-4o · temp=0 · strict citation prompt                      │
│                    │                                                    │
│                    ▼                                                    │
│   {"answer": "...", "sources": [...], "confidence_score": 0.94}         │
└─────────────────────────────────────────────────────────────────────────┘
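
The fusion formula in the diagram can be sketched in a few lines. This is a minimal illustration, not the repository's `reciprocal_rank_fusion()`: the name `rrf_fuse`, its signature, the 1-based ranks, and the alphabetical tie-break are all assumptions. Only the constant 60 and the top-5 cutoff come from the diagram.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse ranked lists of chunk ids with score(d) = sum over lists of 1 / (rank + k)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are assumed to be 1-based, as in the usual RRF formulation.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (rank + k)
    # Highest score first; ties broken by chunk id for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical chunk ids returned by the dense and sparse retrievers.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([dense_hits, sparse_hits])
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Chunks that appear in both lists (`c1`, `c3`) accumulate two terms and rank above chunks found by a single retriever (`c9`, `c7`).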



🛠 Tech Stack

| Layer | Library | Version |
| --- | --- | --- |
| Embedding | OpenAI text-embedding-3-small | openai 1.75.0 |
| Generation | GPT-4o (temp=0) | langchain-openai 1.1.11 |
| Dense index | FAISS IndexFlatIP / IVFFlat | faiss-cpu 1.9.0 |
| Sparse index | BM25Okapi (k1=1.5, b=0.75) | rank-bm25 0.2.2 |
| Chunking | SemanticChunker (95th-pct) | langchain-experimental 0.3.4 |
| Chain | LCEL RunnableSequence | langchain-core 1.2.18 |
| API | FastAPI + uvicorn | 0.115.14 / 0.34.3 |
| Validation | Pydantic v2 | 2.11.4 |
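
An inner-product index such as `IndexFlatIP` ranks by cosine similarity only when the vectors are L2-normalized. The sketch below shows that relationship with a brute-force search in plain Python. It does not call FAISS, and it assumes (the source does not state this) that the project normalizes embeddings before indexing. The helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product_search(index_vectors, query, k=10):
    """Brute-force top-k by inner product over unit vectors, which equals cosine similarity."""
    q = l2_normalize(query)
    scored = []
    for idx, vec in enumerate(index_vectors):
        unit = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(unit, q)), idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


# Vector length does not matter after normalization; only direction does.
vectors = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
results = inner_product_search(vectors, [1.0, 0.0], k=3)
```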



🚀 Quickstart

1. Clone and install

git clone https://github.com/im-anishraj/Hybrid-Search-RAG-Engine.git
cd Hybrid-Search-RAG-Engine
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Configure

cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...

3. Run

uvicorn app.main:app --reload --port 8000

OpenAPI docs are available at http://localhost:8000/docs.

4. Docker (Recommended for production)

cp .env.example .env        # fill in OPENAI_API_KEY
docker compose up --build -d
docker compose logs -f



🌐 API Endpoints

POST /ingest

Upload a PDF, DOCX, or TXT file. Ingestion is additive: each call adds to the same corpus without replacing prior documents.

curl -X POST http://localhost:8000/ingest \
  -F "file=@annual_report.pdf"

Response:

{
  "doc_id": "annual_report.pdf",
  "filename": "annual_report.pdf",
  "pages_loaded": 47,
  "chunks_indexed": 112,
  "message": "'annual_report.pdf' ingested successfully. 47 pages → 112 chunks indexed."
}

POST /query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What was the EBITDA margin in Q3 2023?"}'

Response:

{
  "question": "What was the EBITDA margin in Q3 2023?",
  "answer": "The EBITDA margin in Q3 2023 was 18.5%, driven by operational efficiency improvements [Source: annual_report.pdf, page 15].",
  "sources": [
    {"filename": "annual_report.pdf", "page_num": 15, "chunk_id": "annual_report.pdf::p15::c2"}
  ],
  "confidence_score": 0.94,
  "can_answer": true,
  "model": "gpt-4o",
  "retrieved_chunks": []
}
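
The same `/query` call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL is the local one from the Quickstart, and the handling of `can_answer` reflects a guess that `false` means the model found no grounded answer.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev server from the Quickstart


def build_query_request(question, base_url=BASE_URL):
    """Build the POST /query request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload):
    """Turn a /query response dict into a one-line summary with its citations."""
    if not payload.get("can_answer", False):
        return "No grounded answer available."
    cites = ", ".join(
        f'{s["filename"]} p.{s["page_num"]}' for s in payload.get("sources", [])
    )
    return f'{payload["answer"]} (confidence {payload["confidence_score"]:.2f}; sources: {cites})'


def ask(question):
    """Send the question to a running server and return the parsed JSON response."""
    with urllib.request.urlopen(build_query_request(question), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `print(summarize_response(ask("What was the EBITDA margin in Q3 2023?")))` would print the answer followed by its confidence score and cited pages.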



🏆 Benchmarks

Hit Rate@5 on a 20-query synthetic corpus (30 chunks, 5 topics)

| Engine | Hits @ 5 | Hit Rate @ 5 |
| --- | --- | --- |
| Hybrid (RRF) | 19 / 20 | 95% |
| BM25 (Sparse only) | 18 / 20 | 90% |
| FAISS (Dense only) | 17 / 20 | 85% |

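The hit that the hybrid engine gains over BM25 alone shows the value of RRF: FAISS finds the chunk via semantic similarity, while BM25 misses it because the query and the chunk share no tokens. RRF can still surface a chunk found by only one retriever while promoting chunks that both retrievers agree on.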

Run the benchmark locally:

pytest tests/test_retrieval.py -v -s
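
For reference, Hit Rate@k can be computed as in the sketch below. It is not taken from `tests/test_retrieval.py`: the function name, the data layout, and the assumption of exactly one relevant chunk per query are illustrative only.

```python
def hit_rate_at_k(retrieved, relevant, k=5):
    """Return (hits, hit_rate): a query is a hit if its relevant chunk is in the top k."""
    hits = sum(
        1 for query_id, ranked in retrieved.items()
        if relevant[query_id] in ranked[:k]
    )
    return hits, hits / len(retrieved)


# Hypothetical results for four queries; each maps to a ranked list of chunk ids.
retrieved = {
    "q1": ["a", "b"],
    "q2": ["c"],
    "q3": ["x", "y", "z", "w", "v", "t"],  # relevant chunk is ranked 6th: a miss at k=5
    "q4": ["m"],
}
relevant = {"q1": "b", "q2": "c", "q3": "t", "q4": "n"}
hits, rate = hit_rate_at_k(retrieved, relevant, k=5)
```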



🧠 Design Notes

  • Why --workers 1? FAISS's C++ index is not fork-safe. Scale horizontally with multiple containers behind a load balancer instead of multiple workers per container.
  • Why semantic chunking? Fixed 512-token windows slice sentences mid-thought. SemanticChunker detects topic-shift boundaries via cosine distance spikes, producing one-complete-idea chunks. The LLM receives coherent passages, not sentence fragments.
  • Why BM25 alongside FAISS? Dense embeddings smear rare tokens. "EBITDA" and "CRISPR-Cas9" map to broad semantic regions shared by adjacent-but-wrong terms. BM25's exact token matching catches them precisely.
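
To make the exact-token-matching point concrete, here is a small pure-Python Okapi BM25 scorer using the k1=1.5 and b=0.75 values from the Tech Stack table. It is a sketch, not the rank-bm25 implementation: it uses the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`, whitespace tokenization, and a made-up three-document corpus.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


corpus = [
    "ebitda margin improved in q3".split(),
    "operating profit margin improved in q3".split(),
    "gene editing with crispr-cas9".split(),
]
scores = bm25_scores("ebitda margin".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

The rare token `ebitda` appears in only one document, so it carries a high IDF and pulls that document to the top, while the document with no shared tokens scores exactly zero.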




Built with precision. Open for contributions.




Licensed under MIT · Maintained by @im-anishraj
