🧠 Hybrid-Search RAG Engine

Production-grade Retrieval-Augmented Generation API for long-document QA.

Combines FAISS dense search with BM25 keyword search, fuses the two rankings with Reciprocal Rank Fusion (RRF), and generates cited answers with GPT-4o.

Architecture · Tech Stack · Quickstart · API · Benchmarks

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        INGESTION PIPELINE                               │
│                                                                         │
│  PDF/DOCX/TXT ──► loader.py ──► chunker.py ──► embedder.py             │
│                   (PageRecord)  (SemanticChunker  (OpenAI               │
│                                  95th-pct split)   text-emb-3-small)    │
│                                       │                                 │
│                          ┌────────────┴────────────┐                   │
│                          ▼                         ▼                   │
│                   vector_store.py           bm25_store.py              │
│                   (FAISS IndexFlatIP)        (BM25Okapi)               │
│                   data/vector_store/         data/bm25_store/          │
└─────────────────────────────────────────────────────────────────────────┘
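
The "95th-pct split" in the chunker stage can be illustrated with a minimal, dependency-free sketch. It assumes the usual percentile rule: compute the cosine distance between each pair of consecutive sentence embeddings and start a new chunk wherever that distance exceeds the 95th percentile of all distances. The function names and the toy 2-D embeddings are hypothetical; the real chunker.py delegates to SemanticChunker and may differ in detail.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_on_distance_spikes(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    pct: float = 95.0,
) -> List[str]:
    """Group sentences into chunks, breaking where the distance spikes."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(sentences) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks: List[str] = []
    current = [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data (hypothetical): two sentences per topic, embeddings chosen by hand.
sentences = ["A.", "B.", "C.", "D."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = split_on_distance_spikes(sentences, embeddings)
```

With these toy vectors the only distance above the threshold is the one between "B." and "C.", so the text is split once at the topic change.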

┌─────────────────────────────────────────────────────────────────────────┐
│                         QUERY PIPELINE                                  │
│                                                                         │
│  Question ──► get_query_embedding()                                     │
│                    │ (one API call)                                      │
│         ┌──────────┴──────────┐                                         │
│         ▼                     ▼                                         │
│   VectorStore             BM25Store                                     │
│   .search(k=10)           .search(k=10)                                 │
│   cosine ANN              token TF-IDF                                  │
│         └──────────┬──────────┘                                         │
│                    ▼                                                     │
│          reciprocal_rank_fusion()                                        │
│          score(d) = Σ 1/(rank_i(d) + 60)                               │
│                    │                                                     │
│                    ▼                                                     │
│            top-5 fused chunks                                            │
│                    │                                                     │
│                    ▼                                                     │
│           RAGGenerator.generate()                                        │
│           GPT-4o · temp=0 · strict citation prompt                     │
│                    │                                                     │
│                    ▼                                                     │
│   {"answer": "...", "sources": [...], "confidence_score": 0.94}        │
└─────────────────────────────────────────────────────────────────────────┘
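
The fusion step in the diagram can be sketched in a few lines. This is a minimal illustration of the formula score(d) = Σ 1/(rank_i(d) + 60), assuming 1-based ranks and ranked lists of chunk ids as input; the signature, the `top_n` parameter, and the tie-break are assumptions and may not match the project's own reciprocal_rank_fusion().

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of ids into one list of (id, score) pairs."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based (an assumption), so the best hit adds 1 / (1 + k).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank + k)
    # Highest score first; ties are broken by id so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical chunk ids returned by the dense and sparse retrievers.
dense_hits = ["c1", "c2", "c3"]
sparse_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Chunks that appear in both lists ("c1" and "c3") accumulate two terms and rise above chunks that only one retriever returned.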

🛠 Tech Stack

Layer	Library	Version
Embedding	OpenAI text-embedding-3-small	`openai 1.75.0`
Generation	GPT-4o (temp=0)	`langchain-openai 1.1.11`
Dense index	FAISS IndexFlatIP / IVFFlat	`faiss-cpu 1.9.0`
Sparse index	BM25Okapi (k1=1.5, b=0.75)	`rank-bm25 0.2.2`
Chunking	SemanticChunker (95th-pct)	`langchain-experimental 0.3.4`
Chain	LCEL RunnableSequence	`langchain-core 1.2.18`
API	FastAPI + uvicorn	`fastapi 0.115.14` / `uvicorn 0.34.3`
Validation	Pydantic v2	`pydantic 2.11.4`

🚀 Quickstart

1. Clone and install

git clone https://github.com/im-anishraj/Hybrid-Search-RAG-Engine.git
cd Hybrid-Search-RAG-Engine
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Configure

cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...

3. Run

uvicorn app.main:app --reload --port 8000

OpenAPI docs are available at http://localhost:8000/docs.

4. Docker (Recommended for production)

cp .env.example .env        # fill in OPENAI_API_KEY
docker compose up --build -d
docker compose logs -f

🌐 API Endpoints

`POST /ingest`

Upload a PDF, DOCX, or TXT file. Ingestion is additive: each call adds to the same corpus without replacing previously ingested documents.

curl -X POST http://localhost:8000/ingest \
  -F "file=@annual_report.pdf"

Response:

{
  "doc_id": "annual_report.pdf",
  "filename": "annual_report.pdf",
  "pages_loaded": 47,
  "chunks_indexed": 112,
  "message": "'annual_report.pdf' ingested successfully. 47 pages → 112 chunks indexed."
}

`POST /query`

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What was the EBITDA margin in Q3 2023?"}'

Response:

{
  "question": "What was the EBITDA margin in Q3 2023?",
  "answer": "The EBITDA margin in Q3 2023 was 18.5%, driven by operational efficiency improvements [Source: annual_report.pdf, page 15].",
  "sources": [
    {"filename": "annual_report.pdf", "page_num": 15, "chunk_id": "annual_report.pdf::p15::c2"}
  ],
  "confidence_score": 0.94,
  "can_answer": true,
  "model": "gpt-4o",
  "retrieved_chunks": []
}
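
The same query can be issued from Python. The sketch below uses only the standard library and assumes the server from the Quickstart is listening on http://localhost:8000; the helper names are hypothetical, and the response fields are read exactly as they appear in the sample response above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_query_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000",
        timeout: float = 60.0) -> Dict[str, Any]:
    """Send the question and return the decoded JSON body."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def cited_pages(body: Dict[str, Any]) -> List[str]:
    """List the cited sources as 'filename p.N' strings."""
    return [f"{s['filename']} p.{s['page_num']}" for s in body.get("sources", [])]


# Offline demonstration on a body shaped like the sample response.
sample_body = {
    "answer": "The EBITDA margin in Q3 2023 was 18.5%.",
    "sources": [
        {"filename": "annual_report.pdf", "page_num": 15,
         "chunk_id": "annual_report.pdf::p15::c2"}
    ],
    "confidence_score": 0.94,
    "can_answer": True,
}
pages = cited_pages(sample_body)
request = build_query_request("What was the EBITDA margin in Q3 2023?")
```

Calling `ask(...)` requires a running server and a configured OPENAI_API_KEY; `build_query_request` and `cited_pages` work offline.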

🏆 Benchmarks

Hit Rate@5 on a 20-query synthetic corpus (30 chunks, 5 topics)

Engine	Hits @ 5 (of 20)	Hit Rate @ 5
Hybrid (RRF)	19 / 20	95%
BM25 (Sparse only)	18 / 20	90%
FAISS (Dense only)	17 / 20	85%

The extra hit that the hybrid engine gains over BM25 alone shows the value of RRF: FAISS retrieves the chunk through semantic similarity, while BM25 misses it because the query and the chunk share no tokens. RRF also promotes chunks that both retrievers rank highly.

Run the benchmark locally:

pytest tests/test_retrieval.py -v -s
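
For reference, Hit Rate@k can be computed as sketched below. This assumes the common definition (a query counts as a hit when its expected chunk appears among the top k retrieved ids) and one expected chunk per query; the function name and the toy data are hypothetical, and tests/test_retrieval.py may measure it differently.

```python
from typing import Mapping, Sequence


def hit_rate_at_k(
    retrieved: Mapping[str, Sequence[str]],
    expected: Mapping[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose expected chunk id is in the top-k results."""
    if not expected:
        raise ValueError("expected must contain at least one query")
    hits = 0
    for query, gold_chunk in expected.items():
        top_k = list(retrieved.get(query, []))[:k]  # missing query = no results
        if gold_chunk in top_k:
            hits += 1
    return hits / len(expected)


# Toy data (hypothetical): four queries, two of which hit within the top 5.
retrieved = {
    "q1": ["a", "b", "c", "d", "e", "f"],  # expected chunk is at rank 6: miss
    "q2": ["x", "y"],                      # hit at rank 2
    "q3": ["m"],                           # hit at rank 1
}
expected = {"q1": "f", "q2": "y", "q3": "m", "q4": "z"}
rate = hit_rate_at_k(retrieved, expected, k=5)
```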

🧠 Design Notes

Why --workers 1? FAISS's C++ index is not fork-safe. Scale horizontally with multiple containers behind a load balancer instead of multiple workers per container.
Why semantic chunking? Fixed 512-token windows slice sentences mid-thought. SemanticChunker detects topic-shift boundaries via cosine distance spikes, producing one-complete-idea chunks. The LLM receives coherent passages, not sentence fragments.
Why BM25 alongside FAISS? Dense embeddings smear rare tokens. "EBITDA" and "CRISPR-Cas9" map to broad semantic regions shared by adjacent-but-wrong terms. BM25's exact token matching catches them precisely.
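
The exact-match behaviour of BM25 can be seen in a small, dependency-free sketch. It uses the tech-stack parameters (k1=1.5, b=0.75) with the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), which is an assumption: rank-bm25's BM25Okapi uses a slightly different IDF with a floor for very common terms. The function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query_tokens: Sequence[str],
    corpus_tokens: Sequence[Sequence[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document in the corpus against the query."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Number of documents that contain each token at least once.
    doc_freq = Counter(tok for doc in corpus_tokens for tok in set(doc))
    scores: List[float] = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for tok in query_tokens:
            tf = term_freq[tok]
            if tf == 0:
                continue  # no lexical overlap, no contribution
            idf = math.log(1.0 + (n_docs - doc_freq[tok] + 0.5) / (doc_freq[tok] + 0.5))
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


# Toy corpus (hypothetical), already lowercased and tokenized.
corpus = [
    ["ebitda", "margin", "rose"],
    ["operating", "profit", "rose"],
    ["revenue", "grew"],
]
scores = bm25_scores(["ebitda"], corpus)
```

Only the first document contains the token "ebitda", so it is the only one with a non-zero score, however close the other documents are in meaning.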

Built with precision. Open for contributions.

Licensed under MIT · Maintained by @im-anishraj
