DeepSeekRAG

Intro:

DeepSeekRAG is a retrieval-augmented generation (RAG) application that integrates hybrid retrieval(semantic and keyword search) with the HuggingFace transformers for accurate document question-answering. The application will be tested on the popular DeepSeek-V3 technical report.

Application Workflow Overview :

Provide a high-level overview of how the system processes uploads and queries.
Focuses on key stages: file ingestion, storage, retrieval, generation, and caching.

Application Workflow Details:

Breaks down components and technologies used at each stage.
Shows specific libraries (PyPDF2, BeautifulSoup, Whoosh, ChromaDB, Hugging Face).
Includes error handling, ranking, and streaming logic.

Python-based Tech Stack Choices and Why?

Web Framework: FastAPI
- Why? it's lightweight, asynchronous (i.e. streaming), and has excellent support for RESTful APIs.
Vector Database: Chroma
- Why? It's simple, open-source, and optimized for storing embeddings with semantic search.
Embedding Model: Sentence Transformers (e.g., all-MiniLM-L6-v2)
- Why? It’s fast, produces high-quality embeddings for semantic search, and balances accuracy with resource usage.
Keyword Search: Whoosh
- Why? it's lightweight and effective for keyword-based retrieval.
Language Model: Hugging Face Transformers (e.g.mistralai/Mixtral-8x7B-Instruct-v0.1 via Hugging Face API) . HF Token Needed!
- Why? it's open-source with high-quality generation.
PDF/HTML Parsing: PyPDF2 (for PDFs) and BeautifulSoup (for HTML)
- Why? it's reliable for text extraction.
Streaming: FastAPI’s StreamingResponse
- Why? It has built-in support for asynchronous streaming.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Demo		Demo
app		app
data		data
.DS_Store		.DS_Store
Dockerfile		Dockerfile
README.md		README.md
application_workflow_details.png		application_workflow_details.png
application_workflow_overview.png		application_workflow_overview.png
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DeepSeekRAG

Intro:

Application Workflow Overview :

Application Workflow Details:

Python-based Tech Stack Choices and Why?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

ZTECH10/DeepSeekRAG

Folders and files

Latest commit

History

Repository files navigation

DeepSeekRAG

Intro:

Application Workflow Overview :

Application Workflow Details:

Python-based Tech Stack Choices and Why?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages