This AI-Powered Knowledge Base Chatbot lets organizations query their document repositories in natural language. It uses retrieval-augmented generation (RAG) to answer questions about your PDF and text documents, returning contextual answers together with citations to the source passages they are based on.
Perfect for:
- Data Teams - Query research papers, documentation, and reports
- Enterprise Knowledge Management - Centralized document intelligence
- Business Intelligence - Extract insights from company documents
- Research & Development - Accelerate information discovery
- Multi-format Support: PDF and TXT document ingestion
- Smart Chunking: Configurable text segmentation tuned for retrieval
- Vector Embeddings: Sentence-transformer models for semantic search
- Natural Language Queries: Ask questions in plain English
- Context-Aware Responses: Maintains conversation history for coherent interactions
- Source Attribution: Every answer includes relevant document citations
- Real-time Processing: Visual loading indicators while each answer is generated
- Scalable Vector Database: ChromaDB for efficient similarity search
- Advanced LLM Integration: Powered by Anthropic's Claude 3.5 Haiku
- Memory Management: Conversation buffer for context retention
- Modular Design: Clean separation of concerns for maintainability
- Streamlit Interface: Modern, responsive web application
- Drag-and-Drop Upload: Effortless document management
- Chat History: Persistent conversation threads
- Source Exploration: Expandable source document previews
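The chunking step can be sketched in plain Python. This is a minimal, hypothetical splitter that cuts text into fixed-size character windows with overlap, so a sentence cut at one boundary still appears whole in the neighboring chunk. The parameter names chunk_size and chunk_overlap mirror LangChain's text splitters, but chunk_overlap is an assumption here, and the splitter chatbot.py actually uses (for example RecursiveCharacterTextSplitter, which also tries to break on paragraph and sentence boundaries) may behave differently.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; not taken from chatbot.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text,
        # otherwise the next window would only repeat overlapping characters.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Smaller chunks make retrieval more precise but give the model less surrounding context per hit; overlap trades a little storage for fewer answers split across chunk boundaries.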
Streamlit UI ───▶ Document Loader ───▶ Text Splitter
                                              │
                                              ▼
Claude 3.5 Haiku ◀─── Retrieval Chain (LangChain) ◀─── Vector Store (ChromaDB)
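To illustrate the Vector Store and Retrieval Chain stages, here is a self-contained toy version of the retrieval step. It is a sketch under loud assumptions, not the project's code: a bag-of-words counter stands in for the sentence-transformer embeddings, an in-memory list stands in for ChromaDB, and the names ToyVectorStore and similarity_search are hypothetical (the latter only echoes LangChain's vector-store interface). The sample chunks are made up. Each stored chunk carries metadata such as its source file, which is what makes source attribution possible.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for ChromaDB: (vector, text, metadata) rows."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str, dict]] = []

    def add(self, text: str, metadata: dict) -> None:
        self._rows.append((embed(text), text, metadata))

    def similarity_search(self, query: str, k: int = 4) -> list[tuple[str, dict]]:
        """Return the k chunks most similar to the query, with their metadata."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("Q3 revenue grew 12 percent", {"source": "q3_report.pdf", "page": 2})
    store.add("Remote work requires manager approval", {"source": "handbook.txt"})
    print(store.similarity_search("What was Q3 revenue?", k=1))
```

In the real pipeline, the top-k chunks returned at this point would be handed, together with the question, to Claude 3.5 Haiku through the LangChain retrieval chain.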
- Python 3.11 or higher
- Anthropic API key
- Clone the repository:
  git clone https://github.com/yourusername/kb-chatbot.git
  cd kb-chatbot
- Set up the Python environment (uv is recommended):
  uv sync
- Configure environment variables: copy the template, then add your Anthropic API key (edit .env directly, or append the key):
  cp .env.example .env
  echo "ANTHROPIC_API_KEY=your_api_key_here" >> .env
- Launch the application (prefix the command with uv run if the virtual environment created by uv sync is not activated):
  streamlit run app.py
- Access the interface: open your browser to http://localhost:8501
- Upload Documents: Use the sidebar to upload PDF or TXT files
- Process Documents: Click "Process Documents" to create the knowledge base
- Start Chatting: Ask questions about your documents in natural language
- Explore Sources: Click on source citations to view relevant document excerpts
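Source citations are typically built from the metadata attached to each retrieved chunk. The sketch below is a hypothetical helper (format_sources is not taken from chatbot.py) that keeps one citation per source file and page and trims each excerpt for a compact preview. It assumes chunks arrive as (text, metadata) pairs with optional source and page keys.

```python
def format_sources(docs: list[tuple[str, dict]], preview_chars: int = 200) -> list[dict]:
    """Collapse retrieved chunks into one citation per (source, page), with a short preview."""
    seen: set[tuple] = set()
    citations: list[dict] = []
    for text, meta in docs:
        key = (meta.get("source", "unknown"), meta.get("page"))
        if key in seen:  # several chunks from the same page -> cite it once
            continue
        seen.add(key)
        if len(text) <= preview_chars:
            preview = text
        else:
            preview = text[:preview_chars].rstrip() + "..."
        citations.append({"source": key[0], "page": key[1], "preview": preview})
    return citations


if __name__ == "__main__":
    retrieved = [
        ("Remote work requires manager approval.", {"source": "handbook.txt"}),
        ("Remote staff must use the VPN.", {"source": "handbook.txt"}),
    ]
    for citation in format_sources(retrieved):
        print(citation["source"], citation["page"], citation["preview"])
```

In a Streamlit app, each entry could be rendered inside st.expander so the excerpt stays collapsed until the user clicks it.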
"What are the key performance metrics mentioned in the Q3 report?"
"Summarize the main risks identified in our compliance documentation"
"What budget allocations were discussed for the marketing department?"
"What methodologies were used in the machine learning research papers?"
"Compare the experimental results across different studies"
"What are the limitations mentioned in the technical documentation?"
"What are the requirements for data privacy compliance?"
"Summarize the employee handbook policies on remote work"
"What security protocols are outlined in our IT documentation?"
ANTHROPIC_API_KEY=your_anthropic_api_key
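A minimal sketch of how this variable could be loaded and checked at startup, assuming .env contains simple KEY=VALUE lines. The function names are hypothetical; if the project uses python-dotenv, its load_dotenv() covers the loading part. With the cp plus echo setup step, the key can end up in .env twice, so this parser deliberately keeps the last occurrence, and the check rejects the your_... placeholder.

```python
import os
from pathlib import Path
from typing import Mapping


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; later keys win."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file (if present) and copy its values into os.environ without overriding."""
    env_path = Path(path)
    values = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    for key, value in values.items():
        os.environ.setdefault(key, value)  # real environment variables take precedence
    return values


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Fail fast if the Anthropic key is missing or still the template placeholder."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key or key.startswith("your_"):
        raise RuntimeError("ANTHROPIC_API_KEY is missing or still set to the placeholder")
    return key


if __name__ == "__main__":
    load_env_file()
    print("API key found" if os.environ.get("ANTHROPIC_API_KEY") else "API key missing")
```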
- Chunk Size: modify chunk_size in chatbot.py:57
- Retrieval Count: adjust the k parameter in chatbot.py:65
- Model Selection: change the Claude model in chatbot.py:26
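The three tunable values can be pictured as one settings object. This is a hypothetical sketch: ChatbotSettings is not a class from chatbot.py, the numeric defaults are placeholders rather than the project's real values, and chunk_overlap is an assumed extra parameter. The model string is Anthropic's published identifier for Claude 3.5 Haiku. The helper method shows the main trade-off: the amount of retrieved context sent per question grows with k * chunk_size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatbotSettings:
    """Hypothetical settings bundle; the defaults are placeholders, not the values in chatbot.py."""

    chunk_size: int = 1000        # placeholder; the real value is set at chatbot.py:57
    chunk_overlap: int = 200      # assumed parameter, not mentioned in the README
    k: int = 4                    # placeholder; the real value is set at chatbot.py:65
    model: str = "claude-3-5-haiku-20241022"  # Claude 3.5 Haiku model id; see chatbot.py:26

    def __post_init__(self) -> None:
        # Reject combinations that would break chunking or retrieval.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.k < 1:
            raise ValueError("k must be at least 1")

    def context_budget_chars(self) -> int:
        """Upper bound on characters of retrieved context passed to the model per question."""
        return self.k * self.chunk_size


if __name__ == "__main__":
    print(ChatbotSettings(chunk_size=500, chunk_overlap=50, k=6).context_budget_chars())
```

Raising k improves recall across many documents, while raising chunk_size gives each hit more surrounding text; both increase prompt size and therefore latency and cost.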
kb-chatbot/
├── app.py           # Streamlit web interface
├── chatbot.py       # Core chatbot logic
├── pyproject.toml   # Project dependencies
└── .env.example     # Environment template
- KnowledgeBaseChatbot: Main chatbot class handling document processing and Q&A
- Document Loaders: PDF and text file processing utilities
- Vector Store: ChromaDB integration for semantic search
- Conversation Chain: LangChain orchestration for retrieval-augmented generation
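Conceptually, a conversational RAG chain combines three inputs on every turn: the buffered chat history, the retrieved chunks, and the new question. The sketch below assembles such a prompt in plain Python. ConversationBuffer and build_prompt are hypothetical names; LangChain provides its own memory and chain classes (for example ConversationBufferMemory) whose APIs this sketch does not reproduce, and the actual prompt wording used in chatbot.py is not shown in this README.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationBuffer:
    """Keeps every (question, answer) turn, like a simple buffer memory."""

    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def render(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_prompt(question: str, chunks: list[tuple[str, dict]], memory: ConversationBuffer) -> str:
    """Combine numbered context chunks, chat history, and the new question into one prompt."""
    context = "\n\n".join(
        f"[{i}] ({meta.get('source', 'unknown')}) {text}"
        for i, (text, meta) in enumerate(chunks, start=1)
    )
    history = memory.render() or "(no previous turns)"
    return (
        "Answer using only the numbered context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{history}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    memory = ConversationBuffer()
    memory.add("Which report covers Q3?", "The Q3 report, q3_report.pdf.")
    print(build_prompt("What grew?", [("Revenue grew", {"source": "q3_report.pdf"})], memory))
```

The resulting string would then be sent to Claude; numbering the chunks gives the model a simple way to point back at its sources, and the buffer is what lets follow-up questions such as "What grew?" resolve against earlier turns.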
streamlit run app.py
docker build -t kb-chatbot .
docker run -p 8501:8501 kb-chatbot
Still under construction.
Compatible with:
- Streamlit Cloud
- AWS EC2/ECS
- Google Cloud Run
- Azure Container Instances
Contributions are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Anthropic for Claude 3.5 Haiku LLM
- LangChain for the RAG framework
- LangGraph for the RAG framework
- Streamlit for the web interface
- ChromaDB for vector storage
- Hugging Face for embedding models
- Discord: andreaschandra#4851
- Issues: GitHub Issues
- Documentation: Wiki
Built with ❤️ for intelligent document interaction
⭐ Star this repository if you find it helpful!