DocuBrain AI Assistant

DocuBrain AI Assistant is an interactive Streamlit-based application that offers powerful AI-driven text processing, document analysis, and conversational capabilities.

Access the Application

Visit the application online at: https://docubrain-ai-8fqgp4pgq3mwrysacuh4m4.streamlit.app/

Demo Video

Here is the demo video: link.

Features

1. Text Processing

Summarization: Generate concise summaries of articles or text.
Highlights: Extract key highlights from input text.
Points of Minutes (PoM): Create structured PoM from provided text.
Custom Instructions: Perform tasks based on user-defined instructions.
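
The four tasks above differ mainly in the instruction sent to the model. The sketch below shows one way to map a selected task to a prompt. The template wording, `TASK_PROMPTS`, and `build_prompt` are assumptions made for illustration and are not taken from app.py.

```python
# Hypothetical prompt templates; the wording is illustrative, not taken from app.py.
TASK_PROMPTS = {
    "Summarization": "Summarize the following text concisely.",
    "Highlights": "Extract the key highlights from the following text as bullet points.",
    "PoM": "Create structured Points of Minutes (PoM) from the following text.",
}


def build_prompt(task: str, text: str, custom_instruction: str = "") -> str:
    """Return the full prompt for a task; 'Custom Instructions' uses the user's own wording."""
    if not text.strip():
        raise ValueError("Input text is empty.")
    if task == "Custom Instructions":
        if not custom_instruction.strip():
            raise ValueError("Custom Instructions requires a non-empty instruction.")
        instruction = custom_instruction.strip()
    elif task in TASK_PROMPTS:
        instruction = TASK_PROMPTS[task]
    else:
        raise ValueError(f"Unknown task: {task!r}")
    # The instruction comes first, followed by the user's text.
    return f"{instruction}\n\nText:\n{text.strip()}"
```

The returned string would then be passed to whichever model client the app uses; that call is left out because its exact form is not documented here.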

2. InsightDoc AI Analyzer (Content Engine for Document Comparison and Insights)

This repository implements a Content Engine that uses Retrieval-Augmented Generation (RAG) to analyze and compare multiple PDF documents, specifically Form 10-K filings from multinational companies.

Features

PDF Parsing: Extracts and processes text from PDF documents.
Vectorization: Converts text content into vectors using local embedding models.
Vector Store Integration: Embeddings are stored in a vector store (Chroma or FAISS) for efficient retrieval and comparison.
Local LLM Integration: A local Large Language Model (LLM), Mistral-7B-Instruct, is used for generating insights and answering queries.
Interactive Chatbot Interface: Built with Streamlit, providing users a platform to query the system and obtain document insights.

Workflow

Parse Documents: Extract text from PDF files (e.g., Alphabet, Tesla, and Uber Form 10-K filings).
Generate Embeddings: Use Sentence-Transformers to generate dense vector embeddings for document text.
Store in Vector Store: Save embeddings in a vector store like Chroma or FAISS for fast querying.
Query Engine: Retrieve relevant information and generate insights using a local LLM.
Chatbot Interface: Users interact via a Streamlit-based chatbot UI to query and explore insights from documents.
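
The embed, store, and query steps of this workflow can be illustrated with a dependency-free sketch. Everything in it is a simplified stand-in: `embed` is a toy bag-of-words vectorizer rather than a Sentence-Transformers model, `InMemoryVectorStore` is a brute-force replacement for Chroma or FAISS, and the text snippets are invented for the example rather than quoted from any filing.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized bag-of-words counts (stand-in for a dense model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two normalized sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class InMemoryVectorStore:
    """Brute-force store; Chroma or FAISS would take this role in the real app."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str, Dict[str, float]]] = []

    def add(self, source: str, chunk: str) -> None:
        """Embed a text chunk and keep it together with its source label."""
        self._items.append((source, chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> List[Tuple[float, str, str]]:
        """Return the k chunks most similar to the query as (score, source, chunk)."""
        q = embed(query)
        scored = [(cosine(q, vec), source, chunk) for source, chunk, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
# Made-up snippets for illustration, not quotations from any filing.
store.add("Tesla 10-K", "The automotive segment designs and sells electric vehicles.")
store.add("Tesla 10-K", "The energy segment sells solar and storage products.")
store.add("Uber 10-K", "The mobility segment connects riders with drivers.")

top = store.search("electric vehicles automotive", k=1)
```

Here `top` holds the automotive snippet, since it is the only one sharing words with the query. A dense embedding model would also match paraphrases that share no words, which is the main reason to use one instead of word counts.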

Sample Queries

"How does Tesla's automotive segment differ from its energy generation and storage segment?"
"What are the differences in the business of Tesla and Uber?"
"What is the total revenue for Google Search?"

3. Chat with Assistant

Ask questions or chat with a powerful AI assistant.
Leverage state-of-the-art models for personalized responses.

Installation

Prerequisites

Ensure the following are installed on your system:

Python 3.8 or higher
pip
Streamlit

Setup Steps

Clone this repository:

git clone https://github.com/your-username/docubrain-ai.git
cd docubrain-ai

Install required dependencies:
```
pip install -r requirements.txt
```

Create a .env file in the project root and add your API keys:

HUGGINGFACE_API_KEY=your_huggingface_api_key
GROQ_API_KEY=your_groq_api_key

Run the application:
```
streamlit run app.py
```

Usage

Part 0: Introduction

The home screen provides an overview of the app’s functionalities and links to sample files for testing.

Part 1: Text Processing

Navigate to the Text Processing tab.
Choose a task: Summarization, Highlights, PoM, or Custom Instructions.
Input your text or article and click "Process" to get results.

Part 2: InsightDoc AI Analyzer

Navigate to the InsightDoc AI Analyzer tab.
Upload one or more PDF files for analysis.
Perform tasks such as document comparison, summarization, or specific searches.
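
Behind a query in this tab, a RAG system typically combines the retrieved passages with the user's question before calling the LLM. The following is a minimal sketch of that assembly step. The function name `build_rag_prompt`, the prompt wording, and the `max_chars` budget are assumptions for illustration, not the app's actual prompt.

```python
from typing import List, Tuple


def build_rag_prompt(question: str, chunks: List[Tuple[str, str]], max_chars: int = 2000) -> str:
    """Format (source, text) chunks plus the question into one prompt string."""
    context_parts = []
    used = 0
    for source, text in chunks:
        entry = f"[{source}] {text.strip()}"
        if used + len(entry) > max_chars:
            break  # stay within a rough context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts) if context_parts else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question.strip()}\nAnswer:"
    )
```

Labeling each passage with its source document lets the model attribute statements to the right company when a question spans several filings.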

Part 3: Chat Window

Navigate to the Chat Window tab.
Type your message in the chat box to receive responses in real time.
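
A chat window needs to keep the conversation so far and pass it to the model on each turn. The sketch below shows one simple way to maintain a bounded message history. The `append_turn` helper, the role names, and the `max_messages` limit are assumptions for illustration, not the app's confirmed implementation.

```python
from typing import Dict, List

Message = Dict[str, str]


def append_turn(
    history: List[Message], role: str, content: str, max_messages: int = 20
) -> List[Message]:
    """Return a new history with the message appended, keeping the latest max_messages."""
    if role not in ("user", "assistant"):
        raise ValueError(f"Unsupported role: {role!r}")
    updated = history + [{"role": role, "content": content}]
    # Drop the oldest messages once the limit is exceeded.
    return updated[-max_messages:]


history: List[Message] = []
history = append_turn(history, "user", "Hello")
history = append_turn(history, "assistant", "Hi, how can I help?")
```

In a Streamlit app, a list like this would usually be kept in `st.session_state` so that it survives the script reruns triggered by each interaction.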

File Structure

.
├── app.py                 # Main application file
├── requirements.txt       # Required dependencies
├── .env                   # Environment variables
├── README.md              # Project documentation
├── sample_files/          # Sample files for testing
└── ...                    # Additional resources and files

Dependencies

Streamlit: For creating the web interface.
LangChain: For conversational chains and document retrieval.
FAISS: For vector storage and similarity search.
Hugging Face Transformers: For LLM and embeddings.
Groq: For specific AI-powered text processing tasks.
PyPDFLoader (a LangChain document loader that relies on the pypdf package): For PDF file loading and parsing.

API Keys

Hugging Face: Used for LLM-based tasks.
Groq: Used for text processing tasks.

Ensure you have valid API keys for both services. Add these keys to the .env file as shown in the setup instructions.
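
A startup check can catch a missing key before the first API call fails. The sketch below parses a .env file with the standard library only and reports which of the two keys are absent. It is a simplified stand-in for a package such as python-dotenv, and the helper names `parse_env`, `missing_keys`, and `load_keys` are hypothetical.

```python
import os
from typing import Dict, List, Mapping

REQUIRED_KEYS = ("HUGGINGFACE_API_KEY", "GROQ_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_keys(path: str = ".env") -> Dict[str, str]:
    """Merge the .env file (if present) with the process environment and validate."""
    env = dict(os.environ)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            # Values already set in the environment take precedence over the file.
            for key, value in parse_env(handle.read()).items():
                env.setdefault(key, value)
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling `load_keys()` once at startup raises a `RuntimeError` naming the missing keys, which is easier to diagnose than an authentication error from a later request.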

Sample Files

Sample PDF files are included in the sample_files/ directory for testing purposes. Use these to explore the app’s functionalities.

Contributing

Contributions are welcome! If you find a bug or want to add a feature:

Fork the repository.
Create a new branch: git checkout -b feature-name.
Commit your changes: git commit -m 'Add some feature'.
Push to the branch: git push origin feature-name.
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or feedback, please reach out to [[email protected]].

