Quanta Chatbot is a Python application designed to handle multiple files (PDF/DOC/TXT) for in-depth discussions. It specializes in working with research papers and other lengthy documents, providing precise and contextually relevant responses to user questions based on the documents' content. It was initially built to aid researchers and professors in the Department of Dermatology at the University of Michigan.
- Python as the programming language.
- BERT-based embeddings (via Sentence-BERT from HuggingFace).
- GPT-4 as the response generator (via OpenAI API).
- ChromaDB for fast vector storage and retrieval.
- Streamlit for frontend deployment.
- LangChain for orchestrating query pipelines.
This project separates retrieval and generation.
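The separation can be illustrated with a minimal, dependency-free sketch. It is not the project's actual pipeline: a bag-of-words count vector stands in for the Sentence-BERT embeddings, an in-memory list stands in for ChromaDB, and the generation stage stops at building the prompt that would be sent to GPT-4. The function names `embed`, `retrieve`, and `build_prompt` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Sentence-BERT embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Retrieval stage: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Generation stage input: the prompt that would be passed to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks, for illustration only.
chunks = [
    "Psoriasis is a chronic skin condition.",
    "Streamlit renders the chat interface.",
    "Melanoma is a type of skin cancer.",
]
top = retrieve("What is melanoma?", chunks, k=1)
prompt = build_prompt("What is melanoma?", top)
print(prompt)
```

Because retrieval only returns text chunks and generation only consumes them, either stage can be swapped out (for example, a different embedding model or a different LLM) without touching the other.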
Please feel free to check out the various READMEs located throughout the folders for in-depth discussion (inside src, src/testing, src/testing/comparison, and src/testing/add_langchain).
Follow the steps below to set up your chatbot:
-
Install Python
If you have never used Python on your computer before, download and install it first.
-
Install dependencies
Run the following command to install the required dependencies:
pip install -r requirements.txt
-
Install other packages
Some packages may need to be installed manually in addition to those in requirements.txt. Install them as necessary.
-
Set up your OpenAI API key
Obtain a personal API key from OpenAI and add it to your terminal environment as shown below.
(For temporary usage, please contact the email address at the very bottom.)
-
macOS / Linux:
export OPENAI_API_KEY='YOUR_OPEN_AI_KEY'
-
Windows:
set OPENAI_API_KEY=YOUR_OPEN_AI_KEY
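To confirm that the key is visible to Python before launching the app, a small check like the following can help. This is a sketch, not part of the project: the helper name `get_openai_key` is hypothetical, and stripping stray quotes is an assumption made to tolerate shells that keep quote characters as part of the value.

```python
import os


def get_openai_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    # Remove surrounding whitespace and any quote characters kept by the shell.
    key = env.get("OPENAI_API_KEY", "").strip().strip("'\"")
    if not key or key == "YOUR_OPEN_AI_KEY":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; set it in your terminal before running the app."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_key()
        print("OPENAI_API_KEY found.")
    except RuntimeError as err:
        print(err)
```

Run it in the same terminal session in which the variable was set, since `export` and `set` only affect the current session.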
To use Quanta Chatbot, follow these steps:
-
Ensure all dependencies are installed and your OpenAI API key is set in your terminal environment.
-
Run the main.py file using the following command from the QuantaBot (outermost) directory.
streamlit run main.py
-
A page will be launched on your default web browser.
NOTICE:
- At this stage, if the page shows a red screen with a "ModuleNotFoundError", install the modules listed in the error message manually from the terminal, using the following template. The missing modules tend to vary by computer. Then repeat step 2.
pip install <dependencies you need>
- If a "StreamlitDuplicateElementKey" error pops up, refresh the page once more.
-
Upload as many files as you want by dragging or selecting.
- Files MUST be uploaded FIRST for the program to run.
-
Ask questions in English about the loaded files using the chat interface.
Various errors might occur while running the program. If so, try the following workarounds.
-
Try replacing pip with pip3 and python with python3 in the commands.
-
Update the dependencies that are causing problems.
pip install --upgrade <dependency being updated>
-
Ensure that your OpenAI account has enough credit balance; otherwise, API calls will fail. This can be checked in your OpenAI account's billing settings.
-
Search for the error message that appears in the terminal window.
-
Type the terminal commands by hand instead of copying and pasting.
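For the missing-module errors mentioned above, the following sketch lists which modules cannot be imported in the current environment, so they can all be installed in one pass. The module names in `required` are assumptions based on the stack this README lists; they are not read from requirements.txt, and a package's import name can differ from its pip name (for example, `sentence_transformers` is installed as `sentence-transformers`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Assumed import names for the listed stack; adjust to match requirements.txt.
required = ["streamlit", "langchain", "chromadb", "openai", "sentence_transformers"]

for name in missing_modules(required):
    print(f"Missing module: {name}")
```

If nothing is printed, all of the listed modules are importable by the Python interpreter that ran the script. Make sure this is the same interpreter that Streamlit uses.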
This section shows how the internal components of the retrieval system were tested and evaluated independently before the main chatbot interface was developed. By running a series of experiments, including statistical visualizations, metric-based comparisons, and LangChain evaluations, I aimed to identify the most effective configurations for semantic search and information retrieval. This included analyzing embeddings, comparing scoring methods such as F1, NDCG, MRR, and Recall@K, and evaluating the trade-offs between accuracy, speed, and memory efficiency across various approaches.
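As a rough illustration of three of the ranking metrics named above, here is a minimal sketch using binary relevance (a document is either relevant or not). These are textbook definitions, not the project's evaluation code; the function names and the example document IDs are hypothetical. MRR is the mean of `reciprocal_rank` over a set of queries.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical retrieval result for one query.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(recall_at_k(ranked, relevant, k=2))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

Recall@K ignores ordering within the top K, while MRR and NDCG reward placing relevant documents earlier, which is why the metrics can rank the same configurations differently.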
ℹ️ Note
All files and descriptions relevant to testing can be found in the src/testing directory.
For further inquiries, please feel free to contact [email protected].