WebChat is a chatbot that extracts and cleans the content of a website so you can query it through a conversational interface.
- Ensemble retriever: combines keyword and semantic search to retrieve the source documents.
- Maximal Marginal Relevance: re-ranks the source documents to maximize diversity and reduce redundancy.
- Reorder: places the most relevant documents earlier in the context to avoid the lost-in-the-middle problem.
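
To make the first two retrieval features concrete, here is a minimal pure-Python sketch. It is not WebChat's actual code: the function names (`reciprocal_rank_fusion`, `mmr`), the toy vectors, and the choice of reciprocal rank fusion as the way to merge keyword and semantic results are all assumptions made for illustration.

```python
import math


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Assumption: reciprocal rank fusion is used here as one common way to
    combine keyword and semantic results; k=60 is a conventional constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, top_k=2, lambda_mult=0.5):
    """Return indices of documents chosen by Maximal Marginal Relevance.

    Each step picks the candidate with the best trade-off between relevance
    to the query and similarity to the documents already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical results from a keyword search and a semantic search.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "d", "a"]])

# Documents 0 and 1 are near-duplicates; document 2 covers a different topic.
picked = mmr([1.0, 0.5], [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]], top_k=2)
```

In this toy example, ranking by relevance alone would return the two near-duplicate documents, while MMR keeps one of them and adds the dissimilar document instead.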
- Set the following environment variables in a .env file in the project directory:
OPENAI_API_KEY
WEBSITE_NAME
WEBSITE_URL
WEBSITE_DESCRIPTION
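
A quick way to confirm that all four variables are set before running the app is sketched below. The helper `missing_settings` is hypothetical and not part of the project. It assumes the values are already present in the process environment, for example exported in the shell or loaded from the .env file by a tool such as python-dotenv.

```python
import os

# The four variable names listed above.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "WEBSITE_NAME",
    "WEBSITE_URL",
    "WEBSITE_DESCRIPTION",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```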
- Create a virtual environment and install dependencies:
pip install -r requirements.txt
- Ingest the website content:
python src/ingest
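
The ingest script itself is not shown here, so the following is only an illustrative sketch of what "extracting and cleaning" a page can look like, using the Python standard library alone. The class `VisibleTextExtractor` and the function `clean_html` are hypothetical names, and the real script may use a different parser or different cleaning rules.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text content while skipping script, style, and noscript blocks."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse all runs of whitespace.
        # Note: this can split a word that spans inline tags.
        return " ".join(" ".join(self._chunks).split())


def clean_html(html):
    """Return the visible text of an HTML string as a single line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Hello</h1>\n<p>World  wide</p></body></html>"
)
cleaned = clean_html(page)
```

Text cleaned this way would typically be split into chunks and indexed before it can be retrieved by the chatbot.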
- Run the Streamlit web app:
streamlit run src/app.py
- LangChain Documentation: https://docs.langchain.com/
- Streamlit Documentation: https://docs.streamlit.io/
- Platzi LangChain Course for Document Management and Retrieval: https://platzi.com/cursos/langchain-documents/
