Welcome to Chat With Your Documents, an advanced stateful chatbot application designed to streamline document-based interactions. This application enables users to upload files or provide URLs and engage in dynamic, context-aware conversations directly with their documents. Whether you're querying technical documentation, analyzing reports, or diving deep into research papers, this chatbot makes it effortless to extract insights and knowledge.
- Stateful Conversations: Maintain the context of your queries for more intelligent and coherent interactions.
- Retrieval-Augmented Generation (RAG): Upload files or provide URLs and have dynamic, context-aware conversations with your documents, powered by retrieval-augmented generation.
- GKE Deployment: Deployed on Google Kubernetes Engine (GKE) for scalable and robust operation.
- CI/CD Integration: A Continuous Integration and Continuous Deployment (CI/CD) pipeline ensures fast, reliable updates and maintenance. (In progress...)
This repository contains all the resources you need to deploy, customize, and use the application effectively. Dive into the sections below to get started!
Figure 1. Demo of the Chatbot Without External Knowledge
Figure 2. Demo of the Chatbot After Adding External Knowledge via the RAG System
Figure 3. System Overview.
- Document Upload: Users can upload their documents (files or URLs) to the chatbot. The content from these files or URLs is saved in MinIO for further processing.
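For illustration, here is a minimal sketch (not the repository's actual code) of how an upload handler might persist raw document bytes to MinIO; the endpoint, credentials, and bucket name are placeholders:

```python
# Hypothetical sketch: persist uploaded document bytes to MinIO for later processing.
# Endpoint, credentials, and bucket name are placeholders, not values from this repo.
import io
from minio import Minio

client = Minio("minio:9000", access_key="minio-access-key",
               secret_key="minio-secret-key", secure=False)

def save_upload(user_id: str, filename: str, data: bytes) -> str:
    """Store the raw bytes under a per-user prefix and return the object key."""
    bucket = "documents"
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    object_key = f"{user_id}/{filename}"
    client.put_object(bucket, object_key, io.BytesIO(data), length=len(data))
    return object_key
```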
- Vectorization:
  - Semantic Chunking: Documents are split into smaller, meaningful chunks using semantic chunking techniques, which ensure that each segment preserves contextual relevance and is easier to retrieve and understand. For example, sections, paragraphs, or logical groupings of information are treated as distinct chunks.
  - Embedding and Storage: Each chunk is converted into an embedding (vector representation) and stored in the Redis Vector Database for efficient similarity-based retrieval during queries.
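A rough sketch of this step, assuming the OpenAI embeddings API and a RediSearch HNSW index (the index name, key prefix, and model are illustrative, and the semantic chunking itself is assumed to happen upstream):

```python
# Hypothetical sketch: embed pre-chunked text and store the vectors in Redis
# behind a RediSearch index so they can be retrieved by similarity later.
import numpy as np
import redis
from openai import OpenAI
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="redis", port=6379)
oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def create_index(dim: int = 1536) -> None:
    """Create the vector index over chunk hashes (run once)."""
    r.ft("doc_chunks").create_index(
        fields=[
            TextField("text"),
            VectorField("embedding", "HNSW",
                        {"TYPE": "FLOAT32", "DIM": dim, "DISTANCE_METRIC": "COSINE"}),
        ],
        definition=IndexDefinition(prefix=["chunk:"], index_type=IndexType.HASH),
    )

def index_chunks(doc_id: str, chunks: list[str]) -> None:
    """Embed each chunk and store it as a Redis hash under the indexed prefix."""
    for i, chunk in enumerate(chunks):
        emb = oai.embeddings.create(model="text-embedding-3-small", input=chunk)
        vector = np.array(emb.data[0].embedding, dtype=np.float32).tobytes()
        r.hset(f"chunk:{doc_id}:{i}", mapping={"text": chunk, "embedding": vector})
```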
- User Session:
  - Stateful Connection: A WebSocket connection is established between the user and the server to maintain a stateful session.
  - Historical Context: At the start of the session, historical conversation data is fetched from Cassandra to give the chatbot contextual knowledge of past interactions, so it can respond with previous queries and answers in mind.
  - Incremental Updates: The WebSocket enables real-time, incremental updates to the conversation history, keeping the context up to date.
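A minimal sketch of such a session with FastAPI and the Cassandra driver; the keyspace, table, and the `answer_with_context` helper are hypothetical stand-ins for the real message-processing path described below:

```python
# Hypothetical sketch: a stateful WebSocket session that loads prior turns from
# Cassandra at connect time and appends new turns as the conversation proceeds.
from cassandra.cluster import Cluster
from fastapi import FastAPI, WebSocket

app = FastAPI()
cassandra = Cluster(["cassandra"]).connect("chat")  # keyspace name is a placeholder

@app.websocket("/ws/{user_id}")
async def chat_ws(websocket: WebSocket, user_id: str):
    await websocket.accept()
    # Fetch historical context once at the start of the session.
    rows = cassandra.execute(
        "SELECT question, answer FROM conversations WHERE user_id = %s", (user_id,)
    )
    history = [(row.question, row.answer) for row in rows]
    while True:
        message = await websocket.receive_text()
        reply = await answer_with_context(message, history)  # hypothetical helper
        history.append((message, reply))  # incremental, in-session update
        await websocket.send_text(reply)
```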
- Message Processing:
  - Standalone Question Creation: Each user message, together with its historical context, is sent to OpenAI to generate a standalone question. The standalone question is reformulated to be independent of previous turns, improving both context retrieval and input clarity for the LLM.
  - Example:
    - Conversation: User: "Do you know Elon Musk?" Bot: "Yes, I know him." User: "Is HE the richest man in the world?"
    - Standalone question: "Is Elon Musk the richest man in the world?"
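A hedged sketch of how this reformulation could be implemented with the OpenAI chat completions API (the model choice and prompt wording are illustrative, not the repository's exact prompt):

```python
# Hypothetical sketch: turn the latest user message plus chat history into a
# standalone question that can be understood without the earlier turns.
from openai import OpenAI

oai = OpenAI()

def make_standalone_question(history: list[tuple[str, str]], message: str) -> str:
    transcript = "\n".join(f"User: {q}\nBot: {a}" for q, a in history)
    resp = oai.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Rewrite the user's last message as a self-contained question, "
                        "resolving pronouns and references using the conversation."},
            {"role": "user", "content": f"{transcript}\nUser: {message}"},
        ],
    )
    return resp.choices[0].message.content
```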
- Context Retrieval:
  - The embedding of the standalone question is used to query the Redis Vector Database, retrieving relevant chunks of external context.
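Continuing the sketches above, the KNN query against the placeholder `doc_chunks` index might look like:

```python
# Hypothetical sketch: embed the standalone question and retrieve the k most
# similar chunks from the Redis index created in the vectorization sketch.
import numpy as np
import redis
from openai import OpenAI
from redis.commands.search.query import Query

r = redis.Redis(host="redis", port=6379)
oai = OpenAI()

def retrieve_context(standalone_question: str, k: int = 4) -> list[str]:
    emb = oai.embeddings.create(model="text-embedding-3-small",
                                input=standalone_question)
    vector = np.array(emb.data[0].embedding, dtype=np.float32).tobytes()
    query = (
        Query(f"*=>[KNN {k} @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("text", "score")
        .dialect(2)
    )
    results = r.ft("doc_chunks").search(query, query_params={"vec": vector})
    return [doc.text for doc in results.docs]
```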
- Response Generation:
  - The standalone question and the retrieved context are sent to the OpenAI API to generate a response.
  - The user's question and the chatbot's response are then stored in Cassandra to update the conversation history.
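A sketch of this final step, reusing the placeholder Cassandra schema from the session sketch above:

```python
# Hypothetical sketch: answer the standalone question from the retrieved context
# and persist the new turn to Cassandra so future sessions can use it as history.
from cassandra.cluster import Cluster
from openai import OpenAI

oai = OpenAI()
cassandra = Cluster(["cassandra"]).connect("chat")  # placeholder keyspace

def generate_and_store(user_id: str, question: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    resp = oai.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"Answer the question using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    answer = resp.choices[0].message.content
    cassandra.execute(
        "INSERT INTO conversations (user_id, question, answer) VALUES (%s, %s, %s)",
        (user_id, question, answer),
    )
    return answer
```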
This system architecture ensures accurate, context-aware responses and efficient handling of document-based queries.
Follow these steps to set up and run the application:
- Clone the repository:
  git clone https://github.com/nhduong1203/LLM-Chatbot
- Navigate to the project directory:
  cd LLM-Chatbot
- Set the root directory environment variable (for easier navigation):
  export ROOT_DIR=$(pwd)
Figure 4. GKE Deployment Overview.
- Create a new `.env` file based on `.env.example` and populate the variables there:
  cd $ROOT_DIR
  set -a && source .env && set +a
- Build the application images:
  docker-compose build --no-cache
- After building the images, tag and push each image to your Docker Hub repository. For example:
  docker tag backend-chat:latest $DOCKER_USERNAME/backend-chat:latest
  docker push $DOCKER_USERNAME/backend-chat:latest
- Install Required Tools:
  - kubectl - for communicating with the Kubernetes API server.
  - kubectx and kubens - for easier navigation between clusters and namespaces.
  - Helm - for managing templating and deployment of Kubernetes resources.
- Create a GKE Cluster using Terraform:
  - Log in to the GCP Console and create a new project.
  - Update the `project_id` in `terraform/variables.tf`:
    variable "project_id" {
      description = "The project ID to host the cluster in"
      default     = "your-project-id"
    }

    variable "region" {
      description = "The region for the cluster"
      default     = "asia-southeast1-a"
    }
- Log in to GCP using the gcloud CLI:
  gcloud auth application-default login
- Provision a new GKE cluster using Terraform:
  cd $ROOT_DIR/iac/terraform
  terraform init
  terraform plan
  terraform apply
- Connect to the GKE Cluster:
  gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --project $PROJECT_ID
- Switch to the GKE Cluster Context:
  kubectx gke_${PROJECT_ID}_${REGION}_${CLUSTER_NAME}
- Navigate to the `k8s-yaml` folder:
  cd $ROOT_DIR/k8s-yaml
- Deploy each service by applying its corresponding YAML file. For example:
  cd db
  kubectl apply -f minio.yaml
- Repeat the process for all other services by navigating to their respective folders and running:
  kubectl apply -f {service-name}.yaml
- Create a Kubernetes Secret to store the OpenAI API key:
  kubectl create secret generic openai-api-key --from-literal=OPENAI_API_KEY=<your_openai_api_key>
Note: To deploy Cassandra, first apply `cassandra-deployment.yaml` to start Cassandra, then apply `cassandra-init-job.yaml` to initialize the keyspace, tables, and other required configuration.
Due to the limitations of GCP's free trial tier, I am unable to use instances with GPUs. As a result, certain machine learning models in this cluster will run on CPU. Below is the configuration of the instances in my cluster:
| Resource | Name | Machine Type | Disk Size (GB) | Preemptible | Labels | Min Nodes | Max Nodes | Node Count | Workload |
|---|---|---|---|---|---|---|---|---|---|
| GKE Cluster | `${var.project_id}-gke` | N/A | N/A | N/A | N/A | N/A | N/A | 1 | Cluster Management |
| System Services Node Pool | `${var.project_id}-sys-svc-pool` | e2-standard-2 | 40 | No | workload=system-services | 1 | 3 | 1 | MinIO and Redis |
| Cassandra Node Pool | `${var.project_id}-cassandra-pool` | e2-highmem-4 | 40 | No | workload=cassandra | 1 | 2 | 1 | Cassandra Database |
| Backend Doc Node Pool | `${var.project_id}-doc-pool` | e2-standard-4 | 40 | No | workload=backend-doc | 1 | 2 | 1 | Backend Doc Management |
| Backend Chat Node Pool | `${var.project_id}-chat-pool` | e2-standard-4 | 40 | No | workload=backend-chat | 1 | 2 | 1 | Backend Chat Service |
| Frontend and NGINX Node Pool | `${var.project_id}-fe-pool` | e2-medium | 40 | Yes | workload=frontend | 1 | 1 | 1 | Frontend & NGINX |
This completes the setup and deployment of the chatbot application on GKE. With the NodePort Service, you can access the frontend via the external IP of a node.
Figure 5. Access frontend via Node's External IP.
So far, we have deployed the model and the FastAPI app to GKE. Now, we need to monitor the performance of the model and the app. We will use Prometheus and Grafana for monitoring the model and the app, Jaeger for tracing the requests, and Elasticsearch and Kibana for collecting system logs. Let's get started!
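The repository's exact instrumentation is not reproduced here; as one hedged example, a FastAPI service could export request traces to the Jaeger collector through OpenTelemetry roughly like this (the service name and collector endpoint are assumptions):

```python
# Hypothetical sketch: send FastAPI request traces to Jaeger via OTLP.
# Service name and collector address are placeholders, not values from this repo.
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "backend-chat"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="jaeger-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # traces every request served by the app
```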
- Increase inotify watch limits for the Kubernetes instances:
  cd $ROOT_DIR/observability/inotify
  kubectl apply -f inotify-limits.yaml
- Create a separate namespace for observability and switch to it:
  kubectl create namespace observability
  kubens observability
- Install Jaeger for tracing the application:
  cd $ROOT_DIR/k8s-yaml
  kubectl apply -f jaeger.yaml
- Install the ELK stack for collecting logs:
  # Install ECK operator
  cd $ROOT_DIR/observability/elasticcloud/deploy/eck-operator
  kubectl delete -f https://download.elastic.co/downloads/eck/2.13.0/crds.yaml
  kubectl create -f https://download.elastic.co/downloads/eck/2.13.0/crds.yaml
  kubectl apply -f https://download.elastic.co/downloads/eck/2.13.0/operator.yaml

  # Install ELK stack
  cd $ROOT_DIR/observability/elasticcloud/deploy/eck-stack
  kubectl get serviceaccount filebeat -n elk &> /dev/null && kubectl delete serviceaccount filebeat -n elk || true
  kubectl get clusterrolebinding filebeat -n elk &> /dev/null && kubectl delete clusterrolebinding filebeat -n elk || true
  kubectl get clusterrole filebeat -n elk &> /dev/null && kubectl delete clusterrole filebeat -n elk || true
  helm upgrade --install elk -f values.yaml .
- Install Prometheus and Grafana for monitoring the system:
  cd $ROOT_DIR/observability/metric
  yq e '.data."config.yml" |= sub("webhook_url: .*", "webhook_url: env(DISCORD_WEBHOOK_URL)")' -i charts/alertmanager/templates/configmap.yaml
  helm upgrade --install prom-graf . --namespace observability
- Access the monitoring tools:
6.1. Jaeger:
- Forward port:
nohup kubectl port-forward svc/jaeger-query 16686:80 > port-forward.log 2>&1 &
- Access via http://localhost:16686 or the node's external IP
Figure 6. Tracing from Jaeger.
6.2. Kibana:
- Forward port:
nohup kubectl port-forward -n observability svc/elk-eck-kibana-kb-http 5601:5601 > /dev/null 2>&1 &
- Access Kibana at: http://localhost:5601
- Get the password for the `elastic` user:
  kubectl get secret elasticsearch-es-elastic-user -n observability -o jsonpath='{.data.elastic}' | base64 -d
Figure 7. Kibana Logging.
6.3. Grafana:
- Forward port:
nohup kubectl port-forward -n observability svc/grafana 3000:3000 > /dev/null 2>&1 &
- Access Grafana at: http://localhost:3000
- Log in with:
  username: admin
  password: admin
- Check for metrics.
Figure 8. Metric from Grafana.