ChatSphere - Advanced College Chatbot Framework

Introduction

ChatSphere is an advanced chatbot framework developed from scratch using Python with enhanced NLP capabilities, specifically designed for college environments. Built upon our initial work with DailyDialog dataset, this final version features a custom-built institutional dataset and robust deployment pipeline.

Key capabilities:

Intent classification (Naive Bayes, Decision Tree, Random Forest)
Advanced text preprocessing with MWE recognition and contraction handling
TF-IDF feature extraction with mathematical rigor
Containerized deployment on Google Cloud with Vercel frontend
Contextual dialogue management

System Architecture

Core Components

Text Preprocessing Pipeline Multi-Word Expression (MWE) Recognition using WordNet Comprehensive contraction handling Advanced hyphen/punctuation management Case normalization and token splitting

Feature Extraction

Custom TF-IDF Vectorizer with smoothing:

TF-IDF(t,d) = (Count(t,d)/∑Count(t',d)) × [log((N+1)/(DF(t)+1)) + 1]

Vocabulary construction from tokenized corpus

Intent Classifiers

Multinomial Naive Bayes (90.48% accuracy)
- Laplace smoothing implementation
```
P(t|c) = (Count(t,c)+1)/(∑Count(t',c)+|V|)
```
Decision Tree (89.08% accuracy)
- Gini Index and Entropy based splitting
Random Forest (91.11% accuracy - best performing)
- Ensemble of decision trees with mode aggregation

Deployment Architecture

Containerized using Docker
Flask backend on Google Cloud
Express/Node.js frontend on Vercel
Vertex AI with L4 GPU for training

Data Files

Place these files in your working directory:
intents.json - Custom college dataset with tags, patterns and responses
(Optional) DailyDialog files for baseline comparison

3. Initialize the Chatbot

from chatbot import ChatSphere

Create chatbot instance (default: Random Forest)

chatbot = ChatSphere(bot_name="CollegeBot")

Train with custom college data

chatbot.train(data_file="intents.json")

Deployment Guide

Containerization

Sample Dockerfile

FROM python:3.9
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

Google Cloud Deployment Build Docker image: docker build -t chatsphere .

Push to Google Container Registry

Deploy to Cloud Run with automatic scaling
Frontend on Vercel Configure frontend with backend API key

Deploy using Vercel CLI: vercel --prod
Training Infrastructure Use College_Chatbot.ipynb on Vertex AI (Colab Enterprise)

L4 GPU recommended for training

Entity Definitions

Academic Entities

chatbot.add_entity_pattern( "course_code", r"\b[A-Z]{2,4}\s?\d{3}\b" # e.g., "CS 101" )
Location Entities chatbot.add_entity_pattern( "campus_location", r"\b(?:library|lab complex|academic block)\b" )
Administrative Terms

chatbot.add_entity_pattern( "admin_term", r"\b(?:registrar|time table|exam schedule)\b" )

Training Options

With Custom College Data chatbot.train( data_file="intents.json", test_size=0.2, vectorizer_params={"max_features": 1500} )
Performance Tuning

Random Forest hyperparameters

chatbot.set_model_params(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5
)

Interactive Chat

Example Session

You: Where is the CS101 class?
ChatSphere: CS101 is held in Academic Block B, Room 203.

You: When is the exam registration deadline?
ChatSphere: The deadline for exam registration is November 15th.
I detected: date: November 15th

Reset Conversation

chatbot.reset()  # Clears dialogue history

Customization

Add New Intents Edit intents.json to include: { "tag": "new_intent", "patterns": ["sample queries"], "responses": ["appropriate responses"] }
Modify Classifier Options: 'naive_bayes', 'decision_tree', 'random_forest' chatbot = ChatSphere(model_type="random_forest")

Troubleshooting

WordNet Issues import wn wn.download("omw-en")
Deployment Errors Verify API keys in Google Cloud and Vercel
Check container logs: gcloud logging read
Performance Issues Increase TF-IDF max_features
Adjust classifier hyperparameters
Expand training data in intents.json

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
client		client
dataset		dataset
images		images
server		server
.gitignore		.gitignore
College_Chatbot.ipynb		College_Chatbot.ipynb
DockerFile		DockerFile
LICENSE		LICENSE
PRML_Project_End_Term_Report.pdf		PRML_Project_End_Term_Report.pdf
README.md		README.md
Random_forest.joblib		Random_forest.joblib
__init__.py		__init__.py
deploy.py		deploy.py
index.html		index.html
models.py		models.py
preprocessing.py		preprocessing.py
requirements.txt		requirements.txt
script.js		script.js
style.css		style.css
train.py		train.py
vectorizer.joblib		vectorizer.joblib

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChatSphere - Advanced College Chatbot Framework

Introduction

System Architecture

Core Components

Data Files

3. Initialize the Chatbot

Create chatbot instance (default: Random Forest)

Train with custom college data

Deployment Guide

Sample Dockerfile

Entity Definitions

Training Options

Random Forest hyperparameters

Interactive Chat

Customization

Troubleshooting

Future Work

Contributors

Project Links

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ChatSphere - Advanced College Chatbot Framework

Introduction

System Architecture

Core Components

Data Files

3. Initialize the Chatbot

Create chatbot instance (default: Random Forest)

Train with custom college data

Deployment Guide

Sample Dockerfile

Entity Definitions

Training Options

Random Forest hyperparameters

Interactive Chat

Customization

Troubleshooting

Future Work

Contributors

Project Links

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages