MLOps Car Price Predictor 🚗

📋 Project Overview

A complete MLOps project for predicting used car prices using machine learning, featuring MLflow experiment tracking, Databricks model serving, Docker containerization, Kubernetes orchestration, and a modern React frontend with CI/CD automation.

🚀 Features

  • ✅ ML Pipeline: 4 models (Linear Regression, Random Forest, XGBoost, LightGBM) with MLflow tracking
  • ✅ Model Serving: Databricks Model Serving with Unity Catalog integration
  • ✅ Frontend: Beautiful React UI with Tailwind CSS and real-time predictions
  • ✅ Proxy Server: Node.js Express proxy to handle CORS and API forwarding
  • ✅ Docker: Multi-container deployment with Docker Compose
  • ✅ Kubernetes: Production-ready K8s manifests with health checks
  • ✅ CI/CD: GitHub Actions pipeline for automated testing and deployment

🎯 Assessment Requirements Met

  1. ✅ MLflow Integration - Complete experiment tracking with 4 models
  2. ✅ Cloud Deployment - Databricks Model Serving endpoint deployed
  3. ✅ Containerization - Docker + Kubernetes manifests created
  4. ✅ CI/CD Pipeline - GitHub Actions workflows configured

πŸ“ Project Structure

MLops/
├── data/                          # Dataset storage
│   ├── raw/                       # Raw data from Kaggle
│   └── processed/                 # Processed data
├── notebooks/                     # Jupyter notebooks for exploration
│   └── exploratory_analysis.ipynb
├── src/                           # Source code
│   ├── data/                      # Data processing
│   │   ├── __init__.py
│   │   └── data_loader.py
│   ├── models/                    # Model definitions
│   │   ├── __init__.py
│   │   └── model.py
│   ├── training/                  # Training scripts
│   │   ├── __init__.py
│   │   └── train.py
│   ├── inference/                 # Inference/serving
│   │   ├── __init__.py
│   │   └── predict.py
│   └── utils/                     # Utility functions
│       ├── __init__.py
│       └── config.py
├── mlflow/                        # MLflow configurations
│   ├── mlflow_tracking.py         # MLflow tracking examples
│   ├── model_registry.py          # Model registration
│   └── model_versioning.py        # Version management
├── docker/                        # Docker configurations
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── .dockerignore
├── kubernetes/                    # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   └── ingress.yaml
├── .github/                       # GitHub Actions
│   └── workflows/
│       ├── ci.yml                 # CI pipeline
│       └── cd.yml                 # CD pipeline
├── tests/                         # Unit and integration tests
│   ├── __init__.py
│   ├── test_model.py
│   └── test_api.py
├── scripts/                       # Deployment scripts
│   ├── deploy_to_cloud.sh
│   └── setup_environment.sh
├── requirements.txt               # Python dependencies
├── setup.py                       # Package setup
├── .env.example                   # Environment variables template
├── .gitignore
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • Kubernetes (minikube or cloud provider)
  • Databricks account
  • GitHub account
  • Cloud provider account (Azure/AWS/GCP)

Installation

  1. Clone the repository

    git clone <your-repo-url>
    cd MLops
  2. Create virtual environment

    python -m venv venv
    venv\Scripts\activate  # Windows
    # source venv/bin/activate  # Linux/Mac
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables

    copy .env.example .env   # Windows
    # cp .env.example .env   # Linux/Mac
    # Edit .env with your credentials

📊 Dataset

Problem Statement: Used Car Price Prediction

Dataset Source: Vehicle Dataset from CarDekho (Kaggle)

Description: Predict the selling price of used cars based on various features such as year, kilometers driven, fuel type, transmission, ownership history, and technical specifications. This dataset contains approximately 8,128 records with 13 features.

Features:

  • name: Car model name
  • year: Manufacturing year
  • selling_price: Target variable (price in INR)
  • km_driven: Total kilometers driven
  • fuel: Fuel type (Petrol, Diesel, CNG, LPG, Electric)
  • seller_type: Individual, Dealer, Trustmark Dealer
  • transmission: Manual or Automatic
  • owner: First Owner, Second Owner, etc.
  • mileage: Fuel efficiency (km/l or km/kg)
  • engine: Engine capacity (CC)
  • max_power: Maximum power (bhp)
  • torque: Engine torque
  • seats: Number of seats

Engineered Features:

  • car_age: Current year - manufacturing year
  • km_per_year: Average kilometers driven per year
  • power_to_engine_ratio: Power to engine size ratio
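As a sketch (assuming a pandas DataFrame with the raw columns listed above; the project's actual `data_loader.py` may differ), these features can be derived like so:

```python
# Illustrative derivation of the engineered features described above.
import pandas as pd

def add_engineered_features(df: pd.DataFrame, current_year: int = 2024) -> pd.DataFrame:
    df = df.copy()
    df["car_age"] = current_year - df["year"]
    # Clip age to at least 1 so cars sold in their manufacturing year
    # don't cause division by zero.
    df["km_per_year"] = df["km_driven"] / df["car_age"].clip(lower=1)
    df["power_to_engine_ratio"] = df["max_power"] / df["engine"]
    return df

sample = pd.DataFrame(
    {"year": [2015], "km_driven": [90000], "max_power": [82.0], "engine": [1197]}
)
print(add_engineered_features(sample, current_year=2024))
```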

🔧 MLflow Setup

1. Databricks Configuration

# Configure Databricks MLflow
# (requires DATABRICKS_HOST and DATABRICKS_TOKEN, or a configured Databricks CLI profile)
import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/<your-email>/mlops-experiment")

2. Tracking Parameters and Metrics

python mlflow/mlflow_tracking.py

3. Model Registration

python mlflow/model_registry.py

🐳 Docker Containerization

Build Docker Image

docker build -t mlops-model:latest -f docker/Dockerfile .

Run Container Locally

docker-compose -f docker/docker-compose.yml up

Test Container

curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d @test_data.json
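The test_data.json file is not included in the repository; a plausible payload using the raw feature names from the dataset section could be generated like this (the deployed endpoint's exact input schema may differ, e.g. Databricks Model Serving wraps records in a `dataframe_records` key):

```python
# Write an illustrative test_data.json for the curl command above.
import json

record = {
    "year": 2017,
    "km_driven": 45000,
    "fuel": "Petrol",
    "seller_type": "Individual",
    "transmission": "Manual",
    "owner": "First Owner",
    "mileage": 18.9,
    "engine": 1197,
    "max_power": 82.0,
    "seats": 5,
}
with open("test_data.json", "w") as f:
    json.dump(record, f, indent=2)
```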

☸️ Kubernetes Deployment

Deploy to Kubernetes

kubectl apply -f kubernetes/

Check Deployment Status

kubectl get pods
kubectl get services

Access Service

kubectl port-forward service/mlops-model-service 8080:80

🐳 Docker Commands Reference

Building Images

# Build proxy server image
docker build -t car-price-proxy:latest -f Dockerfile.proxy .

# Build frontend image
cd frontend
docker build -t car-price-frontend:latest .
cd ..

# Build with no cache (clean build)
docker build --no-cache -t car-price-proxy:latest -f Dockerfile.proxy .

Docker Compose Commands

# Start all services in detached mode
docker-compose up -d

# Build and start services
docker-compose up -d --build

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

# View running containers
docker-compose ps

# View logs (all services)
docker-compose logs

# View logs (specific service)
docker-compose logs proxy
docker-compose logs frontend

# Follow logs in real-time
docker-compose logs -f

# Restart services
docker-compose restart

# Restart specific service
docker-compose restart proxy

# Scale services
docker-compose up -d --scale proxy=3

Container Management

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a container
docker stop <container-id>

# Remove a container
docker rm <container-id>

# Remove all stopped containers
docker container prune

# View container logs
docker logs <container-id>

# Follow container logs
docker logs -f <container-id>

# Execute command in running container
docker exec -it <container-id> /bin/sh

# Inspect container
docker inspect <container-id>

# View container stats (CPU, Memory)
docker stats

Image Management

# List images
docker images

# Remove an image
docker rmi <image-id>

# Remove unused images
docker image prune

# Remove all unused images
docker image prune -a

# Tag an image
docker tag mlops-frontend:latest myregistry/mlops-frontend:v1.0

# Push to registry
docker push myregistry/mlops-frontend:v1.0

# Pull from registry
docker pull myregistry/mlops-frontend:v1.0

Troubleshooting

# Check Docker version
docker --version

# View Docker system info
docker info

# Clean up everything (careful!)
docker system prune -a

# Check disk usage
docker system df

# Test if containers can reach each other
docker exec <container-id> ping <other-container-name>

☸️ Kubernetes Commands Reference

Cluster Management

# Check cluster info
kubectl cluster-info

# View cluster nodes
kubectl get nodes

# Describe a node
kubectl describe node <node-name>

# View cluster events
kubectl get events --sort-by='.lastTimestamp'

Deployment Commands

# Apply all manifests
kubectl apply -f k8s/

# Apply specific manifest
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/proxy-deployment.yaml
kubectl apply -f k8s/frontend-deployment.yaml

# View all resources
kubectl get all

# View deployments
kubectl get deployments

# Describe deployment
kubectl describe deployment car-price-frontend

# Edit deployment (opens editor)
kubectl edit deployment car-price-frontend

# Delete deployment
kubectl delete deployment car-price-frontend

# Delete all resources from manifests
kubectl delete -f k8s/

Pod Management

# List all pods
kubectl get pods

# List pods with more details
kubectl get pods -o wide

# Describe a pod
kubectl describe pod <pod-name>

# View pod logs
kubectl logs <pod-name>

# Follow pod logs
kubectl logs -f <pod-name>

# View logs from all pods with label
kubectl logs -l app=car-price-proxy

# View logs from previous container instance
kubectl logs <pod-name> --previous

# Execute command in pod
kubectl exec -it <pod-name> -- /bin/sh

# Copy files to/from pod
kubectl cp <pod-name>:/path/to/file ./local-file
kubectl cp ./local-file <pod-name>:/path/to/file

# Delete pod (will be recreated by deployment)
kubectl delete pod <pod-name>

Service Management

# List services
kubectl get services
kubectl get svc

# Describe service
kubectl describe service car-price-frontend-service

# Port forward to service
kubectl port-forward service/car-price-frontend-service 8080:80

# Port forward to pod
kubectl port-forward <pod-name> 8080:80

# Get service endpoints
kubectl get endpoints

Scaling

# Scale deployment
kubectl scale deployment car-price-frontend --replicas=3

# Autoscale deployment
kubectl autoscale deployment car-price-frontend --min=2 --max=5 --cpu-percent=80

# View horizontal pod autoscalers
kubectl get hpa

Rolling Updates & Rollbacks

# Update image version
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2

# View rollout status
kubectl rollout status deployment/car-price-frontend

# View rollout history
kubectl rollout history deployment/car-price-frontend

# Rollback to previous version
kubectl rollout undo deployment/car-price-frontend

# Rollback to specific revision
kubectl rollout undo deployment/car-price-frontend --to-revision=2

# Pause rollout
kubectl rollout pause deployment/car-price-frontend

# Resume rollout
kubectl rollout resume deployment/car-price-frontend

Secrets & ConfigMaps

# List secrets
kubectl get secrets

# Describe secret
kubectl describe secret databricks-credentials

# Create secret from literal
kubectl create secret generic my-secret --from-literal=key1=value1

# Create secret from file
kubectl create secret generic my-secret --from-file=./secret.txt

# Delete secret
kubectl delete secret databricks-credentials

# List configmaps
kubectl get configmaps

# Create configmap
kubectl create configmap my-config --from-literal=key1=value1

Monitoring & Debugging

# View resource usage
kubectl top nodes
kubectl top pods

# Get pod events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Check pod status
kubectl get pods --watch

# Debug pod that won't start
kubectl describe pod <pod-name>
kubectl logs <pod-name>

# Interactive debugging
kubectl run -it --rm debug --image=busybox --restart=Never -- sh

# Network debugging
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- bash

Namespace Management

# List namespaces
kubectl get namespaces

# Create namespace
kubectl create namespace mlops-production

# Set default namespace
kubectl config set-context --current --namespace=mlops-production

# Delete namespace
kubectl delete namespace mlops-production

Useful Shortcuts

# Get resources with custom output
kubectl get pods -o json
kubectl get pods -o yaml
kubectl get pods -o wide

# Get specific fields
kubectl get pods -o=jsonpath='{.items[0].metadata.name}'

# Label resources
kubectl label pods <pod-name> environment=production

# Annotate resources
kubectl annotate pods <pod-name> description="Main proxy server"

# Dry run (test without applying)
kubectl apply -f k8s/frontend-deployment.yaml --dry-run=client

# Explain resource fields
kubectl explain pods
kubectl explain deployment.spec

Quick Troubleshooting

# Pod stuck in Pending
kubectl describe pod <pod-name>  # Check events section

# Pod stuck in ImagePullBackOff
kubectl describe pod <pod-name>  # Check image name and pull policy

# Pod CrashLoopBackOff
kubectl logs <pod-name>  # Check application logs
kubectl logs <pod-name> --previous  # Check previous container logs

# Service not accessible
kubectl get endpoints <service-name>  # Verify endpoints exist
kubectl describe service <service-name>  # Check selector labels

# Cannot connect to pods
kubectl exec -it <pod-name> -- ping <other-pod-ip>  # Test connectivity

🔄 CI/CD Pipeline

The project uses GitHub Actions for automated CI/CD:

Continuous Integration (CI)

  • Code linting and formatting
  • Unit tests
  • Model training and validation
  • Docker image building

Continuous Deployment (CD)

  • Push Docker image to registry
  • Deploy to Kubernetes
  • Model registration in MLflow
  • Automated testing in staging
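A minimal .github/workflows/ci.yml covering the CI steps above might look like the following (step ordering and tool choices are illustrative, not the repo's actual workflow):

```yaml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt flake8 pytest pytest-cov
      - run: flake8 src tests          # linting
      - run: pytest --cov=src tests/   # unit tests with coverage
      - run: docker build -t mlops-model:ci -f docker/Dockerfile .
```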

🌐 Cloud Deployment

Azure Deployment

# Azure Container Instances
bash scripts/deploy_to_azure.sh

AWS Deployment

# AWS EKS
bash scripts/deploy_to_aws.sh

📈 Monitoring

  • MLflow UI: Track experiments, compare runs
  • Databricks Dashboard: Monitor model performance
  • Kubernetes Dashboard: Monitor container health
  • Cloud Monitoring: Native cloud monitoring tools

🧪 Testing

# Run all tests
pytest tests/

# Run specific test
pytest tests/test_model.py

# Generate coverage report
pytest --cov=src tests/
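A test in tests/test_model.py might assert basic sanity properties of a trained model; this is an illustrative sketch on synthetic data, not the repo's actual test:

```python
# Sanity-check that a trained regressor produces finite, positive prices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_model(X, y):
    return RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def test_predictions_are_finite_and_positive():
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(100, 4))
    y = X.sum(axis=1) * 1000  # synthetic positive "prices"
    preds = train_model(X, y).predict(X)
    assert np.all(np.isfinite(preds))
    assert np.all(preds > 0)
```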

πŸ“ Key Features

MLflow

  • βœ… Experiment tracking
  • βœ… Parameter and metric logging
  • βœ… Artifact storage
  • βœ… Model registry
  • βœ… Model versioning
  • βœ… Model staging (Staging/Production)

Containerization

  • βœ… Docker multi-stage builds
  • βœ… Optimized image size
  • βœ… Health checks
  • βœ… Environment configuration

Kubernetes

  • βœ… Auto-scaling
  • βœ… Load balancing
  • βœ… Rolling updates
  • βœ… Resource management

CI/CD

  • βœ… Automated testing
  • βœ… Continuous training
  • βœ… Automated deployment
  • βœ… Version control integration

πŸ” Security Considerations

  • Secrets management using environment variables
  • Docker image scanning
  • RBAC in Kubernetes
  • API authentication and authorization

📚 Additional Resources

👥 Team Members

  • Member 1: [Name]
  • Member 2: [Name]
  • Member 3: [Name]

📄 License

This project is for educational purposes as part of an MLOps assessment.

🎯 Quick Reference - Common Tasks

Start Everything Locally

# Using Docker Compose (recommended for development)
docker-compose up -d

# Access at: http://localhost:80

Deploy to Kubernetes

# Apply all manifests
kubectl apply -f k8s/

# Check status
kubectl get pods
kubectl get services

# Access via port-forward
kubectl port-forward service/car-price-frontend-service 8080:80
# Access at: http://localhost:8080

View Logs

# Docker Compose
docker-compose logs -f

# Kubernetes
kubectl logs -l app=car-price-frontend -f
kubectl logs -l app=car-price-proxy -f

Scale Application

# Docker Compose
docker-compose up -d --scale proxy=3

# Kubernetes
kubectl scale deployment car-price-frontend --replicas=5

Update Application

# Docker Compose (rebuild and restart)
docker-compose up -d --build

# Kubernetes (rolling update)
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2
kubectl rollout status deployment/car-price-frontend

Clean Up

# Docker Compose (stop and remove)
docker-compose down

# Kubernetes (delete all resources)
kubectl delete -f k8s/

# Complete cleanup
docker-compose down -v  # Remove volumes too
docker system prune -a  # Remove all unused Docker resources

πŸ™ Acknowledgments

  • Dataset from Kaggle
  • Databricks for MLflow platform
  • Open-source community
