A complete MLOps project for predicting used car prices using machine learning, featuring MLflow experiment tracking, Databricks model serving, Docker containerization, Kubernetes orchestration, and a modern React frontend with CI/CD automation.
- ✅ ML Pipeline: 4 models (Linear Regression, Random Forest, XGBoost, LightGBM) with MLflow tracking
- ✅ Model Serving: Databricks Model Serving with Unity Catalog integration
- ✅ Frontend: React UI with Tailwind CSS and real-time predictions
- ✅ Proxy Server: Node.js Express proxy to handle CORS and API forwarding
- ✅ Docker: Multi-container deployment with Docker Compose
- ✅ Kubernetes: Production-ready K8s manifests with health checks
- ✅ CI/CD: GitHub Actions pipeline for automated testing and deployment
- ✅ MLflow Integration: complete experiment tracking with 4 models
- ✅ Cloud Deployment: Databricks Model Serving endpoint deployed
- ✅ Containerization: Docker + Kubernetes manifests created
- ✅ CI/CD Pipeline: GitHub Actions workflows configured
MLops/
├── data/                    # Dataset storage
│   ├── raw/                 # Raw data from Kaggle
│   └── processed/           # Processed data
├── notebooks/               # Jupyter notebooks for exploration
│   └── exploratory_analysis.ipynb
├── src/                     # Source code
│   ├── data/                # Data processing
│   │   ├── __init__.py
│   │   └── data_loader.py
│   ├── models/              # Model definitions
│   │   ├── __init__.py
│   │   └── model.py
│   ├── training/            # Training scripts
│   │   ├── __init__.py
│   │   └── train.py
│   ├── inference/           # Inference/serving
│   │   ├── __init__.py
│   │   └── predict.py
│   └── utils/               # Utility functions
│       ├── __init__.py
│       └── config.py
├── mlflow/                  # MLflow configurations
│   ├── mlflow_tracking.py   # MLflow tracking examples
│   ├── model_registry.py    # Model registration
│   └── model_versioning.py  # Version management
├── docker/                  # Docker configurations
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── .dockerignore
├── kubernetes/              # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   └── ingress.yaml
├── .github/                 # GitHub Actions
│   └── workflows/
│       ├── ci.yml           # CI pipeline
│       └── cd.yml           # CD pipeline
├── tests/                   # Unit and integration tests
│   ├── __init__.py
│   ├── test_model.py
│   └── test_api.py
├── scripts/                 # Deployment scripts
│   ├── deploy_to_cloud.sh
│   └── setup_environment.sh
├── requirements.txt         # Python dependencies
├── setup.py                 # Package setup
├── .env.example             # Environment variables template
├── .gitignore
└── README.md
- Python 3.8+
- Docker Desktop
- Kubernetes (minikube or cloud provider)
- Databricks account
- GitHub account
- Cloud provider account (Azure/AWS/GCP)
1. Clone the repository

   git clone <your-repo-url>
   cd MLops

2. Create a virtual environment

   python -m venv venv
   venv\Scripts\activate        # Windows
   # source venv/bin/activate   # Linux/Mac

3. Install dependencies

   pip install -r requirements.txt

4. Configure environment variables

   copy .env.example .env
   # Edit .env with your credentials
Problem Statement: Used Car Price Prediction
Dataset Source: Vehicle Dataset from CarDekho (Kaggle)
Description: Predict the selling price of used cars based on various features such as year, kilometers driven, fuel type, transmission, ownership history, and technical specifications. This dataset contains approximately 8,128 records with 13 features.
Features:
- name: Car model name
- year: Manufacturing year
- selling_price: Target variable (price in INR)
- km_driven: Total kilometers driven
- fuel: Fuel type (Petrol, Diesel, CNG, LPG, Electric)
- seller_type: Individual, Dealer, Trustmark Dealer
- transmission: Manual or Automatic
- owner: First Owner, Second Owner, etc.
- mileage: Fuel efficiency (km/l or km/kg)
- engine: Engine capacity (CC)
- max_power: Maximum power (bhp)
- seats: Number of seats
Engineered Features:
- car_age: Current year - manufacturing year
- km_per_year: Average kilometers driven per year
- power_to_engine_ratio: Power to engine size ratio
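A sketch of how these engineered features can be derived with pandas. Column names follow the dataset description above; the two inline rows are illustrative stand-ins for the real CSV in data/raw/.

```python
# Derive car_age, km_per_year, and power_to_engine_ratio from raw columns.
import pandas as pd

# Two illustrative rows; in the project this would come from data/raw/.
df = pd.DataFrame({
    "year": [2015, 2019],
    "km_driven": [70000, 20000],
    "max_power": [82.0, 120.0],   # bhp
    "engine": [1197, 1498],       # CC
})

CURRENT_YEAR = 2024
df["car_age"] = CURRENT_YEAR - df["year"]
# Clip to 1 to avoid division by zero for cars sold in their first year.
df["km_per_year"] = df["km_driven"] / df["car_age"].clip(lower=1)
df["power_to_engine_ratio"] = df["max_power"] / df["engine"]

print(df[["car_age", "km_per_year", "power_to_engine_ratio"]])
```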
# Configure Databricks MLflow
import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/<your-email>/mlops-experiment")

# Run the MLflow tracking examples
python mlflow/mlflow_tracking.py

# Register the trained models
python mlflow/model_registry.py

# Build the model image
docker build -t mlops-model:latest -f docker/Dockerfile .

# Start services with Docker Compose
docker-compose -f docker/docker-compose.yml up

# Test the prediction endpoint
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d @test_data.json

# Deploy to Kubernetes and check status
kubectl apply -f kubernetes/
kubectl get pods
kubectl get services

# Port-forward the service
kubectl port-forward service/mlops-model-service 8080:80

# Build proxy server image
docker build -t car-price-proxy:latest -f Dockerfile.proxy .
# Build frontend image
cd frontend
docker build -t car-price-frontend:latest .
cd ..
# Build with no cache (clean build)
docker build --no-cache -t car-price-proxy:latest -f Dockerfile.proxy .

# Start all services in detached mode
docker-compose up -d
# Build and start services
docker-compose up -d --build
# Stop all services
docker-compose down
# Stop and remove volumes
docker-compose down -v
# View running containers
docker-compose ps
# View logs (all services)
docker-compose logs
# View logs (specific service)
docker-compose logs proxy
docker-compose logs frontend
# Follow logs in real-time
docker-compose logs -f
# Restart services
docker-compose restart
# Restart specific service
docker-compose restart proxy
# Scale services
docker-compose up -d --scale proxy=3

# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# Stop a container
docker stop <container-id>
# Remove a container
docker rm <container-id>
# Remove all stopped containers
docker container prune
# View container logs
docker logs <container-id>
# Follow container logs
docker logs -f <container-id>
# Execute command in running container
docker exec -it <container-id> /bin/sh
# Inspect container
docker inspect <container-id>
# View container stats (CPU, Memory)
docker stats

# List images
docker images
# Remove an image
docker rmi <image-id>
# Remove unused images
docker image prune
# Remove all unused images
docker image prune -a
# Tag an image
docker tag mlops-frontend:latest myregistry/mlops-frontend:v1.0
# Push to registry
docker push myregistry/mlops-frontend:v1.0
# Pull from registry
docker pull myregistry/mlops-frontend:v1.0

# Check Docker version
docker --version
# View Docker system info
docker info
# Clean up everything (careful!)
docker system prune -a
# Check disk usage
docker system df
# Test if containers can reach each other
docker exec <container-id> ping <other-container-name>

# Check cluster info
kubectl cluster-info
# View cluster nodes
kubectl get nodes
# Describe a node
kubectl describe node <node-name>
# View cluster events
kubectl get events --sort-by='.lastTimestamp'

# Apply all manifests
kubectl apply -f k8s/
# Apply specific manifest
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/proxy-deployment.yaml
kubectl apply -f k8s/frontend-deployment.yaml
# View all resources
kubectl get all
# View deployments
kubectl get deployments
# Describe deployment
kubectl describe deployment car-price-frontend
# Edit deployment (opens editor)
kubectl edit deployment car-price-frontend
# Delete deployment
kubectl delete deployment car-price-frontend
# Delete all resources from manifests
kubectl delete -f k8s/

# List all pods
kubectl get pods
# List pods with more details
kubectl get pods -o wide
# Describe a pod
kubectl describe pod <pod-name>
# View pod logs
kubectl logs <pod-name>
# Follow pod logs
kubectl logs -f <pod-name>
# View logs from all pods with label
kubectl logs -l app=car-price-proxy
# View logs from previous container instance
kubectl logs <pod-name> --previous
# Execute command in pod
kubectl exec -it <pod-name> -- /bin/sh
# Copy files to/from pod
kubectl cp <pod-name>:/path/to/file ./local-file
kubectl cp ./local-file <pod-name>:/path/to/file
# Delete pod (will be recreated by deployment)
kubectl delete pod <pod-name>

# List services
kubectl get services
kubectl get svc
# Describe service
kubectl describe service car-price-frontend-service
# Port forward to service
kubectl port-forward service/car-price-frontend-service 8080:80
# Port forward to pod
kubectl port-forward <pod-name> 8080:80
# Get service endpoints
kubectl get endpoints

# Scale deployment
kubectl scale deployment car-price-frontend --replicas=3
# Autoscale deployment
kubectl autoscale deployment car-price-frontend --min=2 --max=5 --cpu-percent=80
# View horizontal pod autoscalers
kubectl get hpa

# Update image version
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2
# View rollout status
kubectl rollout status deployment/car-price-frontend
# View rollout history
kubectl rollout history deployment/car-price-frontend
# Rollback to previous version
kubectl rollout undo deployment/car-price-frontend
# Rollback to specific revision
kubectl rollout undo deployment/car-price-frontend --to-revision=2
# Pause rollout
kubectl rollout pause deployment/car-price-frontend
# Resume rollout
kubectl rollout resume deployment/car-price-frontend

# List secrets
kubectl get secrets
# Describe secret
kubectl describe secret databricks-credentials
# Create secret from literal
kubectl create secret generic my-secret --from-literal=key1=value1
# Create secret from file
kubectl create secret generic my-secret --from-file=./secret.txt
# Delete secret
kubectl delete secret databricks-credentials
# List configmaps
kubectl get configmaps
# Create configmap
kubectl create configmap my-config --from-literal=key1=value1

# View resource usage
kubectl top nodes
kubectl top pods
# Get pod events
kubectl get events --field-selector involvedObject.name=<pod-name>
# Check pod status
kubectl get pods --watch
# Debug pod that won't start
kubectl describe pod <pod-name>
kubectl logs <pod-name>
# Interactive debugging
kubectl run -it --rm debug --image=busybox --restart=Never -- sh
# Network debugging
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- bash

# List namespaces
kubectl get namespaces
# Create namespace
kubectl create namespace mlops-production
# Set default namespace
kubectl config set-context --current --namespace=mlops-production
# Delete namespace
kubectl delete namespace mlops-production

# Get resources with custom output
kubectl get pods -o json
kubectl get pods -o yaml
kubectl get pods -o wide
# Get specific fields
kubectl get pods -o=jsonpath='{.items[0].metadata.name}'
# Label resources
kubectl label pods <pod-name> environment=production
# Annotate resources
kubectl annotate pods <pod-name> description="Main proxy server"
# Dry run (test without applying)
kubectl apply -f k8s/frontend-deployment.yaml --dry-run=client
# Explain resource fields
kubectl explain pods
kubectl explain deployment.spec

# Pod stuck in Pending
kubectl describe pod <pod-name> # Check events section
# Pod stuck in ImagePullBackOff
kubectl describe pod <pod-name> # Check image name and pull policy
# Pod CrashLoopBackOff
kubectl logs <pod-name> # Check application logs
kubectl logs <pod-name> --previous # Check previous container logs
# Service not accessible
kubectl get endpoints <service-name> # Verify endpoints exist
kubectl describe service <service-name> # Check selector labels
# Cannot connect to pods
kubectl exec -it <pod-name> -- ping <other-pod-ip>  # Test connectivity

The project uses GitHub Actions for automated CI/CD:
- Code linting and formatting
- Unit tests
- Model training and validation
- Docker image building
- Push Docker image to registry
- Deploy to Kubernetes
- Model registration in MLflow
- Automated testing in staging
# Azure Container Instances
bash scripts/deploy_to_azure.sh

# AWS EKS
bash scripts/deploy_to_aws.sh

- MLflow UI: Track experiments, compare runs
- Databricks Dashboard: Monitor model performance
- Kubernetes Dashboard: Monitor container health
- Cloud Monitoring: Native cloud monitoring tools
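The pytest commands below assume tests like those in tests/test_model.py. A minimal, self-contained sketch of such a test follows; the CarPriceModel class here is a stand-in stub, and the project's real model interface may differ.

```python
# Example unit tests in the style of tests/test_model.py.
# CarPriceModel below is a toy stub, not the project's trained model.

class CarPriceModel:
    """Stub exposing the predict(features) -> price contract we assume."""
    def predict(self, features: dict) -> float:
        # Toy heuristic: newer cars and lower mileage -> higher price (INR).
        base = 500_000.0
        base -= (2024 - features["year"]) * 30_000
        base -= features["km_driven"] * 0.5
        return max(base, 10_000.0)

def test_prediction_is_positive():
    model = CarPriceModel()
    assert model.predict({"year": 2018, "km_driven": 45_000}) > 0

def test_newer_car_costs_more():
    model = CarPriceModel()
    old = model.predict({"year": 2010, "km_driven": 45_000})
    new = model.predict({"year": 2020, "km_driven": 45_000})
    assert new > old

if __name__ == "__main__":
    test_prediction_is_positive()
    test_newer_car_costs_more()
    print("all tests passed")
```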
# Run all tests
pytest tests/
# Run specific test
pytest tests/test_model.py
# Generate coverage report
pytest --cov=src tests/

- ✅ Experiment tracking
- ✅ Parameter and metric logging
- ✅ Artifact storage
- ✅ Model registry
- ✅ Model versioning
- ✅ Model staging (Staging/Production)
- ✅ Docker multi-stage builds
- ✅ Optimized image size
- ✅ Health checks
- ✅ Environment configuration
- ✅ Auto-scaling
- ✅ Load balancing
- ✅ Rolling updates
- ✅ Resource management
- ✅ Automated testing
- ✅ Continuous training
- ✅ Automated deployment
- ✅ Version control integration
- Secrets management using environment variables
- Docker image scanning
- RBAC in Kubernetes
- API authentication and authorization
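To illustrate the secrets-management point above, here is a hedged sketch of a Python client for the Databricks Model Serving endpoint that reads credentials from environment variables instead of hard-coding them. The endpoint name "car-price-model", the host placeholder, and the record fields are assumptions, not the project's actual values.

```python
# Call a Databricks Model Serving endpoint with credentials from the environment.
import json
import os
import urllib.request

# Supplied via .env / Kubernetes secret in a real deployment; placeholders here.
DATABRICKS_HOST = os.environ.get("DATABRICKS_HOST", "https://<workspace>.cloud.databricks.com")
DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN", "")

def build_request(records):
    """Build the POST request for the serving endpoint's invocations URL."""
    url = f"{DATABRICKS_HOST}/serving-endpoints/car-price-model/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps({"dataframe_records": records}).encode(),
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
            "Content-Type": "application/json",
        },
    )

def predict(records):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(records)) as resp:
        return json.loads(resp.read())
```

In the Kubernetes deployment, DATABRICKS_HOST and DATABRICKS_TOKEN would come from the databricks-credentials secret shown in the kubectl section above.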
- Member 1: [Name]
- Member 2: [Name]
- Member 3: [Name]
This project is for educational purposes as part of an MLOps assessment.
# Using Docker Compose (recommended for development)
docker-compose up -d
# Access at: http://localhost:80

# Apply all manifests
kubectl apply -f k8s/
# Check status
kubectl get pods
kubectl get services
# Access via port-forward
kubectl port-forward service/car-price-frontend-service 8080:80
# Access at: http://localhost:8080

# Docker Compose
docker-compose logs -f
# Kubernetes
kubectl logs -l app=car-price-frontend -f
kubectl logs -l app=car-price-proxy -f

# Docker Compose
docker-compose up -d --scale proxy=3
# Kubernetes
kubectl scale deployment car-price-frontend --replicas=5

# Docker Compose (rebuild and restart)
docker-compose up -d --build
# Kubernetes (rolling update)
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2
kubectl rollout status deployment/car-price-frontend

# Docker Compose (stop and remove)
docker-compose down
# Kubernetes (delete all resources)
kubectl delete -f k8s/
# Complete cleanup
docker-compose down -v # Remove volumes too
docker system prune -a  # Remove all unused Docker resources

- Dataset from Kaggle
- Databricks for MLflow platform
- Open-source community