MLOps Car Price Predictor 🚗

📋 Project Overview

A complete MLOps project for predicting used car prices using machine learning, featuring MLflow experiment tracking, Databricks model serving, Docker containerization, Kubernetes orchestration, and a modern React frontend with CI/CD automation.

🚀 Features

  • ✅ ML Pipeline: 4 models (Linear Regression, Random Forest, XGBoost, LightGBM) with MLflow tracking
  • ✅ Model Serving: Databricks Model Serving with Unity Catalog integration
  • ✅ Frontend: Beautiful React UI with Tailwind CSS and real-time predictions
  • ✅ Proxy Server: Node.js Express proxy to handle CORS and API forwarding
  • ✅ Docker: Multi-container deployment with Docker Compose
  • ✅ Kubernetes: Production-ready K8s manifests with health checks
  • ✅ CI/CD: GitHub Actions pipeline for automated testing and deployment

🎯 Assessment Requirements Met

  1. ✅ MLflow Integration - Complete experiment tracking with 4 models
  2. ✅ Cloud Deployment - Databricks Model Serving endpoint deployed
  3. ✅ Containerization - Docker + Kubernetes manifests created
  4. ✅ CI/CD Pipeline - GitHub Actions workflows configured

πŸ“ Project Structure

MLops/
├── data/                          # Dataset storage
│   ├── raw/                       # Raw data from Kaggle
│   └── processed/                 # Processed data
├── notebooks/                     # Jupyter notebooks for exploration
│   └── exploratory_analysis.ipynb
├── src/                           # Source code
│   ├── data/                      # Data processing
│   │   ├── __init__.py
│   │   └── data_loader.py
│   ├── models/                    # Model definitions
│   │   ├── __init__.py
│   │   └── model.py
│   ├── training/                  # Training scripts
│   │   ├── __init__.py
│   │   └── train.py
│   ├── inference/                 # Inference/serving
│   │   ├── __init__.py
│   │   └── predict.py
│   └── utils/                     # Utility functions
│       ├── __init__.py
│       └── config.py
├── mlflow/                        # MLflow configurations
│   ├── mlflow_tracking.py         # MLflow tracking examples
│   ├── model_registry.py          # Model registration
│   └── model_versioning.py        # Version management
├── docker/                        # Docker configurations
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── .dockerignore
├── kubernetes/                    # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   └── ingress.yaml
├── .github/                       # GitHub Actions
│   └── workflows/
│       ├── ci.yml                 # CI pipeline
│       └── cd.yml                 # CD pipeline
├── tests/                         # Unit and integration tests
│   ├── __init__.py
│   ├── test_model.py
│   └── test_api.py
├── scripts/                       # Deployment scripts
│   ├── deploy_to_cloud.sh
│   └── setup_environment.sh
├── requirements.txt               # Python dependencies
├── setup.py                       # Package setup
├── .env.example                   # Environment variables template
├── .gitignore
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • Kubernetes (minikube or cloud provider)
  • Databricks account
  • GitHub account
  • Cloud provider account (Azure/AWS/GCP)

Installation

  1. Clone the repository

    git clone <your-repo-url>
    cd MLops
  2. Create virtual environment

    python -m venv venv
    venv\Scripts\activate  # Windows
    # source venv/bin/activate  # Linux/Mac
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables

    copy .env.example .env   # Windows
    # cp .env.example .env   # Linux/Mac
    # Edit .env with your credentials

📊 Dataset

Problem Statement: Used Car Price Prediction

Dataset Source: Vehicle Dataset from CarDekho (Kaggle)

Description: Predict the selling price of used cars based on various features such as year, kilometers driven, fuel type, transmission, ownership history, and technical specifications. This dataset contains approximately 8,128 records with 13 features.

Features:

  • name: Car model name
  • year: Manufacturing year
  • selling_price: Target variable (price in INR)
  • km_driven: Total kilometers driven
  • fuel: Fuel type (Petrol, Diesel, CNG, LPG, Electric)
  • seller_type: Individual, Dealer, Trustmark Dealer
  • transmission: Manual or Automatic
  • owner: First Owner, Second Owner, etc.
  • mileage: Fuel efficiency (km/l or km/kg)
  • engine: Engine capacity (CC)
  • max_power: Maximum power (bhp)
  • torque: Engine torque
  • seats: Number of seats

Engineered Features:

  • car_age: Current year - manufacturing year
  • km_per_year: Average kilometers driven per year
  • power_to_engine_ratio: Power to engine size ratio
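As a sketch (assuming a pandas DataFrame with the raw columns listed above; the project's actual `data_loader.py` may differ), these features can be derived like so:

```python
# Illustrative derivation of the engineered features described above.
import pandas as pd

def add_engineered_features(df: pd.DataFrame, current_year: int = 2024) -> pd.DataFrame:
    df = df.copy()
    df["car_age"] = current_year - df["year"]
    # Clip age to at least 1 so cars sold in their manufacturing year
    # don't cause division by zero.
    df["km_per_year"] = df["km_driven"] / df["car_age"].clip(lower=1)
    df["power_to_engine_ratio"] = df["max_power"] / df["engine"]
    return df

sample = pd.DataFrame(
    {"year": [2015], "km_driven": [90000], "max_power": [82.0], "engine": [1197]}
)
print(add_engineered_features(sample, current_year=2024))
```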

🔧 MLflow Setup

1. Databricks Configuration

# Configure Databricks MLflow
# (requires DATABRICKS_HOST and DATABRICKS_TOKEN, or a configured Databricks CLI profile)
import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/<your-email>/mlops-experiment")

2. Tracking Parameters and Metrics

python mlflow/mlflow_tracking.py

3. Model Registration

python mlflow/model_registry.py

🐳 Docker Containerization

Build Docker Image

docker build -t mlops-model:latest -f docker/Dockerfile .

Run Container Locally

docker-compose -f docker/docker-compose.yml up

Test Container

curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d @test_data.json
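The test_data.json file is not included in the repository; a plausible payload using the raw feature names from the dataset section could be generated like this (the deployed endpoint's exact input schema may differ, e.g. Databricks Model Serving wraps records in a `dataframe_records` key):

```python
# Write an illustrative test_data.json for the curl command above.
import json

record = {
    "year": 2017,
    "km_driven": 45000,
    "fuel": "Petrol",
    "seller_type": "Individual",
    "transmission": "Manual",
    "owner": "First Owner",
    "mileage": 18.9,
    "engine": 1197,
    "max_power": 82.0,
    "seats": 5,
}
with open("test_data.json", "w") as f:
    json.dump(record, f, indent=2)
```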

☸️ Kubernetes Deployment

Deploy to Kubernetes

kubectl apply -f kubernetes/

Check Deployment Status

kubectl get pods
kubectl get services

Access Service

kubectl port-forward service/mlops-model-service 8080:80

🐳 Docker Commands Reference

Building Images

# Build proxy server image
docker build -t car-price-proxy:latest -f Dockerfile.proxy .

# Build frontend image
cd frontend
docker build -t car-price-frontend:latest .
cd ..

# Build with no cache (clean build)
docker build --no-cache -t car-price-proxy:latest -f Dockerfile.proxy .

Docker Compose Commands

# Start all services in detached mode
docker-compose up -d

# Build and start services
docker-compose up -d --build

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

# View running containers
docker-compose ps

# View logs (all services)
docker-compose logs

# View logs (specific service)
docker-compose logs proxy
docker-compose logs frontend

# Follow logs in real-time
docker-compose logs -f

# Restart services
docker-compose restart

# Restart specific service
docker-compose restart proxy

# Scale services
docker-compose up -d --scale proxy=3

Container Management

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a container
docker stop <container-id>

# Remove a container
docker rm <container-id>

# Remove all stopped containers
docker container prune

# View container logs
docker logs <container-id>

# Follow container logs
docker logs -f <container-id>

# Execute command in running container
docker exec -it <container-id> /bin/sh

# Inspect container
docker inspect <container-id>

# View container stats (CPU, Memory)
docker stats

Image Management

# List images
docker images

# Remove an image
docker rmi <image-id>

# Remove unused images
docker image prune

# Remove all unused images
docker image prune -a

# Tag an image
docker tag mlops-frontend:latest myregistry/mlops-frontend:v1.0

# Push to registry
docker push myregistry/mlops-frontend:v1.0

# Pull from registry
docker pull myregistry/mlops-frontend:v1.0

Troubleshooting

# Check Docker version
docker --version

# View Docker system info
docker info

# Clean up everything (careful!)
docker system prune -a

# Check disk usage
docker system df

# Test if containers can reach each other
docker exec <container-id> ping <other-container-name>

☸️ Kubernetes Commands Reference

Cluster Management

# Check cluster info
kubectl cluster-info

# View cluster nodes
kubectl get nodes

# Describe a node
kubectl describe node <node-name>

# View cluster events
kubectl get events --sort-by='.lastTimestamp'

Deployment Commands

# Apply all manifests
kubectl apply -f k8s/

# Apply specific manifest
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/proxy-deployment.yaml
kubectl apply -f k8s/frontend-deployment.yaml

# View all resources
kubectl get all

# View deployments
kubectl get deployments

# Describe deployment
kubectl describe deployment car-price-frontend

# Edit deployment (opens editor)
kubectl edit deployment car-price-frontend

# Delete deployment
kubectl delete deployment car-price-frontend

# Delete all resources from manifests
kubectl delete -f k8s/

Pod Management

# List all pods
kubectl get pods

# List pods with more details
kubectl get pods -o wide

# Describe a pod
kubectl describe pod <pod-name>

# View pod logs
kubectl logs <pod-name>

# Follow pod logs
kubectl logs -f <pod-name>

# View logs from all pods with label
kubectl logs -l app=car-price-proxy

# View logs from previous container instance
kubectl logs <pod-name> --previous

# Execute command in pod
kubectl exec -it <pod-name> -- /bin/sh

# Copy files to/from pod
kubectl cp <pod-name>:/path/to/file ./local-file
kubectl cp ./local-file <pod-name>:/path/to/file

# Delete pod (will be recreated by deployment)
kubectl delete pod <pod-name>

Service Management

# List services
kubectl get services
kubectl get svc

# Describe service
kubectl describe service car-price-frontend-service

# Port forward to service
kubectl port-forward service/car-price-frontend-service 8080:80

# Port forward to pod
kubectl port-forward <pod-name> 8080:80

# Get service endpoints
kubectl get endpoints

Scaling

# Scale deployment
kubectl scale deployment car-price-frontend --replicas=3

# Autoscale deployment
kubectl autoscale deployment car-price-frontend --min=2 --max=5 --cpu-percent=80

# View horizontal pod autoscalers
kubectl get hpa

Rolling Updates & Rollbacks

# Update image version
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2

# View rollout status
kubectl rollout status deployment/car-price-frontend

# View rollout history
kubectl rollout history deployment/car-price-frontend

# Rollback to previous version
kubectl rollout undo deployment/car-price-frontend

# Rollback to specific revision
kubectl rollout undo deployment/car-price-frontend --to-revision=2

# Pause rollout
kubectl rollout pause deployment/car-price-frontend

# Resume rollout
kubectl rollout resume deployment/car-price-frontend

Secrets & ConfigMaps

# List secrets
kubectl get secrets

# Describe secret
kubectl describe secret databricks-credentials

# Create secret from literal
kubectl create secret generic my-secret --from-literal=key1=value1

# Create secret from file
kubectl create secret generic my-secret --from-file=./secret.txt

# Delete secret
kubectl delete secret databricks-credentials

# List configmaps
kubectl get configmaps

# Create configmap
kubectl create configmap my-config --from-literal=key1=value1

Monitoring & Debugging

# View resource usage
kubectl top nodes
kubectl top pods

# Get pod events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Check pod status
kubectl get pods --watch

# Debug pod that won't start
kubectl describe pod <pod-name>
kubectl logs <pod-name>

# Interactive debugging
kubectl run -it --rm debug --image=busybox --restart=Never -- sh

# Network debugging
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- bash

Namespace Management

# List namespaces
kubectl get namespaces

# Create namespace
kubectl create namespace mlops-production

# Set default namespace
kubectl config set-context --current --namespace=mlops-production

# Delete namespace
kubectl delete namespace mlops-production

Useful Shortcuts

# Get resources with custom output
kubectl get pods -o json
kubectl get pods -o yaml
kubectl get pods -o wide

# Get specific fields
kubectl get pods -o=jsonpath='{.items[0].metadata.name}'

# Label resources
kubectl label pods <pod-name> environment=production

# Annotate resources
kubectl annotate pods <pod-name> description="Main proxy server"

# Dry run (test without applying)
kubectl apply -f k8s/frontend-deployment.yaml --dry-run=client

# Explain resource fields
kubectl explain pods
kubectl explain deployment.spec

Quick Troubleshooting

# Pod stuck in Pending
kubectl describe pod <pod-name>  # Check events section

# Pod stuck in ImagePullBackOff
kubectl describe pod <pod-name>  # Check image name and pull policy

# Pod CrashLoopBackOff
kubectl logs <pod-name>  # Check application logs
kubectl logs <pod-name> --previous  # Check previous container logs

# Service not accessible
kubectl get endpoints <service-name>  # Verify endpoints exist
kubectl describe service <service-name>  # Check selector labels

# Cannot connect to pods
kubectl exec -it <pod-name> -- ping <other-pod-ip>  # Test connectivity

🔄 CI/CD Pipeline

The project uses GitHub Actions for automated CI/CD:

Continuous Integration (CI)

  • Code linting and formatting
  • Unit tests
  • Model training and validation
  • Docker image building

Continuous Deployment (CD)

  • Push Docker image to registry
  • Deploy to Kubernetes
  • Model registration in MLflow
  • Automated testing in staging
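A minimal .github/workflows/ci.yml covering the CI steps above might look like the following (step ordering and tool choices are illustrative, not the repo's actual workflow):

```yaml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt flake8 pytest pytest-cov
      - run: flake8 src tests          # linting
      - run: pytest --cov=src tests/   # unit tests with coverage
      - run: docker build -t mlops-model:ci -f docker/Dockerfile .
```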

🌐 Cloud Deployment

Azure Deployment

# Azure Container Instances
bash scripts/deploy_to_azure.sh

AWS Deployment

# AWS EKS
bash scripts/deploy_to_aws.sh

📈 Monitoring

  • MLflow UI: Track experiments, compare runs
  • Databricks Dashboard: Monitor model performance
  • Kubernetes Dashboard: Monitor container health
  • Cloud Monitoring: Native cloud monitoring tools

🧪 Testing

# Run all tests
pytest tests/

# Run specific test
pytest tests/test_model.py

# Generate coverage report
pytest --cov=src tests/
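A test in tests/test_model.py might assert basic sanity properties of a trained model; this is an illustrative sketch on synthetic data, not the repo's actual test:

```python
# Sanity-check that a trained regressor produces finite, positive prices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_model(X, y):
    return RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def test_predictions_are_finite_and_positive():
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(100, 4))
    y = X.sum(axis=1) * 1000  # synthetic positive "prices"
    preds = train_model(X, y).predict(X)
    assert np.all(np.isfinite(preds))
    assert np.all(preds > 0)
```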

πŸ“ Key Features

MLflow

  • βœ… Experiment tracking
  • βœ… Parameter and metric logging
  • βœ… Artifact storage
  • βœ… Model registry
  • βœ… Model versioning
  • βœ… Model staging (Staging/Production)

Containerization

  • βœ… Docker multi-stage builds
  • βœ… Optimized image size
  • βœ… Health checks
  • βœ… Environment configuration

Kubernetes

  • βœ… Auto-scaling
  • βœ… Load balancing
  • βœ… Rolling updates
  • βœ… Resource management

CI/CD

  • βœ… Automated testing
  • βœ… Continuous training
  • βœ… Automated deployment
  • βœ… Version control integration

πŸ” Security Considerations

  • Secrets management using environment variables
  • Docker image scanning
  • RBAC in Kubernetes
  • API authentication and authorization

📚 Additional Resources

👥 Team Members

  • Member 1: [Name]
  • Member 2: [Name]
  • Member 3: [Name]

📄 License

This project is for educational purposes as part of an MLOps assessment.

🎯 Quick Reference - Common Tasks

Start Everything Locally

# Using Docker Compose (recommended for development)
docker-compose up -d

# Access at: http://localhost:80

Deploy to Kubernetes

# Apply all manifests
kubectl apply -f k8s/

# Check status
kubectl get pods
kubectl get services

# Access via port-forward
kubectl port-forward service/car-price-frontend-service 8080:80
# Access at: http://localhost:8080

View Logs

# Docker Compose
docker-compose logs -f

# Kubernetes
kubectl logs -l app=car-price-frontend -f
kubectl logs -l app=car-price-proxy -f

Scale Application

# Docker Compose
docker-compose up -d --scale proxy=3

# Kubernetes
kubectl scale deployment car-price-frontend --replicas=5

Update Application

# Docker Compose (rebuild and restart)
docker-compose up -d --build

# Kubernetes (rolling update)
kubectl set image deployment/car-price-frontend frontend=mlops-frontend:v2
kubectl rollout status deployment/car-price-frontend

Clean Up

# Docker Compose (stop and remove)
docker-compose down

# Kubernetes (delete all resources)
kubectl delete -f k8s/

# Complete cleanup
docker-compose down -v  # Remove volumes too
docker system prune -a  # Remove all unused Docker resources

πŸ™ Acknowledgments

  • Dataset from Kaggle
  • Databricks for MLflow platform
  • Open-source community
