🧱 Model Development
- Processed NYC taxi trip data with engineered features like pickup/dropoff zones and trip distance.
- Tuned XGBoost and Random Forest models using Hyperopt, improving RMSE by ~30%.
- Tracked all experiments, metrics, and artifacts using MLflow.
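The tuning loop behind those RMSE gains can be sketched with stdlib-only code. In the project, Hyperopt's `fmin` drives the search over XGBoost hyperparameters and each trial is logged to MLflow; here, a plain random search plays that role, and `mock_rmse` is a hypothetical stand-in for actually training a model and scoring it on a validation set.

```python
import math
import random

def mock_rmse(learning_rate: float, max_depth: int) -> float:
    """Hypothetical stand-in for training XGBoost and returning validation RMSE."""
    return abs(math.log10(learning_rate) + 1.0) + abs(max_depth - 6) * 0.1

def random_search(n_trials: int = 50, seed: int = 42):
    """Minimal random search; Hyperopt's fmin plays this role in the project."""
    rng = random.Random(seed)
    best_params, best_rmse = None, float("inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform, like hp.loguniform
            "max_depth": rng.randint(3, 10),            # integer range, like hp.quniform
        }
        score = mock_rmse(**params)
        if score < best_rmse:
            best_params, best_rmse = params, score
        # In the project this is where each trial is recorded:
        # mlflow.log_params(params); mlflow.log_metric("rmse", score)
    return best_params, best_rmse
```

The log-uniform draw for `learning_rate` mirrors the usual Hyperopt search-space choice: RMSE is far more sensitive to the order of magnitude of the learning rate than to its exact value.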
🧱 Pipeline Orchestration
- Designed a modular ML pipeline using Apache Airflow to automate preprocessing and training stages.
- Configured Docker + CeleryExecutor for reproducible, distributed task execution.
- Used XCom to pass data between tasks and inspect intermediate state across the DAG.
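The XCom hand-off between stages can be sketched without Airflow installed: `preprocess` pushes the location of its output and `train` pulls it, mirroring `ti.xcom_push` / `ti.xcom_pull`. The in-memory `xcom` dict and the file path below are hypothetical stand-ins for Airflow's metadata database and the real feature files.

```python
# Hypothetical in-memory stand-in for Airflow's XCom table.
xcom: dict = {}

def xcom_push(task_id: str, key: str, value) -> None:
    xcom[(task_id, key)] = value

def xcom_pull(task_id: str, key: str):
    return xcom[(task_id, key)]

def preprocess() -> str:
    """First DAG task: write features, push their location for downstream tasks."""
    features_path = "/tmp/features.parquet"  # illustrative path
    xcom_push("preprocess", "features_path", features_path)
    return features_path

def train() -> str:
    """Second DAG task: pull the path the way ti.xcom_pull(task_ids='preprocess') would."""
    features_path = xcom_pull("preprocess", "features_path")
    return f"trained on {features_path}"

# DAG order: preprocess >> train
preprocess()
result = train()
```

Passing a file path rather than the dataframe itself keeps XCom payloads small, which matters because Airflow persists every XCom value in its metadata database.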
🧱 Experiment Management
- Logged models to MLflow Model Registry, versioned them, and promoted top models to "Production" stage.
- Enabled consistent experiment comparison and artifact retrieval.
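Promotion to the "Production" stage follows this shape. In the project it is a call to MLflow's `MlflowClient.transition_model_version_stage`; here the registry is a hypothetical stdlib dict, and the model name and version numbers are illustrative.

```python
# Hypothetical stand-in for the MLflow Model Registry: version -> stage.
registry = {
    "ride-duration-xgb": {1: "Archived", 2: "Staging", 3: "None"},
}

def transition_model_version_stage(name: str, version: int, stage: str,
                                   archive_existing: bool = True) -> None:
    """Sketch of MlflowClient.transition_model_version_stage."""
    versions = registry[name]
    if archive_existing:
        # Demote whichever version currently holds the target stage.
        for v, s in versions.items():
            if s == stage:
                versions[v] = "Archived"
    versions[version] = stage

# Promote the best run's version to Production.
transition_model_version_stage("ride-duration-xgb", version=3, stage="Production")
```

Keeping at most one version per stage is what makes downstream consumers safe to load `models:/<name>/Production` without pinning a version number.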
🧱 Deployment
- Deployed the trained model as a REST API using Flask, serving real-time predictions from user-supplied ride data.
- Containerized the app with Docker to simulate a real-world, always-on service.
```shell
# build once
docker build -t ride-duration-prediction-service:v1 .
# run anywhere, no local installs
docker run -it --rm -p 9696:9696 ride-duration-prediction-service:v1
```
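The core of the Flask service wrapped by this container can be sketched without the web layer. Everything here is illustrative: `prepare_features` assumes a combined pickup/dropoff zone feature alongside trip distance, and `predict` is a hypothetical stand-in for the XGBoost model the real service loads.

```python
def prepare_features(ride: dict) -> dict:
    """Engineer the same features used in training: a combined
    pickup/dropoff zone id plus the trip distance."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": float(ride["trip_distance"]),
    }

def predict(features: dict) -> float:
    # Hypothetical stand-in for model.predict(); the real service
    # loads the trained XGBoost model instead.
    return 5.0 + 2.0 * features["trip_distance"]

def predict_endpoint(ride: dict) -> dict:
    """Body of the Flask /predict route: ride JSON in, prediction JSON out."""
    features = prepare_features(ride)
    return {"duration": predict(features)}
```

In the Flask app this body sits behind `@app.route("/predict", methods=["POST"])`, with `request.get_json()` supplying `ride` and `jsonify` wrapping the return value.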
All of my models are logged to MLflow with a SQLite database as the backend store. You can find the XGBoost run models in this folder: `Experiment Tracking using MLFlow/mlruns/1`.
You can serve a specific model run directly using its run ID. Replace `<run_id>` with any of the run folder names (e.g., `03825066a5284d1981cb73e97e7e0fd4`):

```shell
mlflow models serve -m ./mlruns/1/<run_id>/artifacts/models_mlflow --port 9696
```
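Once a service is listening on port 9696, a request can be built with stdlib `json`. The feature names in the payload are assumptions, and note that recent MLflow scoring servers expect records wrapped in a `dataframe_records` key at the `/invocations` endpoint (the exact schema depends on your MLflow version).

```python
import json

# Illustrative ride; adjust field names to the service's expected schema.
ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 3.0}

# Wrapper expected by recent MLflow scoring servers at /invocations.
payload = json.dumps({"dataframe_records": [ride]})

# Equivalent shell call once the server is up:
curl_cmd = (
    "curl -s -X POST http://localhost:9696/invocations "
    "-H 'Content-Type: application/json' "
    f"-d '{payload}'"
)
```

The same payload works against the Dockerized service if its Flask route accepts the same schema; otherwise send the bare `ride` dict to its `/predict` endpoint instead.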