A Data Pipeline for NBA Box Score stats from basketball-reference.com
- Uses Apache Airflow for orchestration
- Docker containers for modular services
- MinIO for object storage
- PostgreSQL for data persistence
- Data Ingestion Service: Scrapes NBA player box scores from basketball-reference.com
- Data Processing Service: Processes raw data for model training
- Feature Generation Service: Prepares features for the ML model
- The pipeline scrapes box score data from basketball-reference.com for any NBA season that runs from October through July. Unusual seasons such as 2019-2020 may require adjustments to the `scraping_config.yml` config at `data_pipeline_services/config/data_ingestion/scraping_config.yml`
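
As a rough illustration of the October through July range described above, the sketch below lists the calendar months a season covers. The function name `season_months` and its parameters `start_month` and `end_month` are hypothetical and do not reflect the actual keys in `scraping_config.yml`; they only show why a season that runs longer than usual needs a different range.

```python
# Hypothetical helper: not part of the pipeline, names are illustrative only.
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def season_months(end_year, start_month=10, end_month=7):
    """Return (year, month_name) pairs for one season.

    The season is assumed to start in `start_month` of `end_year - 1`
    and to finish in `end_month` of `end_year`, both inclusive.
    """
    months = []
    year, month = end_year - 1, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, MONTH_NAMES[month - 1]))
        month += 1
        if month > 12:  # roll over into the next calendar year
            year, month = year + 1, 1
    return months


# A regular season: October 2023 through July 2024.
regular = season_months(2024)

# An extended season, assuming play ran through October of the end year.
extended = season_months(2020, end_month=10)
```

With the defaults, a season yields ten months; widening `end_month` is the kind of adjustment an unusual season would need.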
- Uses XGBoost for predicting fantasy basketball points
- Jupyter notebooks for model development and analysis are stored in the `notebooks` folder
- The model is stored in the `models` folder
- Model metrics and configuration are stored in `config/model_metadata.yaml`:
performance_metrics:
  test_mae: 6.163228800313608
  test_mse: 66.63370955961138
  test_rmse: 8.162947357395574
  test_r2: 0.7177536081562295
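
For readers unfamiliar with these metrics, the sketch below shows how MAE, MSE, RMSE, and R² are conventionally computed from true and predicted values. It uses only the Python standard library; the function name `regression_metrics` and the sample numbers are illustrative assumptions, not the project's evaluation code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute standard regression metrics for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"test_mae": mae, "test_mse": mse, "test_rmse": rmse, "test_r2": r2}


# Made-up fantasy point totals, for illustration only.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

Because RMSE is the square root of MSE, the reported `test_rmse` of about 8.16 follows from the reported `test_mse` of about 66.63, and can be read as the typical prediction error in fantasy points.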