Logo

SwishPredict

A Data Pipeline for NBA Box Score stats from basketball-reference.com
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Demo

1. Data Pipeline Architecture

  • Uses Apache Airflow for orchestration
  • Docker containers for modular services
  • MinIO for object storage
  • PostgreSQL for data persistence

2. Main Components

  • Data Ingestion Service: Scrapes NBA player box scores from basketball-reference.com
  • Data Processing Service: Processes raw data for model training
  • Feature Generation Service: Prepares features for the ML model
  • Note: The pipeline scrapes box score data from basketball-reference.com for any NBA season that runs from October through July. Unusual seasons such as 2019-2020 may require adjusting the scraping configuration at data_pipeline_services/config/data_ingestion/scraping_config.yml
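
As a rough illustration of the October through July month range, the sketch below builds one schedule-page URL per month for a given season. The URL template, the `SEASON_MONTHS` list, and the `schedule_urls` helper are assumptions made for this example; they are not taken from the pipeline's code or from scraping_config.yml, which may structure this differently.

```python
from typing import List, Sequence

# Assumption: a regular season's schedule spans October through July.
# The actual month list is controlled by scraping_config.yml.
SEASON_MONTHS = [
    "october", "november", "december", "january", "february",
    "march", "april", "may", "june", "july",
]

# Assumption: monthly schedule pages follow this URL pattern, where the
# year is the calendar year in which the season ends (2023-24 -> 2024).
URL_TEMPLATE = (
    "https://www.basketball-reference.com/leagues/NBA_{year}_games-{month}.html"
)


def schedule_urls(
    season_end_year: int, months: Sequence[str] = SEASON_MONTHS
) -> List[str]:
    """Return one schedule-page URL per month for the given season."""
    return [
        URL_TEMPLATE.format(year=season_end_year, month=month)
        for month in months
    ]


if __name__ == "__main__":
    for url in schedule_urls(2024):
        print(url)
```

An unusual season could be handled by passing its own `months` list instead of the default, which mirrors the idea of editing the month range in the config file.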

3. XGBoost (IN PROGRESS)

  • Uses XGBoost to predict fantasy basketball points
  • Jupyter notebooks for model development and analysis are stored in the notebooks folder
  • The model is stored in the models folder
  • Model metrics and configuration are stored in config/model_metadata.yaml:
performance_metrics:
  test_mae: 6.163228800313608
  test_mse: 66.63370955961138
  test_rmse: 8.162947357395574
  test_r2: 0.7177536081562295
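
For readers unfamiliar with these metrics, the following is a minimal sketch of how MAE, MSE, RMSE, and R² are conventionally computed from true and predicted values. The `regression_metrics` helper and the sample numbers are hypothetical; the project's notebooks may compute the metrics with a library instead.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and R^2 using their standard definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = mse * n
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"test_mae": mae, "test_mse": mse, "test_rmse": rmse, "test_r2": r2}


if __name__ == "__main__":
    # Hypothetical fantasy-point values, for illustration only.
    actual = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(regression_metrics(actual, predicted))
```

The reported values are consistent with these definitions: test_rmse is the square root of test_mse (√66.6337 ≈ 8.1629).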

(back to top)

Built With

  • Python
  • Airflow
  • XGBoost
  • Docker
  • Pandas

(back to top)

System Architecture

System Architecture Diagram