Customer Churn Prediction


This is a learning project. I explored this dataset as part of a Tableau dashboard project from analystbuilder.com (see below).

I also used AI-assisted code templates to review DevOps concepts, and I planned the project as a project manager would: creating milestones, issues, PRs, actions, labels, code reviews, etc. The project is ongoing, and I plan to expand it as I learn.

End-to-end churn workflow, built incrementally. As of v0.5.0, the project ships a reproducible baseline model CLI (Logistic Regression), artifact persistence, optional sampling, a Monte Carlo harness, and Docker UX fixes. Earlier versions provided Dockerized ETL + validation.



Click here to view the dashboard and exploratory data analysis project on Tableau.

[Image: Churn Prediction Dashboard]


Stack Overview

Developer (Make/Compose)
        |
        v
+-------------------+        +--------------------------+
|   app (Python)    | <----> |   db (PostgreSQL 16)     |
|  docker container |        |   docker container       |
+-------------------+        +--------------------------+
         ^   |
         |   |
         +----------------------------+
         |   reads env from `.env`    |
         | (POSTGRES_*; DATABASE_URL) |
         +----------------------------+
  • Python 3.11 (lean runtime): SQLAlchemy + psycopg v3, scikit-learn, pandas, matplotlib
  • PostgreSQL 16 (alpine) with pg_isready healthcheck
  • Docker Compose for orchestration
  • Makefile for all workflows (build, load, train, validate, clean, etc.)
  • pre-commit + GitHub Actions for formatting, linting, and notebook output stripping
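
For orientation, a minimal sketch of how a client in this stack connects, assuming DATABASE_URL is populated from .env (variable names may differ from .env.example):

# Illustrative only; the project's own code lives in src/
import os

from sqlalchemy import create_engine, text

# Inside containers the DB is reachable at db:5432; from the host it is
# localhost:5433 (see "DB endpoints" below). An example URL shape:
#   postgresql+psycopg://user:password@db:5432/churn
engine = create_engine(os.environ["DATABASE_URL"])

with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar_one())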

Quick Start (Docker + Make only)

# 1) Clone and configure env
git clone https://github.com/jbrdge/churn-prediction.git
cd churn-prediction
cp .env.example .env

# 2) Start the stack
make up

# 3) Verify configuration and DB connectivity
make health

DB endpoints

  • Inside containers: db:5432
  • From host (per compose mapping): localhost:5433

Baseline Model (v0.5.0)

This release adds a reproducible Logistic Regression baseline trained on a Parquet features file.

Build the features Parquet

Use the Kaggle Telco Customer Churn CSV to produce the features file:

# full dataset
make make-features-from-archive

# or a faster sampled parquet for smoke tests
make make-features-from-archive N=500
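
Under the hood this step is essentially a CSV-to-Parquet conversion; a minimal sketch of the idea, with file paths assumed rather than taken from the Makefile:

import pandas as pd

# Kaggle Telco Customer Churn CSV (assumed path; filename as distributed by Kaggle)
df = pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.csv")

# Write the features file the training step consumes (Parquet IO needs pyarrow)
df.to_parquet("data/features.parquet", index=False)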

Train and inspect

# Train baseline (Logistic Regression)
make train-baseline

# Inspect artifacts / metrics
make ls-artifacts
make show-metrics

Artifacts (under artifacts/baseline_v1/):

  • model.pkl — serialized model
  • metrics.json — accuracy, precision, recall, F1, ROC-AUC
  • params.json — hyperparameters, seed, feature lists, sample_n
  • coefficients.csv — model weights with one-hot expanded names
  • confusion_matrix.png, roc_curve.png
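
To make these artifacts concrete, here is a compressed sketch of the kind of run that produces a model.pkl and metrics.json. The target column (churn, with Yes/No labels) and all paths are assumptions, not the project's actual CLI:

import json
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

out = Path("artifacts/baseline_v1")
out.mkdir(parents=True, exist_ok=True)

df = pd.read_parquet("data/features.parquet")
X = pd.get_dummies(df.drop(columns=["churn"]))   # one-hot expanded names, as in coefficients.csv
y = df["churn"].map({"Yes": 1, "No": 0})         # assuming Kaggle-style Yes/No labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
(out / "model.pkl").write_bytes(pickle.dumps(model))

metrics = {"roc_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])}
(out / "metrics.json").write_text(json.dumps(metrics, indent=2))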

Sampling for speed

Cap training rows deterministically (before the split):

make train-baseline-sample N=100
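
Deterministic capping usually boils down to a seeded pandas sample; a sketch of the idea (not the CLI's exact code):

import pandas as pd

def cap_rows(df: pd.DataFrame, n: int, seed: int) -> pd.DataFrame:
    """Deterministically cap the row count before the train/test split."""
    if len(df) <= n:
        return df
    return df.sample(n=n, random_state=seed).reset_index(drop=True)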

Monte Carlo (optional)

Stress-test stability by varying the seed and (optionally) the sample size:

make monte-carlo N=50 MC_ITERS=10
make monte-carlo-summary

Pick and promote the best run’s artifacts by a chosen metric (default: roc_auc):

make mc-best
make mc-show-best           # pretty-print best_summary.json
make mc-show-best-metrics   # pretty-print metrics.json from best run
make mc-ls-best             # list files in best run folder
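
Conceptually, promotion is a max over the per-run metrics.json files; a sketch, with the run-directory layout assumed:

import json
from pathlib import Path

METRIC = "roc_auc"  # default promotion metric, as above

# Assumed layout: one metrics.json per Monte Carlo run directory
runs = {p.parent: json.loads(p.read_text())[METRIC]
        for p in Path("artifacts/monte_carlo").glob("*/metrics.json")}

best_dir = max(runs, key=runs.get)
print(f"best run: {best_dir} ({METRIC}={runs[best_dir]:.4f})")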

Best-of Artifacts (tracked)

The following images and metrics are committed from the best run for reproducibility:

[Best-run artifacts: confusion_matrix.png, roc_curve.png, metrics.json]

v0.4.0 — Validation Runbook

End-to-end validation workflow (from DB schema + CSVs → checks).

make e2e      # schema + load + validate
make e2e-v    # verbose variant (includes validate-all)

What “good” looks like

  • Non-zero row counts in customers and churn_labels
  • Orphaned labels = 0 (see the sketch after this list)
  • Churn snapshot on the latest label_date shows a true/false split
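
As a concrete example, the orphaned-labels check above amounts to counting churn_labels rows with no matching customer; a sketch via SQLAlchemy, with customer_id as an assumed join key:

import os

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

# Labels that reference no existing customer; "good" is a count of 0.
orphan_sql = text("""
    SELECT COUNT(*)
    FROM churn_labels l
    LEFT JOIN customers c USING (customer_id)
    WHERE c.customer_id IS NULL
""")

with engine.connect() as conn:
    print("orphaned labels:", conn.execute(orphan_sql).scalar_one())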

Roadmap

  • [0.1.0] Stabilized Baseline — cleanup, legacy notebook archived, changelog
  • [0.2.0] Repo Structure — standardized Python/SQL layout, env templates
  • [0.3.0] Docker Compose — Dockerfile + compose (Postgres), health checks, Make targets
  • [0.4.0] SQL ETL + Validation — schema creation, CSV ingest CLI, validation runbook
  • [0.5.0] Baseline Model — modeling CLI, artifacts & metrics, sampling, Monte Carlo
  • [0.6.0] Tableau Dashboard — publish dashboard; link from README

See details in CHANGELOG.md.


Notebooks Policy

  • Notebooks are for exploration; production logic lives in src/.
  • Outputs are stripped automatically (pre-commit nbstripout).
  • Large or sensitive data should not be committed.

Troubleshooting

Port in use

If 5433 is busy on the host, change the mapping in docker-compose.yml:

db:
  ports:
    - "5434:5432"

DB won’t become healthy

make clean && make up

Run quick sanity

make db-ready
make db-psql

License

MIT — see LICENSE.
