This is a learning project. I explored this dataset as part of a Tableau dashboard project from analystbuilder.com (see below).
I also used AI-assisted code templates to review DevOps concepts. I planned the project as if I were a project manager, creating milestones, issues, PRs, actions, labels, code reviews, etc. The project is ongoing, and I plan to expand it as I learn.
End-to-end churn workflow, built incrementally. As of v0.5.0, the project ships a reproducible baseline model CLI (Logistic Regression), artifact persistence, optional sampling, a Monte Carlo harness, and Docker UX fixes. Earlier versions provided Dockerized ETL + validation.
I have explored this dataset using Tableau in a project from analystbuilder.com.
Click here to view the dashboard and exploratory data analysis project on Tableau.
```
Developer (Make/Compose)
          |
          v
+-------------------+        +--------------------------+
|   app (Python)    | <----> |   db (PostgreSQL 16)     |
| docker container  |        | docker container         |
+-------------------+        +--------------------------+
          ^                               ^
          |                               |
          +---------------+---------------+
                          |
           +----------------------------+
           | reads env from `.env`      |
           | (POSTGRES_*; DATABASE_URL) |
           +----------------------------+
```
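Both containers read their configuration from `.env`. The authoritative variable names live in `.env.example`; as a minimal sketch (the variable names, defaults, and fallback URL below are assumptions, not the project's actual settings), the app side can assemble its connection roughly like this:

```python
import os
from sqlalchemy import create_engine, text

# Prefer an explicit DATABASE_URL; otherwise assemble one from POSTGRES_* variables.
# Names and defaults here are illustrative; .env.example is authoritative.
url = os.getenv("DATABASE_URL") or (
    "postgresql+psycopg://{u}:{p}@{h}:{port}/{db}".format(
        u=os.getenv("POSTGRES_USER", "postgres"),
        p=os.getenv("POSTGRES_PASSWORD", "postgres"),
        h=os.getenv("POSTGRES_HOST", "db"),      # "db" is the service name on the compose network
        port=os.getenv("POSTGRES_PORT", "5432"),
        db=os.getenv("POSTGRES_DB", "churn"),
    )
)

engine = create_engine(url)
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # cheap connectivity check
print("connected")
```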
- Python 3.11 (lean runtime): SQLAlchemy + psycopg v3, scikit-learn, pandas, matplotlib
- PostgreSQL 16 (alpine) with `pg_isready` healthcheck
- Docker Compose for orchestration
- Makefile for all workflows (build, load, train, validate, clean, etc.)
- pre-commit + GitHub Actions for formatting, linting, and notebook output stripping
```
# 1) Clone and configure env
git clone https://github.com/jbrdge/churn-prediction.git
cd churn-prediction
cp .env.example .env

# 2) Start the stack
make up

# 3) Verify configuration and DB connectivity
make health
```
DB endpoints (a host-side connection check is sketched after this list):
- Inside containers: `db:5432`
- From host (per compose mapping): `localhost:5433`
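For a quick check from the host, outside the containers, something like the following works with psycopg v3, assuming the credentials from your `.env` (the user, password, and database name below are placeholders):

```python
import psycopg  # psycopg v3

# Host port 5433 maps to the container's 5432 (per the compose mapping above).
# Credentials and database name are placeholders; use the values from your .env.
with psycopg.connect(
    host="localhost", port=5433,
    user="postgres", password="postgres", dbname="churn",
) as conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
```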
This release adds a reproducible Logistic Regression baseline trained on a Parquet features file.
Use the Kaggle Telco Customer Churn CSV to produce the features file:
```
# full dataset
make make-features-from-archive

# or a faster sampled parquet for smoke tests
make make-features-from-archive N=500

# Train baseline (Logistic Regression)
make train-baseline

# Inspect artifacts / metrics
make ls-artifacts
make show-metrics
```
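`make train-baseline` wraps the project's modeling CLI. As a rough sketch of the work it performs (the file path, target column name, and hyperparameters here are illustrative, not the CLI's actual defaults):

```python
import json

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Path and target column are assumptions for illustration.
df = pd.read_parquet("data/features.parquet")
y = df["churn"].astype(int)                      # assumes a 0/1 label column
X = pd.get_dummies(df.drop(columns=["churn"]))   # one-hot expand categoricals

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(json.dumps({"roc_auc": round(auc, 4)}))
```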
Artifacts (under `artifacts/baseline_v1/`), with a loading sketch after the list:
- `model.pkl` - serialized model
- `metrics.json` - accuracy, precision, recall, F1, ROC-AUC
- `params.json` - hyperparameters, seed, feature lists, `sample_n`
- `coefficients.csv` - model weights with one-hot expanded names
- `confusion_matrix.png`, `roc_curve.png`
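To reuse a run downstream, the artifacts can be loaded roughly as below, assuming `model.pkl` was written with the standard library `pickle` and that the JSON key names match the metric and parameter lists above (both are assumptions):

```python
import json
import pickle
from pathlib import Path

run_dir = Path("artifacts/baseline_v1")

# Assumes model.pkl was written with the standard pickle module.
with open(run_dir / "model.pkl", "rb") as f:
    model = pickle.load(f)

metrics = json.loads((run_dir / "metrics.json").read_text())
params = json.loads((run_dir / "params.json").read_text())

# Key names are guesses based on the artifact descriptions above.
print(metrics.get("roc_auc"), params.get("seed"))

# model.predict_proba(X_new) can then score new rows, provided X_new matches
# the one-hot feature layout recorded in params.json / coefficients.csv.
```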
Cap training rows deterministically (before the split):
```
make train-baseline-sample N=100
```
Stress-test stability by varying the seed and (optionally) the sample size; a conceptual sketch of one iteration follows the commands:
```
make monte-carlo N=50 MC_ITERS=10
make monte-carlo-summary
```
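Conceptually, each Monte Carlo iteration re-runs the baseline with a different seed (and, optionally, a deterministic row cap) and records its metrics, and the best run is then promoted by a chosen metric. A minimal sketch of that loop, with assumed paths, column names, and defaults:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def run_once(df: pd.DataFrame, seed: int, sample_n: int | None = None) -> float:
    """Train one seeded baseline and return its held-out ROC-AUC."""
    if sample_n:
        df = df.sample(n=sample_n, random_state=seed)  # deterministic row cap before the split
    y = df["churn"].astype(int)                        # assumes a 0/1 label column
    X = pd.get_dummies(df.drop(columns=["churn"]))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

df = pd.read_parquet("data/features.parquet")            # path is an assumption
runs = [{"seed": s, "roc_auc": run_once(df, seed=s, sample_n=50)} for s in range(10)]
best = max(runs, key=lambda r: r["roc_auc"])              # promote by metric, as mc-best does
print(best)
```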
Pick and promote the best run's artifacts by a chosen metric (default: `roc_auc`):
```
make mc-best
make mc-show-best           # pretty-print best_summary.json
make mc-show-best-metrics   # pretty-print metrics.json from best run
make mc-ls-best             # list files in best run folder
```
The following images and metrics are committed from the best run for reproducibility:
End-to-end validation workflow (from DB schema + CSVs → checks).
```
make e2e     # schema + load + validate
make e2e-v   # verbose variant (includes validate-all)
```
What "good" looks like (a hand-check sketch follows this list):
- Non-zero row counts in `customers` and `churn_labels`
- Orphaned labels = 0
- Churn snapshot on the latest `label_date` shows a true/false split
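These checks can also be spot-checked by hand. A hedged sketch with SQLAlchemy follows; the connection string, the `customer_id` join key, and the `churned` column are assumptions about the schema, while `customers`, `churn_labels`, and `label_date` come from the runbook above:

```python
from sqlalchemy import create_engine, text

# Connection string, customer_id join key, and churned column are assumptions;
# customers, churn_labels, and label_date come from the validation runbook above.
engine = create_engine("postgresql+psycopg://postgres:postgres@localhost:5433/churn")

scalar_checks = {
    "customers_rows": "SELECT count(*) FROM customers",
    "churn_labels_rows": "SELECT count(*) FROM churn_labels",
    "orphaned_labels": (
        "SELECT count(*) FROM churn_labels l "
        "LEFT JOIN customers c ON c.customer_id = l.customer_id "
        "WHERE c.customer_id IS NULL"
    ),
}

with engine.connect() as conn:
    for name, sql in scalar_checks.items():
        print(name, conn.execute(text(sql)).scalar_one())

    # True/false churn split on the most recent label_date
    split = conn.execute(text(
        "SELECT churned, count(*) FROM churn_labels "
        "WHERE label_date = (SELECT max(label_date) FROM churn_labels) "
        "GROUP BY churned"
    )).all()
    print("latest_label_date_split", split)
```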
- ✅ [0.1.0] Stabilized Baseline — cleanup, legacy notebook archived, changelog
- ✅ [0.2.0] Repo Structure — standardized Python/SQL layout, env templates
- ✅ [0.3.0] Docker Compose — Dockerfile + compose (Postgres), health checks, Make targets
- ✅ [0.4.0] SQL ETL + Validation — schema creation, CSV ingest CLI, validation runbook
- ✅ [0.5.0] Baseline Model — modeling CLI, artifacts & metrics, sampling, Monte Carlo
- ⏳ [0.6.0] Tableau Dashboard — publish dashboard; link from README
See details in CHANGELOG.md.
- Notebooks are for exploration; production logic lives in `src/`.
- Outputs are stripped automatically (pre-commit `nbstripout`).
- Large or sensitive data should not be committed.
Port in use
If 5433 is busy on the host, change the mapping in docker-compose.yml:
```
db:
  ports:
    - "5434:5432"
```
DB won't become healthy
```
make clean && make up
```
Run quick sanity
```
make db-ready
make db-psql
```
MIT. See LICENSE.


