Predict whether a customer will churn (leave) using the Telco Customer Churn dataset from Kaggle.
- Dataset: Telco Customer Churn (7,043 customers)
- Task: Binary classification (Churn / No Churn)
- Baseline Model: Logistic Regression
- Improved Model: XGBoost
- Best ROC-AUC: 0.8478 (XGBoost)
- Best F1-score: 0.6040 (Logistic Regression)
The project implements a complete ML pipeline including preprocessing, model comparison, evaluation, and error analysis.
telco-churn-ml/
├── data/
│ └── telco.csv # Raw dataset (7,043 customers)
├── notebooks/
│ └── eda.ipynb # Exploratory Data Analysis
├── src/
│ ├── preprocess.py # Data cleaning & feature pipeline
│ ├── train.py # Model training (baseline + improved)
│ └── evaluate.py # Evaluation & error analysis
├── models/
│ ├── logistic_regression.pkl # Baseline model
│ └── xgboost_model.pkl # Improved model
├── results/
│ ├── metrics.json # All evaluation metrics
│ ├── confusion_matrix_*.png # Confusion matrices
│ ├── roc_curve_comparison.png # ROC curve comparison
│ └── feature_importance.png # XGBoost feature importances
├── requirements.txt
└── README.md
All scripts are executable independently and generate models and results automatically.
| Property | Value |
|---|---|
| Source | Kaggle Telco Customer Churn |
| Rows | 7,043 |
| Features | 20 (demographics, services, account info, charges) |
| Target | Churn (Yes / No) |
| Class Balance | ~26.5% churn (imbalanced) |
- Dropped `customerID` (not a feature)
- Converted `TotalCharges` to numeric (11 whitespace entries → filled with median)
- Encoded target: `Churn` → 0/1
- Feature pipeline using `ColumnTransformer`:
  - Numeric (`tenure`, `MonthlyCharges`, `TotalCharges`) → `StandardScaler`
  - Categorical (all others) → `OneHotEncoder(handle_unknown="ignore")`
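
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/preprocess.py`: the helper names `clean` and `build_preprocessor` are hypothetical, and the three-row frame is made-up data with only one categorical column.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the cleaning steps to a raw Telco frame."""
    df = df.drop(columns=["customerID"])
    # Whitespace-only strings fail to parse and become NaN,
    # which is then replaced by the column median.
    total = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = total.fillna(total.median())
    # Encode the target as 0/1.
    df["Churn"] = (df["Churn"] == "Yes").astype(int)
    return df


def build_preprocessor(feature_cols) -> ColumnTransformer:
    """Hypothetical helper: scale numeric columns, one-hot encode the rest."""
    categorical = [c for c in feature_cols if c not in NUMERIC_COLS]
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])


# Made-up rows for illustration only (not taken from the dataset).
raw = pd.DataFrame({
    "customerID": ["a", "b", "c"],
    "tenure": [1, 0, 24],
    "Contract": ["Month-to-month", "Two year", "One year"],
    "MonthlyCharges": [70.0, 20.0, 50.0],
    "TotalCharges": ["70.0", " ", "1200.0"],
    "Churn": ["Yes", "No", "No"],
})

cleaned = clean(raw)
X = cleaned.drop(columns=["Churn"])
y = cleaned["Churn"]

preprocessor = build_preprocessor(X.columns)
X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)  # 3 scaled numeric columns + 3 one-hot columns
```

In a real run the preprocessor would normally be fitted on the training split only, so that scaling statistics and category lists do not leak from the test set.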
| Parameter | Value |
|---|---|
| Method | train_test_split (scikit-learn) |
| Split Ratio | 80% train / 20% test |
| Stratification | Yes (stratify=y) — essential for imbalanced target |
| Random State | 42 |
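
To illustrate why stratification matters, the sketch below splits synthetic labels with roughly the same 26.5% positive rate and checks that both splits keep that rate. The label vector and the placeholder feature column are assumptions for the demo, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: about 26.5% positives out of 7,043 rows (illustrative only).
n_rows = 7043
n_pos = round(0.265 * n_rows)
y = np.array([1] * n_pos + [0] * (n_rows - n_pos))
X = np.arange(n_rows).reshape(-1, 1)  # placeholder feature column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(y_train), len(y_test))
print(round(y.mean(), 4), round(y_train.mean(), 4), round(y_test.mean(), 4))
```

With `stratify=y` the churn rate in the test split stays within a fraction of a percentage point of the overall rate; without it, the rate can drift from split to split.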
- Why? Interpretable, fast, standard baseline for tabular classification.
LogisticRegression(max_iter=1000, random_state=42)
- Why? Handles tabular data well and captures non-linear feature interactions that a linear model misses. It often improves minority-class recall, although in this run its recall was lower than the baseline's (0.5214 vs 0.5588).
- Tuned with regularisation (`reg_alpha`, `reg_lambda`, `gamma`, `min_child_weight`) to prevent overfitting.
XGBClassifier(
n_estimators=200, max_depth=4, learning_rate=0.05,
subsample=0.8, colsample_bytree=0.8,
reg_alpha=1.0, reg_lambda=5.0,
min_child_weight=5, gamma=0.3,
eval_metric="logloss", random_state=42
)

| Summary | Value |
|---|---|
| Best Model | XGBoost (by ROC-AUC) |
| ROC-AUC | 0.8478 |
| Accuracy | 0.8020 |
Note: Logistic Regression achieves a slightly higher F1-score (0.604 vs 0.583) due to better recall, making it competitive. XGBoost leads on ROC-AUC, indicating better overall discriminative ability across all thresholds.
ROC-AUC was chosen as the primary metric because churn prediction involves imbalanced classes, and AUC better reflects model performance across different classification thresholds.
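
One way to see why ROC-AUC does not depend on a threshold: it equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way on made-up scores; the function name `roc_auc_pairwise` is hypothetical, and in practice `sklearn.metrics.roc_auc_score` gives the same value.

```python
import numpy as np


def roc_auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum()
    ties = (diff == 0).sum()  # ties count as half a win
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


# Made-up labels and scores for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.90]

auc = roc_auc_pairwise(y_true, y_score)
print(auc)  # 12 of 16 pairs are ranked correctly -> 0.75
```

No cut-off appears anywhere in the computation, which is why AUC summarises ranking quality rather than performance at one operating point.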
| Metric | Logistic Regression | XGBoost |
|---|---|---|
| Accuracy | 0.8055 | 0.8020 |
| Precision | 0.6572 | 0.6610 |
| Recall | 0.5588 | 0.5214 |
| F1-score | 0.6040 | 0.5830 |
| ROC-AUC | 0.8419 | 0.8478 |
Logistic Regression:
- 209 / 374 churned customers correctly identified (True Positives)
- 165 churned customers missed (False Negatives: business loss, since these customers leave undetected)
- 109 non-churn customers flagged incorrectly (False Positives: unnecessary retention cost; this count matches the reported precision of 0.6572 and accuracy of 0.8055)
XGBoost:
- 195 / 374 churned customers correctly identified
- 179 churned customers missed (slightly more FN than LR)
- 100 non-churn flagged incorrectly (fewer false alarms)
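
As a sanity check, the reported metrics can be recomputed from the confusion-matrix counts. The sketch below does this for XGBoost. The true-negative count of 935 is not stated in this README; it is an assumption derived from a 1,409-row test set (20% of 7,043, rounded up) containing 374 churners.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Derive the standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
        "accuracy": round(accuracy, 4),
    }


# XGBoost counts from the list above; tn=935 is an assumption (see lead-in).
xgb = metrics_from_counts(tp=195, fn=179, fp=100, tn=935)
print(xgb)
# {'precision': 0.661, 'recall': 0.5214, 'f1': 0.583, 'accuracy': 0.802}
```

These values agree with the XGBoost column of the comparison table (0.6610, 0.5214, 0.5830, 0.8020).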
- Medium-tenure customers (12–36 months) are hardest to classify — they fall between the clear short-tenure churners and loyal long-tenure customers.
- Month-to-month contracts are frequently misclassified — high variability within this group.
- High monthly charges increase churn probability, but some high-charge customers on long contracts remain loyal, confusing the models.
Both models show strong discrimination (AUC > 0.84). XGBoost's curve is slightly higher in the low-FPR region, meaning it's better at identifying true churners when keeping false alarms low.
Top churn predictors:
- tenure — strongest predictor; short tenure = high churn risk
- MonthlyCharges — higher charges correlate with churn
- TotalCharges — proxy for customer value
- Contract (Month-to-month) — highest churn contract type
- InternetService (Fiber optic) — fiber customers churn more than DSL
In real-world deployment, minimizing false negatives is critical since missed churners represent direct revenue loss. Threshold tuning or cost-sensitive learning could further improve recall.
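
A minimal sketch of threshold tuning, assuming the goal is to reach a target recall: pick the highest probability cut-off that still catches the required share of churners. The function name `threshold_for_recall` and the scores are made up for illustration; this is not part of `src/evaluate.py`.

```python
import numpy as np


def threshold_for_recall(y_true, y_score, target_recall):
    """Return the highest threshold whose recall is at least target_recall."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = (y_true == 1).sum()
    # Walk candidate thresholds from the highest score downwards;
    # recall can only grow as the threshold drops.
    for t in np.sort(np.unique(y_score))[::-1]:
        recall = ((y_score >= t) & (y_true == 1)).sum() / n_pos
        if recall >= target_recall:
            return float(t)
    return 0.0  # only reached if there are no positives


# Made-up labels and predicted churn probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.90]

print(threshold_for_recall(y_true, y_score, 0.75))  # 0.55
print(threshold_for_recall(y_true, y_score, 1.00))  # 0.35
```

Lowering the threshold trades more false positives for fewer missed churners, so the target recall should come from the relative cost of a retention offer versus a lost customer. The threshold should be chosen on a validation split, not on the test set.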
| Insight | Detail |
|---|---|
| 📅 Contract type matters most | Month-to-month contracts have ~42% churn vs <5% for 2-year contracts |
| ⏱️ Short tenure = high risk | Customers in the first 12 months are most likely to churn |
| 💰 Higher charges → higher churn | Median charges for churned customers are significantly higher |
| 🛡️ Protective services help | Customers without Online Security, Tech Support, or Online Backup churn more |
| 🌐 Fiber optic paradox | Despite being a premium service, fiber optic users churn more — possibly due to higher costs |
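
Group-level churn rates like those in the table can be computed with a one-line `groupby`. The frame below is toy data, so the printed rates are illustrative and do not reproduce the 42% and <5% figures.

```python
import pandas as pd

# Toy data for illustration only (not rows from the dataset).
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn": ["Yes", "Yes", "No", "No", "No", "No", "No", "No"],
})

# Share of churned customers within each contract type.
churn_rate = (
    toy["Churn"].eq("Yes")
    .groupby(toy["Contract"])
    .mean()
    .sort_values(ascending=False)
)
print(churn_rate)
```

The same pattern works for any categorical column, for example `InternetService` or `OnlineSecurity`.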
# 1. Install dependencies
pip install -r requirements.txt
# 2. Train models
python src/train.py
# 3. Evaluate & generate plots
python src/evaluate.py
# 4. Explore EDA notebook
jupyter notebook notebooks/eda.ipynb

All results reported in this README were generated using random_state=42 for reproducibility.
- Python 3.8+
- pandas, numpy, scikit-learn, xgboost, matplotlib, seaborn, joblib, jupyter