Logistic regression-based credit scoring model using public Kaggle data, designed for transparent PD estimation, performance evaluation, and teaching or regulatory use cases.

Chengyueminga/ProbDefault_LogisticRegression

ProbDefault_LogisticRegression

A transparent baseline model for default prediction – built for teaching and evaluation.


Overview

This project explores the application of logistic regression to credit scoring, using data from Kaggle’s Home Credit Default Risk competition. The focus is on building a simple and interpretable Probability of Default (PD) model, suitable for classroom instruction and for demonstrating threshold-based tradeoffs in binary classification.

The notebook is structured as a complete teaching module, with well-labeled sections, model interpretation, and visualizations. Code blocks are annotated in an appendix that links back to the main notebook content.
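
For orientation, a logistic regression PD model is usually written as below. This is the standard textbook form, not a transcription of the notebook: the symbols $x$ (applicant feature vector), $\beta_0$ (intercept), and $\beta$ (coefficient vector) are notation assumed here for illustration.

```latex
\[
\mathrm{PD}(x) \;=\; P(y = 1 \mid x) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta^{\top} x)}},
\qquad
\log\frac{\mathrm{PD}(x)}{1 - \mathrm{PD}(x)} \;=\; \beta_0 + \beta^{\top} x
\]
```

The second identity is what makes the model easy to interpret: each coefficient is the change in the log-odds of default for a one-unit change in the corresponding feature, holding the others fixed.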


Teaching Objectives

This project is designed for students and early-career analysts interested in:

  • Modeling PD using logistic regression with imbalanced classes
  • Understanding how threshold choice affects precision and recall
  • Using ROC curves and confusion matrices for evaluation
  • Practicing real-world model building from public credit data
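
The effect of threshold choice on precision and recall can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the helper names `confusion_counts` and `precision_recall` are hypothetical, and the labels and PD scores are made-up toy values rather than Home Credit data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], pd_scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), flagging a loan as default when PD >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, pd_scores):
        predicted_default = p >= threshold
        if predicted_default and y == 1:
            tp += 1
        elif predicted_default and y == 0:
            fp += 1
        elif not predicted_default and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(
    y_true: Sequence[int], pd_scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall for the default class at a given threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, pd_scores, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, imbalanced example (illustrative values only): 2 defaults in 10 loans.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
pd_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.20, 0.35, 0.55, 0.30, 0.70]

print(precision_recall(y_true, pd_scores, 0.25))  # (0.5, 1.0)
print(precision_recall(y_true, pd_scores, 0.60))  # (1.0, 0.5)
```

On this toy data, lowering the threshold from 0.60 to 0.25 catches both defaults (recall rises from 0.5 to 1.0) at the cost of flagging two good loans (precision falls from 1.0 to 0.5).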

Notebook Structure

| Section    | Description                                           |
|------------|-------------------------------------------------------|
| 1          | Introduction and Teaching Context                     |
| 2          | Dataset Overview and Pedagogical Considerations       |
| 3          | Building and Interpreting the Credit Scoring Model    |
| 4          | Conclusion and Teaching Takeaways                     |
| Appendix A | Code blocks annotated by section for modular use      |

Model Highlights

  • Transparent logistic model structure
  • Threshold tuning with clear recall/precision tradeoffs
  • Probability plots to simulate loan approval cutoff scenarios
  • ROC curve and AUC for performance tracking
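
As a rough illustration of the last two highlights, the sketch below computes AUC and the share of applicants approved at a given PD cutoff. It assumes nothing about the notebook's implementation: `roc_auc` and `approval_rate` are hypothetical helper names, the AUC is computed by its pairwise-ranking definition rather than with a plotting library, and the data are made-up toy values.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter scores above a random
    non-defaulter, counting ties as one half (pairwise definition)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def approval_rate(scores: Sequence[float], cutoff: float) -> float:
    """Share of applicants approved when loans with PD below cutoff are approved."""
    return sum(s < cutoff for s in scores) / len(scores)


# Toy example (illustrative values only): 2 defaults in 10 loans.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
pd_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.20, 0.35, 0.55, 0.30, 0.70]

print(roc_auc(y_true, pd_scores))        # 0.875
print(approval_rate(pd_scores, 0.25))    # 0.6
print(approval_rate(pd_scores, 0.60))    # 0.9
```

The pairwise loop is quadratic in the number of loans, which is fine for a classroom example; on the full dataset a library routine based on sorting would be the more practical choice.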

Dataset Citation

Sergey Kharitonov. (2018). Home Credit Default Risk. Kaggle.
https://www.kaggle.com/competitions/home-credit-default-risk


License

This project is released under the MIT License and intended for educational and non-commercial use.


Disclaimer

This project was conducted in a personal capacity as an independent researcher. All views, analyses, and interpretations presented here are my own and do not represent the views of any current or former employer. The project is for educational and non-commercial use only.


Contribute & Engage

If you find this notebook helpful, feel free to fork the repository or give it a ⭐️.
Feedback and suggestions are always welcome, and contributions that improve the educational value are especially appreciated.
