Heart-Failure-Prediction-Using-Machine-Learning-Classification-Algorithms

Dataset

https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction

Preprocessing Pipeline

  1. Summary statistics of numerical features.
  2. Missing-value detection.
  3. Duplicate detection.
  4. Target class balance check.

Target Imbalance
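The four checks above can be sketched with pandas. This is a minimal illustration, not the repository's code: the tiny DataFrame is hypothetical, and the column names are assumed to match the Kaggle CSV.

```python
import pandas as pd

# Hypothetical sample rows; the real data would be loaded from the Kaggle CSV.
df = pd.DataFrame({
    "Age": [40, 49, 37, 49, 54, 39],
    "RestingBP": [140, 160, 130, 160, 150, None],
    "Cholesterol": [289, 180, 283, 180, 195, 339],
    "HeartDisease": [0, 1, 0, 1, 0, 0],
})

# 1. Summary statistics of the numerical features.
stats = df.describe()

# 2. Number of missing values per column.
missing_per_column = df.isna().sum()

# 3. Number of fully duplicated rows (the first occurrence is not counted).
n_duplicates = int(df.duplicated().sum())

# 4. Share of each target class; a strong skew would indicate imbalance.
class_share = df["HeartDisease"].value_counts(normalize=True)

print(stats)
print(missing_per_column)
print("duplicates:", n_duplicates)
print(class_share)
```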

  5. Outlier detection and handling.

A) Before handling: Outliers before handling

B) After handling: Outliers after handling
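The README does not say which outlier rule is used. As one common possibility, the sketch below assumes the 1.5 × IQR rule for detection and clipping to the bounds for handling; the helper `iqr_bounds` and the sample values are hypothetical.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (low, high) bounds using the k * IQR rule (assumed, not from the README)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical cholesterol readings with one extreme value.
cholesterol = np.array([180.0, 195.0, 210.0, 220.0, 235.0, 250.0, 603.0])

low, high = iqr_bounds(cholesterol)

# Detection: flag values outside the bounds.
outlier_mask = (cholesterol < low) | (cholesterol > high)

# Handling: clip flagged values to the nearest bound instead of dropping rows.
clipped = np.clip(cholesterol, low, high)

print("bounds:", low, high)
print("outliers:", cholesterol[outlier_mask])
print("after clipping:", clipped)
```

Clipping keeps the row count unchanged, which matters on a small dataset; dropping the flagged rows is the other usual choice.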

  6. Distribution of numerical and categorical features.

A) Numerical features: Numerical features

B) Categorical features: Categorical features

Data Visualization Pipeline

  1. Histograms of features by class.

A) Numerical features:

Distribution of features between classes - numerical

B) Categorical features:

Distribution of features between classes - categorical

  2. Pairplot of features.

Pairplot of features

  3. Correlation between features.

Correlation between features
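A correlation matrix like the one plotted above can be computed with pandas. This is a sketch that assumes Pearson correlation (the pandas default); the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; column names are assumed to match the Kaggle CSV.
df = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54],
    "MaxHR": [172, 156, 98, 108, 122],
    "HeartDisease": [0, 1, 0, 1, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```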

Feature Scaling Pipeline

  1. Quantization of categorical features.

Quantization of features
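"Quantization" here presumably means mapping each category to an integer code. A minimal sketch under that assumption follows; the alphabetical ordering of codes and the sample rows are choices made for illustration only.

```python
import pandas as pd

# Hypothetical rows; column names are assumed to match the Kaggle CSV.
df = pd.DataFrame({
    "Sex": ["M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ASY", "ATA"],
    "Age": [40, 49, 37, 48],
})

categorical_columns = ["Sex", "ChestPainType"]

encoded = df.copy()
mappings = {}
for column in categorical_columns:
    # Assign integer codes in alphabetical order of the category labels.
    categories = sorted(df[column].unique())
    mappings[column] = {category: code for code, category in enumerate(categories)}
    encoded[column] = df[column].map(mappings[column])

print(mappings)
print(encoded)
```

Keeping the `mappings` dictionary makes it possible to apply exactly the same codes to the validation and test sets later.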

  2. Splitting the dataset into training, validation, and test sets.
  3. Feature scaling.
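The split and scaling steps can be sketched with scikit-learn. The 60/20/20 ratio, the stratification, and the use of `StandardScaler` are assumptions; the README does not state them. The data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix and binary target.
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=20.0, size=(100, 4))
y = np.array([0, 1] * 50)

# First split off 40% of the rows, then halve that part into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Fit the scaler on the training set only, then reuse it for the other sets
# so that no validation or test statistics leak into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
```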

Feature Extraction Pipeline

Backward Sequential Feature Selection

  1. Performing backward SFS.
  2. GridSearchCV for finding best hyperparameters.
  3. Validating models.

SFS matrix

SFS boxplot
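The three SFS steps above could look like the following with scikit-learn's `SequentialFeatureSelector`. This is a sketch on synthetic data: the random forest estimator, the number of features to keep, and the parameter grid are all assumptions, not values taken from the repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the scaled heart-failure features.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1. Backward SFS: start from all features and drop one at a time.
selector = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=20, random_state=0),
    n_features_to_select=3,  # assumed target size
    direction="backward",
    cv=3,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_val_sel = selector.transform(X_val)

# 2. Grid search over a small, hypothetical hyperparameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_train_sel, y_train)

# 3. Validate the refitted best model on the held-out validation set.
val_accuracy = search.score(X_val_sel, y_val)

print("kept features:", selector.get_support(indices=True))
print("best params:", search.best_params_)
print("validation accuracy:", val_accuracy)
```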

Principal Component Analysis

  1. Choosing the best number of principal components.

Number of principal components

  2. Data transformation.
  3. GridSearchCV for finding best hyperparameters.
  4. Validating models.

PCA matrix

PCA boxplot
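One common way to choose the number of principal components is to keep the smallest number that reaches a target share of explained variance. The sketch below assumes a 95% threshold, which is not stated in the README, and uses synthetic data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: six observed features driven by two latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components and accumulate the explained variance ratio.
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95  # assumed cut-off
n_components = int(np.searchsorted(cumulative, threshold) + 1)

# Transform the data with the chosen number of components.
pca = PCA(n_components=n_components).fit(X_scaled)
X_reduced = pca.transform(X_scaled)

print("cumulative explained variance:", cumulative.round(3))
print("chosen components:", n_components)
print("reduced shape:", X_reduced.shape)
```

In a train/validation/test setup the PCA would be fitted on the training set only and then applied to the other sets, in the same way as the scaler. The reduced features can then go through GridSearchCV and validation as in the SFS branch.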

Training and Testing Pipeline

  1. Training and testing chosen models.
  2. Plotting testing results.

Testing matrix

Testing boxplot

  3. Visualization of predictions.

A) RF predictions using PCA:

PCA predictions

B) RF predictions using SFS:

SFS predictions

  4. Confusion matrix.

Confusion matrix
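For reference, a confusion matrix for a binary target can be computed with scikit-learn as sketched below. The labels and predictions are made up for illustration and are not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions (1 = heart disease).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
recall = tp / (tp + fn)  # share of actual positives that were found

print(cm)
print("accuracy:", accuracy, "recall:", recall)
```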