This repository was archived by the owner on Dec 10, 2024. It is now read-only.

A machine learning project designed to detect and predict fraudulent transactions in a highly imbalanced dataset. The model uses techniques like SMOTE for class balancing and a Keras neural network to classify transactions as legitimate or fraudulent. Achieves an accuracy of 93%, with room for further optimization.


mkuangdotcom/Fraud_Detection


Fraudulent Transaction Detection Model

Purpose

The rapid development of the financial industry has also exposed it to significant risks, one of the most critical being fraud. This project aims to detect and predict fraudulent transactions by analyzing clients' transaction history, spending patterns, and account behavior to identify anomalies and prevent financial losses.


Dataset

  • Link: Fraud Transaction Detection Dataset
  • Description:
    • The dataset contains 1.75 million transactions generated by simulated users over a period from January 2023 to June 2023.
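    • It is highly imbalanced: only about 13.45% of transactions (a fraction of 0.1345) are labeled fraudulent, roughly 6.4 legitimate transactions for every fraudulent one. This matches the class supports in the test-set classification report.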

Project Overview

1. Data Analysis

Imbalance Issue: The dataset is imbalanced, meaning there are significantly fewer fraudulent transactions than legitimate ones. Left untreated, this biases a classifier toward the majority class: it can reach high overall accuracy while still missing much of the fraud.
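To make the skew concrete, the snippet below takes the test-set class counts from the classification report in the Results section and scores a trivial baseline that labels every transaction as legitimate. It is a worked arithmetic example, not part of the project's code.

```python
# Test-set class counts, copied from the classification report (support column)
n_legit = 303_637
n_fraud = 47_194
n_total = n_legit + n_fraud

# A "classifier" that always predicts 0 (legitimate) gets every legitimate
# transaction right and every fraudulent one wrong.
baseline_accuracy = n_legit / n_total
baseline_fraud_recall = 0 / n_fraud

print(f"fraud share:           {n_fraud / n_total:.4f}")  # 0.1345
print(f"baseline accuracy:     {baseline_accuracy:.4f}")  # 0.8655
print(f"baseline fraud recall: {baseline_fraud_recall:.2f}")  # 0.00
```

About 86.5% accuracy comes for free on this data, which is why the fraud-class precision and recall say more about the model than overall accuracy does.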

2. Data Preprocessing

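To address the imbalance, we apply the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE generates synthetic samples for the minority class (fraudulent transactions) by interpolating between an existing fraudulent transaction and one of its nearest fraudulent neighbors in feature space. This gives the model many more fraud examples to learn from than the raw data provides.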

Before applying SMOTE, the data is split so that the test set remains untouched, allowing the model to be evaluated on real, unaltered data rather than on synthetic samples.
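The interpolation step can be sketched in a few lines of NumPy. This is a minimal illustration of how SMOTE builds each synthetic point, not the project's code: the README does not say which implementation was used (a library such as imbalanced-learn's `SMOTE` is the usual choice), and the function name `smote` and its parameters are hypothetical. The brute-force distance matrix is only suitable for small inputs.

```python
import numpy as np

def smote(X_minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: create n_synthetic SMOTE-style rows from minority-class rows."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    assert n >= 2, "need at least two minority samples to interpolate"
    k = min(k, n - 1)  # cannot have more neighbors than other minority points

    # Pairwise Euclidean distances between minority samples (brute force, O(n^2) memory)
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]    # indices of k nearest minority neighbors

    base = rng.integers(0, n, size=n_synthetic)                        # random seed points
    pick = neighbors[base, rng.integers(0, k, size=n_synthetic)]       # one neighbor per seed
    gap = rng.random((n_synthetic, 1))                                 # uniform in [0, 1)

    # Each new point lies on the segment between its seed point and the chosen neighbor
    return X_minority[base] + gap * (X_minority[pick] - X_minority[base])

# Usage: four toy "fraud" rows at the corners of the unit square
X_fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_fraud, n_synthetic=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic row is a convex combination of two real minority rows, all six points above fall inside the unit square; SMOTE never extrapolates beyond the minority data it is given.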

Steps:

  1. Train-Test Split:

    The dataset is divided into training and testing sets using an 80/20 ratio. SMOTE is applied to the training set only, to balance its class distribution. The test set keeps the original imbalance, so the evaluation reflects the class distribution the model would face in practice rather than an artificially balanced one.

  2. Preprocessing:
    • Min-Max Scaling: All numeric features (e.g., transaction amount) are scaled between 0 and 1 using MinMaxScaler(), ensuring that features with larger ranges don’t dominate the model training process.
    • Categorical Encoding: Categorical features (e.g., customer ID, terminal ID) are encoded using OneHotEncoder(handle_unknown='ignore'), which creates one binary column per category seen during fitting. Because of handle_unknown='ignore', a category that appears only in the test set is encoded as all zeros instead of raising an error.
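The split-then-preprocess order above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the project's pipeline: the toy data and the column names `amount`, `customer_id`, `terminal_id` and `is_fraud` are hypothetical, and `stratify=y` is an assumption (the README only states an 80/20 split). The scaler and encoder are fit on the training split only and then reused on the test split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real transaction table
rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "amount": rng.exponential(50.0, size=n),
    "customer_id": rng.integers(0, 100, size=n).astype(str),
    "terminal_id": rng.integers(0, 20, size=n).astype(str),
    "is_fraud": (rng.random(n) < 0.13).astype(int),
})
X = df.drop(columns="is_fraud")
y = df["is_fraud"]

# 1. Split first (80/20); stratify keeps the fraud ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Fit scaling and one-hot encoding on the training split only
preprocess = ColumnTransformer([
    ("num", MinMaxScaler(), ["amount"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["customer_id", "terminal_id"]),
])
X_train_enc = preprocess.fit_transform(X_train)
X_test_enc = preprocess.transform(X_test)  # IDs unseen in training become all-zero columns

# 3. Oversampling (e.g. SMOTE) would be applied to X_train_enc / y_train only;
#    X_test_enc and y_test keep the real class distribution.
print(X_train_enc.shape, X_test_enc.shape)
```

Fitting the transformers on the training split keeps test-set statistics (such as the maximum transaction amount) out of training, so test values may land slightly outside [0, 1] after scaling, which is expected.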


3. Model Development

We employ a Keras Sequential Neural Network to classify transactions as fraudulent or legitimate. Here’s the model architecture:


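from tensorflow import keras

# X_train_resampled: the preprocessed, SMOTE-balanced training features
nn_model = keras.Sequential([
    keras.layers.Input(shape=(X_train_resampled.shape[1],)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),   # zero 20% of units at each training step
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid')  # probability that the transaction is fraudulent (class 1)
])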

Dropout Layers: Dropout layers reduce overfitting by randomly setting a fraction of a layer's outputs (20% here) to zero at each training step. This forces the network to learn more robust patterns instead of relying on a few specific neurons. Dropout is active only during training; at inference time every unit is used.
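What a dropout layer does to a batch of activations can be shown with a small NumPy sketch of inverted dropout, the scheme Keras' `Dropout` layer uses (surviving units are scaled by 1 / (1 - rate) so the expected sum is unchanged). The `dropout` function here is a hypothetical illustration, not a Keras API.

```python
import numpy as np

def dropout(activations: np.ndarray, rate: float, training: bool,
            rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: inverted dropout applied to one batch of activations."""
    if not training or rate == 0.0:
        return activations  # inference: the layer passes inputs through unchanged
    keep = rng.random(activations.shape) >= rate   # keep each unit with probability 1 - rate
    return activations * keep / (1.0 - rate)       # rescale survivors to preserve the expected sum

rng = np.random.default_rng(0)
h = np.ones((1, 32))  # e.g. one row of outputs from the Dense(32) layer
out = dropout(h, rate=0.2, training=True, rng=rng)
print(int((out == 0).sum()), "of 32 units zeroed; survivors scaled to", out.max())
```

Because of the rescaling during training, no adjustment is needed at inference time, which is why the layer can simply be skipped when `training` is false.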


4. Results

Training History

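The training and validation accuracy curves do not diverge or fluctuate significantly, which suggests the model is not strongly overfitting. On imbalanced data, however, accuracy alone says little about how well fraud is detected, so the per-class metrics below are the more informative measure.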

Confusion Matrix

Classification Report

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Legitimate transactions (0) | 0.96 | 0.96 | 0.96 | 303,637 |
| Fraudulent transactions (1) | 0.74 | 0.73 | 0.74 | 47,194 |
| Accuracy | | | 0.93 | 350,831 |
| Macro avg | 0.85 | 0.85 | 0.85 | 350,831 |
| Weighted avg | 0.93 | 0.93 | 0.93 | 350,831 |
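The summary rows follow from the per-class rows. The snippet below recomputes them from the rounded values in the table, so results can differ from the report in the third decimal: macro recall comes out as 0.845 rather than 0.85, presumably because the report averages the unrounded recalls. It also shows that accuracy equals support-weighted recall and estimates how many fraudulent test transactions were missed.

```python
# Per-class values and supports copied from the classification report above
report = {
    "legitimate": {"precision": 0.96, "recall": 0.96, "f1": 0.96, "support": 303_637},
    "fraudulent": {"precision": 0.74, "recall": 0.73, "f1": 0.74, "support": 47_194},
}
total = sum(c["support"] for c in report.values())

for metric in ("precision", "recall", "f1"):
    macro = sum(c[metric] for c in report.values()) / len(report)          # unweighted mean
    weighted = sum(c[metric] * c["support"] for c in report.values()) / total  # support-weighted
    print(f"{metric:9s} macro={macro:.3f} weighted={weighted:.3f}")

# Accuracy = correct predictions / all predictions = support-weighted recall
accuracy = sum(c["recall"] * c["support"] for c in report.values()) / total
# Fraudulent transactions the model labeled legitimate (false negatives)
missed_fraud = (1 - report["fraudulent"]["recall"]) * report["fraudulent"]["support"]
print(f"accuracy ~ {accuracy:.3f}, fraudulent transactions missed ~ {missed_fraud:,.0f}")
```

The last line puts the 0.73 fraud recall in absolute terms: on the order of 12,700 of the 47,194 fraudulent test transactions go undetected.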

Conclusion

The model achieved an accuracy of 93%, with a precision of 0.74 and a recall of 0.73 on the fraudulent class. Because about 86.5% of the test set is legitimate, a model that never flags fraud would already reach roughly 86.5% accuracy, so the fraud-class metrics are the more meaningful measure. The model identifies most fraudulent activity, but it still misses about 27% of fraudulent transactions, so recall is the main area for improvement. Fine-tuning the model, trying alternative algorithms, and leveraging more advanced techniques such as ensemble learning could further improve performance.
