The rapid development of the financial industry has exposed it to significant risks, one of the most critical being fraudulent transactions. This project aims to detect and predict fraudulent transactions by analyzing clients' transaction histories, spending patterns, and account behaviors to identify anomalies and prevent financial losses.
- Link: Fraud Transaction Detection Dataset
- Description:
- The dataset contains 1.75 million transactions generated by simulated users over a period from January 2023 to June 2023.
- It is highly imbalanced, with only 0.1345% of transactions classified as fraudulent.
Imbalance Issue: The dataset is imbalanced, with far fewer fraudulent transactions than legitimate ones. Left unaddressed, this imbalance would bias the model toward predicting the majority (legitimate) class.
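A quick sanity check of the class distribution can confirm this. The sketch below assumes the data is loaded into a pandas DataFrame `df` with a binary label column named `TX_FRAUD`; both names are illustrative, not taken from the project code.

```python
import pandas as pd

# Hypothetical loading step; adjust the path and column name to the actual dataset
df = pd.read_csv("transactions.csv")

# Count legitimate (0) vs. fraudulent (1) transactions and report the fraud rate
print(df["TX_FRAUD"].value_counts())
print(f"Fraud rate: {df['TX_FRAUD'].mean():.4%}")
```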
To address the imbalance, we apply the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE helps generate synthetic samples for the minority class (fraudulent transactions), improving the model's ability to detect fraud.
Before applying SMOTE, the data is split so that the held-out test set remains untouched, allowing the model to be evaluated on real, unaltered data.
- Train-Test Split:
The dataset is divided into training and testing sets using an 80/20 ratio. We apply SMOTE to the training set only to balance the class distribution. The test set remains imbalanced, allowing for unbiased evaluation of model performance on real-world data.
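A minimal sketch of the split and resampling, assuming a feature matrix `X` and label vector `y` are already prepared and using imbalanced-learn's `SMOTE`; the stratified split and random seed are illustrative choices rather than details confirmed by the project.

```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# 80/20 split; stratifying keeps the rare fraud class represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority (fraud) class on the training set only;
# the test set keeps its original, imbalanced distribution
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
```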
- Preprocessing:
- Min-Max Scaling: All numeric features (e.g., transaction amount) are scaled between 0 and 1 using MinMaxScaler(), ensuring that features with larger ranges don’t dominate the model training process.
- Categorical Encoding: Categorical features (e.g., customer ID, terminal ID) are encoded using OneHotEncoder(handle_unknown='ignore'), which creates binary columns for each category to help the model understand categorical variables.
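One way to combine the two transforms is scikit-learn's ColumnTransformer. The sketch below is an assumption about how this could be wired up; the column names and the `train_df`/`test_df` DataFrames are placeholders, not the dataset's actual field names.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Illustrative column names; substitute the dataset's actual feature names
numeric_features = ["TX_AMOUNT"]
categorical_features = ["CUSTOMER_ID", "TERMINAL_ID"]

preprocessor = ColumnTransformer(
    transformers=[
        # Scale numeric features into the [0, 1] range
        ("num", MinMaxScaler(), numeric_features),
        # One binary column per category; categories unseen during fit are ignored
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)

# Fit the transforms on the training portion only, then apply them to the test portion
X_train = preprocessor.fit_transform(train_df)
X_test = preprocessor.transform(test_df)
```

Fitting the scaler and encoder on the training data only avoids leaking information from the test set into the preprocessing step.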
We employ a Keras Sequential Neural Network to classify transactions as fraudulent or legitimate. Here’s the model architecture:
```python
from tensorflow import keras

nn_model = keras.Sequential([
    keras.layers.Input(shape=(X_train_resampled.shape[1],)),  # one input per feature
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid')                # outputs fraud probability
])
```
Dropout Layers: Dropout layers reduce overfitting by randomly deactivating a fraction of neurons during training, forcing the model to learn more robust patterns. This is particularly useful here, since oversampling with SMOTE can otherwise encourage the network to memorize synthetic minority-class samples.
The training and validation accuracy curves do not show significant divergence or fluctuation, indicating that the model generalizes well despite the class imbalance.
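For reference, a hedged sketch of how such a model might be compiled and trained; the optimizer, loss, epochs, batch size, and validation split below are illustrative assumptions, not values taken from the project.

```python
# Binary classification with a sigmoid output pairs naturally with binary cross-entropy
nn_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Assumes dense NumPy inputs; validation_split holds out part of the (resampled) training data
history = nn_model.fit(
    X_train_resampled,
    y_train_resampled,
    validation_split=0.2,
    epochs=10,
    batch_size=256,
)
```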
| Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Legitimate Transactions (0) | 0.96 | 0.96 | 0.96 | 303,637 |
| Fraudulent Transactions (1) | 0.74 | 0.73 | 0.74 | 47,194 |
| Accuracy | | | 0.93 | 350,831 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 350,831 |
| Weighted Avg | 0.93 | 0.93 | 0.93 | 350,831 |
The model achieved an overall accuracy of 93%, with a precision of 0.74 and a recall of 0.73 on the fraudulent class. These results suggest the model identifies fraudulent activity reasonably well, but there is room for improvement, particularly in recall for fraudulent transactions. Fine-tuning the model, trying alternative algorithms, and leveraging more advanced techniques such as ensemble learning could further improve performance.
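The per-class figures above follow the layout of scikit-learn's classification report. The sketch below shows how they could be reproduced, reusing the illustrative names from the earlier snippets and assuming a 0.5 decision threshold on the sigmoid output.

```python
from sklearn.metrics import classification_report

# Predicted fraud probabilities from the sigmoid output layer
y_prob = nn_model.predict(X_test)

# Convert probabilities to hard 0/1 labels with a 0.5 threshold
y_pred = (y_prob >= 0.5).astype(int).ravel()

print(classification_report(y_test, y_pred, digits=2))
```

Lowering the decision threshold below 0.5 is one simple way to trade some precision for higher recall on the fraud class, if missed fraud is costlier than false alarms.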



