The rapid development of the financial industry has exposed it to significant risks, one of the most critical being fraudulent transactions. This project aims to detect and predict fraudulent transactions by analyzing clients' transaction histories, spending patterns, and account behaviors to identify anomalies and prevent financial losses.
- Link: Fraud Transaction Detection Dataset
- Description:
- The dataset contains 1.75 million transactions generated by simulated users over a period from January 2023 to June 2023.
- It is highly imbalanced, with only 0.1345% of transactions classified as fraudulent.
Imbalance Issue: The dataset is imbalanced, with far fewer fraudulent transactions than legitimate ones. Left unaddressed, this imbalance would bias the model toward predicting the majority (legitimate) class.
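A quick sanity check of the class distribution can confirm this. The sketch below assumes the data is loaded into a pandas DataFrame `df` with a binary label column named `TX_FRAUD`; both names are illustrative, not taken from the project code.

```python
import pandas as pd

# Hypothetical loading step; adjust the path and column name to the actual dataset
df = pd.read_csv("transactions.csv")

# Count legitimate (0) vs. fraudulent (1) transactions and report the fraud rate
print(df["TX_FRAUD"].value_counts())
print(f"Fraud rate: {df['TX_FRAUD'].mean():.4%}")
```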
To address the imbalance, we apply the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE helps generate synthetic samples for the minority class (fraudulent transactions), improving the model's ability to detect fraud.
Before applying SMOTE, the data is split so that the held-out test set remains untouched, allowing the model to be evaluated on real, unaltered data.
- Train-Test Split:
The dataset is divided into training and testing sets using an 80/20 ratio. We apply SMOTE to the training set only to balance the class distribution. The test set remains imbalanced, allowing for unbiased evaluation of model performance on real-world data.
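A minimal sketch of the split and resampling, assuming a feature matrix `X` and label vector `y` are already prepared and using imbalanced-learn's `SMOTE`; the stratified split and random seed are illustrative choices rather than details confirmed by the project.

```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# 80/20 split; stratifying keeps the rare fraud class represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority (fraud) class on the training set only;
# the test set keeps its original, imbalanced distribution
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
```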
- Preprocessing:
- Min-Max Scaling: All numeric features (e.g., transaction amount) are scaled between 0 and 1 using MinMaxScaler(), ensuring that features with larger ranges don’t dominate the model training process.
- Categorical Encoding: Categorical features (e.g., customer ID, terminal ID) are encoded using OneHotEncoder(handle_unknown='ignore'), which creates binary columns for each category to help the model understand categorical variables.
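One way to combine the two transforms is scikit-learn's ColumnTransformer. The sketch below is an assumption about how this could be wired up; the column names and the `train_df`/`test_df` DataFrames are placeholders, not the dataset's actual field names.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Illustrative column names; substitute the dataset's actual feature names
numeric_features = ["TX_AMOUNT"]
categorical_features = ["CUSTOMER_ID", "TERMINAL_ID"]

preprocessor = ColumnTransformer(
    transformers=[
        # Scale numeric features into the [0, 1] range
        ("num", MinMaxScaler(), numeric_features),
        # One binary column per category; categories unseen during fit are ignored
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)

# Fit the transforms on the training portion only, then apply them to the test portion
X_train = preprocessor.fit_transform(train_df)
X_test = preprocessor.transform(test_df)
```

Fitting the scaler and encoder on the training data only avoids leaking information from the test set into the preprocessing step.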
We employ a Keras Sequential Neural Network to classify transactions as fraudulent or legitimate. Here’s the model architecture:
```python
from tensorflow import keras

nn_model = keras.Sequential([
    keras.layers.Input(shape=(X_train_resampled.shape[1],)),  # one input per feature
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid')                # outputs fraud probability
])
```
Dropout Layers: Dropout layers reduce overfitting by randomly deactivating a fraction of neurons during training, forcing the model to learn more robust patterns. This is particularly useful here, since oversampling with SMOTE can otherwise encourage the network to memorize synthetic minority-class samples.
The training and validation accuracy curves do not show significant divergence or fluctuation, indicating that the model generalizes well despite the class imbalance.
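For reference, a hedged sketch of how such a model might be compiled and trained; the optimizer, loss, epochs, batch size, and validation split below are illustrative assumptions, not values taken from the project.

```python
# Binary classification with a sigmoid output pairs naturally with binary cross-entropy
nn_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Assumes dense NumPy inputs; validation_split holds out part of the (resampled) training data
history = nn_model.fit(
    X_train_resampled,
    y_train_resampled,
    validation_split=0.2,
    epochs=10,
    batch_size=256,
)
```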
| Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Legitimate Transactions (0) | 0.96 | 0.96 | 0.96 | 303,637 |
| Fraudulent Transactions (1) | 0.74 | 0.73 | 0.74 | 47,194 |
| Accuracy | | | 0.93 | 350,831 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 350,831 |
| Weighted Avg | 0.93 | 0.93 | 0.93 | 350,831 |
The model achieved an overall accuracy of 93%, with a precision of 0.74 and a recall of 0.73 on the fraudulent class. These results suggest the model identifies fraudulent activity reasonably well, but there is room for improvement, particularly in recall for fraudulent transactions. Fine-tuning the model, trying alternative algorithms, and leveraging more advanced techniques such as ensemble learning could further improve performance.
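The per-class figures above follow the layout of scikit-learn's classification report. The sketch below shows how they could be reproduced, reusing the illustrative names from the earlier snippets and assuming a 0.5 decision threshold on the sigmoid output.

```python
from sklearn.metrics import classification_report

# Predicted fraud probabilities from the sigmoid output layer
y_prob = nn_model.predict(X_test)

# Convert probabilities to hard 0/1 labels with a 0.5 threshold
y_pred = (y_prob >= 0.5).astype(int).ravel()

print(classification_report(y_test, y_pred, digits=2))
```

Lowering the decision threshold below 0.5 is one simple way to trade some precision for higher recall on the fraud class, if missed fraud is costlier than false alarms.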



