A containerized fraud detection system that leverages machine learning to identify fraudulent credit card transactions. This project demonstrates a complete data science pipeline—from data processing and model training to real-time API monitoring and BI visualization using Apache Superset.
- Overview
- Features
- Repository Structure
- Installation
- Usage
- API Endpoints
- Architecture
- Testing
- License
- Contributing
- Contact
This project is a comprehensive solution for fraud detection in credit card transactions. It includes:
- Data Processing: Loading, cleaning, and preprocessing data.
- Model Training: Using a RandomForest classifier for fraud detection (a minimal training sketch follows this list).
- Streaming Simulation: Generating synthetic transaction data and simulating a live data stream.
- API Endpoints: Providing endpoints to monitor system status, metrics, statistics, and transactions.
- Visualization: Leveraging Apache Superset for interactive dashboarding and data exploration.
- Containerization: Packaging all services using Docker and orchestrating them with Docker Compose.
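The model training step above can be pictured with a short, self-contained sketch. This is illustrative only: the project's actual preprocessing and training code lives in `app/utils/`, and the sketch assumes the demo CSV has numeric feature columns and a binary `Class` label, which may differ from the real schema.

```python
# Illustrative training sketch; not the project's actual app/utils code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumes the demo dataset uses a binary "Class" column as the fraud label.
df = pd.read_csv("data/test_creditcard_2023.csv")
X = df.drop(columns=["Class"])
y = df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit the RandomForest classifier mentioned above.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```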
- Machine Learning Model: Trained with historical data to detect fraudulent transactions.
- Real-time Data Processing: Simulates a continuous stream of transactions with on-the-fly model updates (see the streaming sketch after this list).
- REST API: Exposes endpoints for tracking model metrics and transaction logs.
- BI Dashboard: Integrated Apache Superset instance for visual exploration of transaction data.
- Modular Codebase: Structured into multiple modules for ease of testing and scaling.
- Containerized Deployment: Easily deployable with Docker and Docker Compose.
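The streaming simulation mentioned in the lists above boils down to two pieces: a source of synthetic transactions and a loop that scores them one at a time. The sketch below is a hypothetical stand-in (the generator, feature count, and fraud rate are assumptions), not the application's actual stream code.

```python
# Hypothetical streaming-simulation sketch; not the application's actual stream code.
import time
import numpy as np

rng = np.random.default_rng(42)

def synthetic_transactions(n_features: int = 29):
    """Yield an endless stream of synthetic transactions: features plus a rare fraud label."""
    while True:
        features = rng.normal(size=n_features)
        is_fraud = rng.random() < 0.01  # assume roughly 1% of transactions are fraudulent
        yield features, int(is_fraud)

def simulate_stream(model, delay: float = 1.0, limit: int = 10):
    """Score transactions one by one, mimicking a live feed."""
    for i, (features, actual) in enumerate(synthetic_transactions()):
        if i >= limit:
            break
        predicted = int(model.predict(features.reshape(1, -1))[0])
        print(f"txn {i}: actual={actual} predicted={predicted}")
        time.sleep(delay)
```

`simulate_stream` expects a fitted classifier whose feature count matches `n_features`; with such a model in hand, `simulate_stream(model, limit=5)` prints five scored transactions.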
```
fraud-detection-system/
├── app/
│   ├── main.py                     # Main application code (Flask server)
│   └── utils/                      # Utility modules
│       ├── data_processing.py      # Data preprocessing functions
│       ├── model.py                # Machine learning model functions
│       └── metrics.py              # Model metric calculation and logging
├── data/
│   └── test_creditcard_2023.csv    # Demo dataset for testing
├── docs/
│   ├── model_explanation.md        # Explanation of the ML model
│   ├── api_reference.md            # API documentation
│   └── system_architecture.md      # System architecture overview
├── notebooks/
│   ├── exploratory_analysis.ipynb  # Exploratory data analysis of transactions
│   └── model_development.ipynb     # Model development and testing
├── superset/
│   ├── Dockerfile                  # Dockerfile for Superset service
│   └── superset_config.py          # Superset configuration
├── tests/                          # Unit tests for the project
├── .env.example                    # Template for environment variables
├── .gitignore
├── docker-compose.yml              # Docker Compose configuration for all services
├── Dockerfile                      # Dockerfile for the Flask application
├── LICENSE                         # Project license (e.g., MIT)
├── README.md                       # Project README (this file)
└── requirements.txt                # Python dependencies
```
- Clone the Repository:

  ```bash
  git clone https://github.com/yourusername/fraud-detection-system.git
  cd fraud-detection-system
  ```

- Configure Environment Variables:

  - Copy the environment variables template:

    ```bash
    cp .env.example .env
    ```

  - Modify the `.env` file if necessary.

- Build and Run with Docker Compose:

  ```bash
  docker-compose up --build
  ```

  This command starts the following services:

  - `db`: PostgreSQL database.
  - `web`: Flask application running on port 8000.
  - `superset`: Apache Superset dashboard running on port 8088.
- Flask API:

  - Check the app status at: http://localhost:8000/
  - The app simulates real-time data ingestion and updates model metrics.

- Apache Superset:

  - Access the Superset dashboard at: http://localhost:8088/
  - Use Superset to build interactive dashboards and explore transaction data.
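To confirm the stack is reachable from a script rather than a browser, a short poll of the status endpoint works; this uses `requests`, which is an assumption and may not be listed in requirements.txt.

```python
# Poll the Flask service until it answers (requests may need to be installed separately).
import time
import requests

for attempt in range(30):
    try:
        resp = requests.get("http://localhost:8000/", timeout=2)
        print(resp.json())  # e.g. {"status": "running", "time": "..."}
        break
    except requests.exceptions.ConnectionError:
        time.sleep(2)
else:
    print("Flask service did not come up in time")
```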
- `GET /`

  Returns the application status and current server time.

  ```json
  { "status": "running", "time": "2023-10-XXTXX:XX:XX" }
  ```

- `GET /metrics`

  Retrieves the latest model performance metrics (precision, recall, F1 score, and accuracy).

- `GET /statistics`

  Provides detailed statistics (mean and standard deviation) for each feature.

- `GET /transactions`

  Lists the latest 100 transactions, including actual and predicted classes.
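As a quick, illustrative client (not part of the codebase), the endpoints can be queried with `requests`; the exact shape of each JSON response is defined by the application and documented in docs/api_reference.md.

```python
# Illustrative client for the monitoring endpoints; response fields are assumptions.
import requests

BASE_URL = "http://localhost:8000"

metrics = requests.get(f"{BASE_URL}/metrics", timeout=5).json()
print("Latest model metrics:", metrics)

stats = requests.get(f"{BASE_URL}/statistics", timeout=5).json()
print("Feature statistics:", stats)

# Assumes /transactions returns a JSON array of the latest 100 records.
transactions = requests.get(f"{BASE_URL}/transactions", timeout=5).json()
print(f"Fetched {len(transactions)} recent transactions")
```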
For more details, refer to docs/api_reference.md.
The system is composed of several modular components:
- Data Processing & Model Training: Located in `app/utils/`, these modules handle data cleaning, feature scaling, model training, and predictions.
- Flask Application: Acts as the API gateway for handling data streams, processing transactions, and logging to the database.
- Database: A PostgreSQL database stores transaction logs, model metrics, and data statistics.
- BI Dashboard: Apache Superset (configured in the `superset/` directory) provides interactive data visualization capabilities.
- Containerization: Docker and Docker Compose ensure environment consistency and ease of deployment.
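To make the "logging to the database" step concrete, here is a minimal sketch of how a scored transaction could be written to PostgreSQL with psycopg2. The table name, column names, and environment variables are hypothetical placeholders; the real schema and connection settings come from the application and docker-compose.yml.

```python
# Hypothetical logging sketch: table, columns, and env vars are placeholders.
import os
import psycopg2

conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "db"),
    dbname=os.getenv("DB_NAME", "fraud"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASSWORD", "postgres"),
)

def log_transaction(amount: float, actual: int, predicted: int) -> None:
    """Insert one scored transaction into a (hypothetical) transactions table."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO transactions (amount, actual_class, predicted_class) "
            "VALUES (%s, %s, %s)",
            (amount, actual, predicted),
        )
```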
For additional details, see docs/system_architecture.md.
Unit tests are provided in the `tests/` directory. To run the tests:
```bash
# Install dependencies if not already installed
pip install -r requirements.txt

# Run all tests using unittest
python -m unittest discover -s tests
```
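A new test follows the standard unittest layout; the example below is a self-contained illustration (not one of the project's existing tests) that checks a small RandomForest can separate two obviously different transaction clusters.

```python
# tests/test_model_sketch.py -- illustrative, self-contained example test.
import unittest

import numpy as np
from sklearn.ensemble import RandomForestClassifier

class TestFraudModelSketch(unittest.TestCase):
    def test_model_separates_obvious_classes(self):
        rng = np.random.default_rng(0)
        # Two well-separated clusters standing in for legitimate vs. fraudulent transactions.
        legit = rng.normal(loc=0.0, size=(200, 5))
        fraud = rng.normal(loc=5.0, size=(200, 5))
        X = np.vstack([legit, fraud])
        y = np.array([0] * 200 + [1] * 200)

        model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
        self.assertGreater(model.score(X, y), 0.95)

if __name__ == "__main__":
    unittest.main()
```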
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or feedback, please contact:
Uladzimir Manulenka
[email protected]