# Decifra: Fraud Detection MLOps Pipeline with Explainable AI
Decifra is a production-ready machine learning operations (MLOps) pipeline designed for credit card fraud detection. The system not only identifies fraudulent transactions with high accuracy but also provides transparent, interpretable explanations for each prediction using state-of-the-art Explainable AI (XAI) techniques.
The name "Decifra" derives from "to decipher" - representing the project's core mission of uncovering hidden fraud patterns and making AI decisions transparent and understandable.
| Metric | Before Tuning | After Optuna (100 trials) |
|---|---|---|
| PR-AUC | 84.91% | 88.52% |
| Precision | 39.91% | 87% |
| Recall | 86.73% | 86% |
| F1 Score | 54.66% | 86% |
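As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, and the pre-tuning numbers are internally consistent:

```python
# Verify the pre-tuning F1 score from the reported precision and recall.
precision = 0.3991
recall = 0.8673

f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # prints 54.67% -- matching the table's 54.66% up to rounding
```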
## 🤗 Live Demo on Hugging Face

- Dashboard Interface - Real-time fraud detection with explainability
- Model Performance Metrics - PR-AUC: 88.52%
- End-to-end MLOps Pipeline Architecture
## Table of Contents

- Quick Start
- Overview
- Key Features
- Technology Stack
- Project Architecture
- Project Structure
- Installation
- Configuration
- Usage
- MLOps Pipeline
- Explainability
- API Reference
- Dashboard
- Testing
- Contributing
- License
## Quick Start

Run everything with Docker:

```bash
# Clone repository
git clone https://github.com/HarshTomar1234/decifra.git
cd decifra

# Run with Docker Compose
docker-compose up --build

# Access:
# - Dashboard: http://localhost:8501
# - API: http://localhost:3000
# - MLflow: http://localhost:5000
```

Or set up locally:

```bash
# Clone and setup
git clone https://github.com/HarshTomar1234/decifra.git
cd decifra
python -m venv .venv
.venv\Scripts\activate  # Windows
pip install -e ".[dev]"

# Run training pipeline
python -m pipelines.training_pipeline

# Start BentoML API
python -m src.serving.save_model
bentoml serve src.serving.service:FraudDetectorService

# Launch Dashboard
streamlit run dashboard/app.py
```

## Overview

Financial fraud detection is a critical challenge in the banking and fintech industry. Traditional ML models often operate as "black boxes," providing predictions without explanations. This lack of transparency creates challenges for:
- Regulatory Compliance: Regulations like GDPR require explanations for automated decisions
- Fraud Analyst Trust: Investigators need to understand why transactions are flagged
- Model Debugging: Data scientists need insights to improve model performance
- Customer Experience: Reducing false positives requires understanding model behavior
Decifra addresses these challenges by combining robust fraud detection with comprehensive explainability.
## Key Features

- Multi-model training with XGBoost, LightGBM, and Random Forest
- Automated hyperparameter optimization using Optuna
- Handling of highly imbalanced datasets using SMOTE and stratified sampling
- Ensemble methods for improved prediction accuracy
- SHAP (SHapley Additive exPlanations) for global and local feature importance
- LIME (Local Interpretable Model-agnostic Explanations) for instance-level explanations
- Visual explanation reports for each prediction
- Feature contribution analysis for fraud decisions
- End-to-end pipeline orchestration with ZenML
- Experiment tracking and model registry with MLflow
- Data versioning with DVC
- Model serving and API deployment with BentoML
- Data validation with Great Expectations
- RESTful API for real-time predictions
- Interactive Streamlit dashboard for monitoring
- Docker containerization for deployment
- CI/CD pipeline with GitHub Actions
- Comprehensive logging and monitoring
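The SMOTE handling of class imbalance mentioned above can be pictured with a small, dependency-free sketch (the actual pipeline presumably uses imbalanced-learn's `SMOTE`; this toy version interpolates between random minority pairs instead of k-nearest neighbors):

```python
import random

def toy_smote(minority, n_synthetic, seed=42):
    """Generate synthetic minority samples by interpolating between
    random pairs of existing minority samples (simplified SMOTE:
    real SMOTE interpolates toward k-nearest neighbors)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        a = rng.choice(minority)
        b = rng.choice(minority)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Tiny 2-D minority class: three fraud points
fraud = [[1.0, 2.0], [1.5, 2.5], [2.0, 1.0]]
new_points = toy_smote(fraud, n_synthetic=5)
print(len(new_points))  # 5 synthetic fraud samples, each inside the convex hull
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the real fraud cases occupy, rather than duplicating rows verbatim.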
## Technology Stack

| Category | Technology | Purpose |
|---|---|---|
| Orchestration | ZenML | ML pipeline orchestration and workflow management |
| Experiment Tracking | MLflow | Experiment logging, model registry, and artifact storage |
| Data Versioning | DVC | Version control for datasets and model artifacts |
| Model Serving | BentoML | Model packaging and REST API deployment |
| Explainability | SHAP, LIME | Model interpretation and explanation generation |
| ML Models | XGBoost, LightGBM, scikit-learn | Gradient boosting and ensemble methods |
| Hyperparameter Tuning | Optuna | Bayesian optimization for hyperparameters |
| Data Validation | Great Expectations | Data quality checks and schema validation |
| Dashboard | Streamlit | Interactive web interface for monitoring |
| API Framework | FastAPI | High-performance REST API endpoints |
| Containerization | Docker | Application containerization and deployment |
## Project Architecture

```
                       +-------------------+
                       |     Streamlit     |
                       |     Dashboard     |
                       +---------+---------+
                                 |
+-------------------+  +---------v---------+  +-------------------+
|                   |  |                   |  |                   |
|     Raw Data      +->+  ZenML Pipeline   +->+      MLflow       |
|   (DVC Tracked)   |  |                   |  |     Tracking      |
|                   |  +---------+---------+  +-------------------+
+-------------------+            |
                    +------------+------------+
                    |                         |
          +---------v---------+     +---------v---------+
          |                   |     |                   |
          |   Trained Model   |     |    BentoML API    |
          |     (MLflow)      +---->+      Service      |
          |                   |     |                   |
          +-------------------+     +---------+---------+
                                              |
                                    +---------v---------+
                                    |                   |
                                    |    SHAP / LIME    |
                                    |   Explanations    |
                                    |                   |
                                    +-------------------+
```
## Project Structure

```
decifra/
├── src/                          # Source code
│   ├── config.py                 # Config loader (reads from configs/)
│   ├── data/                     # Data ingestion and preprocessing
│   │   ├── __init__.py
│   │   ├── ingestion.py          # Data loading from sources
│   │   ├── preprocessing.py      # Feature scaling, encoding
│   │   └── validation.py         # Data quality checks
│   ├── features/                 # Feature engineering
│   │   ├── __init__.py
│   │   └── engineering.py        # Feature transformations
│   ├── models/                   # Model definitions
│   │   ├── __init__.py
│   │   └── tuner.py              # Optuna hyperparameter tuning
│   ├── explainability/           # XAI implementations
│   │   ├── __init__.py
│   │   ├── shap_explainer.py     # SHAP explanations
│   │   └── lime_explainer.py     # LIME explanations
│   └── serving/                  # Model serving
│       ├── __init__.py
│       ├── service.py            # BentoML service
│       └── save_model.py         # Model export to BentoML
├── pipelines/                    # ZenML pipelines
│   ├── __init__.py
│   ├── training_pipeline.py      # Training workflow
│   └── steps/                    # Pipeline steps
│       ├── __init__.py
│       ├── data_loader.py
│       ├── preprocessor.py
│       ├── trainer.py
│       ├── evaluator.py
│       ├── explainer.py
│       └── tuner.py              # Hyperparameter tuning step
├── dashboard/                    # Streamlit application
│   └── app.py                    # Main dashboard
├── configs/                      # Configuration files
│   └── config.yaml               # Hyperparameters & settings (SINGLE SOURCE OF TRUTH)
├── data/                         # Data directory
│   ├── raw/                      # Raw datasets
│   └── processed/                # Processed datasets
├── artifacts/                    # Generated artifacts
│   ├── models/                   # Trained models (including tuned)
│   └── explanations/             # SHAP & LIME plots
├── notes/                        # Learning notes
│   ├── 01_zenml_pipelines.md
│   ├── 02_mlflow_tracking.md
│   ├── 03_bentoml_serving.md
│   ├── 04_streamlit_dashboard.md
│   └── 05_optuna_tuning.md
├── notebooks/                    # Jupyter notebooks
├── tests/                        # Unit and integration tests
├── docker/                       # Docker configurations
├── .dvc/                         # DVC configuration
├── .zen/                         # ZenML configuration
├── pyproject.toml                # Project dependencies
├── bentofile.yaml                # BentoML build configuration
├── .env.example                  # Environment template
└── README.md                     # This file
```
## Installation

Prerequisites:

- Python 3.9 or higher
- Git
- pip or conda
1. Clone the repository:

```bash
git clone https://github.com/HarshTomar1234/decifra.git
cd decifra
```

2. Create and activate a virtual environment:

```bash
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux/macOS
source .venv/bin/activate
```

3. Install dependencies:

```bash
pip install -e ".[dev]"
```

4. Initialize ZenML:

```bash
zenml init
```

5. Configure environment variables:

```bash
cp .env.example .env
# Edit .env with your configuration
```

## Configuration

The main configuration file is located at `configs/config.yaml`. Key settings include:
```yaml
data:
  raw_path: "data/raw"
  processed_path: "data/processed"
  test_size: 0.2
  random_state: 42

models:
  xgboost:
    n_estimators: 100
    max_depth: 6
    learning_rate: 0.1

explainability:
  shap:
    max_display: 20
  lime:
    num_features: 10
```

## Usage

Run the complete training pipeline:
```bash
python -m pipelines.training_pipeline
```

View experiment tracking results:

```bash
mlflow ui --port 5000
```

Access at: http://localhost:5000

First, save the model to BentoML:

```bash
python -m src.serving.save_model
```

Then deploy the model as a REST API:

```bash
bentoml serve src.serving.service:FraudDetectorService
```

Access API docs at: http://localhost:3000

Start the Streamlit monitoring dashboard:

```bash
streamlit run dashboard/app.py
```

## MLOps Pipeline

The ZenML training pipeline consists of the following steps:
1. Data Ingestion: Load raw transaction data
2. Data Validation: Verify data quality using Great Expectations
3. Preprocessing: Handle missing values, scale features, apply SMOTE
4. Feature Engineering: Create derived features
5. Model Training: Train multiple models with hyperparameter tuning
6. Model Evaluation: Calculate metrics (Precision, Recall, F1, ROC-AUC, PR-AUC)
7. Model Selection: Select the best-performing model
8. Explainability: Generate SHAP values and feature importance
9. Model Registration: Register the model in MLflow
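The steps above chain together roughly like this dependency-free sketch (ZenML's `@step`/`@pipeline` decorators, the real models, and the Great Expectations checks are omitted; all names and data are illustrative):

```python
from statistics import mean

def ingest():
    # Stand-in for loading raw transactions as (amount, is_fraud) pairs
    return [(120.0, 0), (4999.0, 1), (35.5, 0), (7200.0, 1), (60.0, 0)]

def validate(rows):
    # Stand-in for data-quality checks (Great Expectations in the real pipeline)
    assert all(label in (0, 1) for _, label in rows), "labels must be binary"
    return rows

def preprocess(rows):
    # Min-max scale the amount feature into [0, 1]
    amounts = [a for a, _ in rows]
    lo, hi = min(amounts), max(amounts)
    return [((a - lo) / (hi - lo), y) for a, y in rows]

def train(rows):
    # Trivial "model": flag amounts above the midpoint between class means
    m0 = mean(x for x, y in rows if y == 0)
    m1 = mean(x for x, y in rows if y == 1)
    threshold = (m0 + m1) / 2
    return lambda x: int(x > threshold)

def evaluate(model, rows):
    correct = sum(model(x) == y for x, y in rows)
    return {"accuracy": correct / len(rows)}

def training_pipeline():
    # The pipeline is just the composition of the steps, in order
    rows = preprocess(validate(ingest()))
    model = train(rows)
    return evaluate(model, rows)

print(training_pipeline())  # {'accuracy': 1.0} on this toy data
```

In the actual project, ZenML's orchestrator tracks each step's inputs and outputs as versioned artifacts, so the same composition also buys caching, lineage, and MLflow logging.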
## Explainability

SHAP provides both global and local explanations:
- Global: Overall feature importance across all predictions
- Local: Per-prediction feature contributions
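The idea underlying SHAP can be shown with an exact, brute-force Shapley computation on a toy three-feature scoring function (real SHAP approximates this efficiently for large models; the feature names and rule weights here are made up). The key property is that the per-feature contributions sum exactly to the gap between the prediction and the baseline:

```python
from itertools import combinations
from math import factorial

BASELINE = {"amount": 50.0, "hour": 12, "foreign": 0}  # "average" transaction

def score(x):
    # Toy fraud score built from three independent rules
    s = 0.0
    if x["amount"] > 1000: s += 0.4
    if x["hour"] < 5:      s += 0.3
    if x["foreign"]:       s += 0.2
    return s

def shapley(x, baseline, model):
    """Exact Shapley values: average each feature's marginal contribution
    over all subsets of the other features (features outside the subset
    are masked to their baseline values)."""
    names = list(x)
    n = len(names)
    def value(subset):
        z = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(z)
    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += w * (value(set(s) | {i}) - value(set(s)))
        phi[i] = total
    return phi

x = {"amount": 2500.0, "hour": 3, "foreign": 1}  # suspicious transaction
phi = shapley(x, BASELINE, score)
print(phi)
# Efficiency property: contributions sum to score(x) - score(baseline)
assert abs(sum(phi.values()) - (score(x) - score(BASELINE))) < 1e-9
```

Because the toy score is additive, each feature's Shapley value equals its own rule weight (0.4, 0.3, 0.2); for real models with interactions, the attribution is less obvious, which is exactly where SHAP earns its keep.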
LIME generates human-readable explanations by approximating model behavior locally around each prediction.
For a flagged transaction, the system provides:
- Fraud probability score
- Top contributing features (positive and negative)
- Visual waterfall chart of feature contributions
- Natural language explanation
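The natural-language explanation can be as simple as templating the top contributions into a sentence; a hypothetical sketch (the function name, feature names, and wording are illustrative, not the project's actual output):

```python
def explain_in_words(probability, contributions, top_k=2):
    """Render a fraud score and its top feature contributions as a sentence.
    `contributions` maps feature name -> signed SHAP-style contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [f"{name} ({'+' if c >= 0 else ''}{c:.2f})" for name, c in ranked[:top_k]]
    verdict = "flagged as fraud" if probability >= 0.5 else "considered legitimate"
    return (f"Transaction {verdict} with probability {probability:.0%}; "
            f"driven mainly by {' and '.join(parts)}.")

msg = explain_in_words(0.87, {"V14": 0.23, "V4": 0.18, "Amount": -0.05})
print(msg)
# Transaction flagged as fraud with probability 87%; driven mainly by V14 (+0.23) and V4 (+0.18).
```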
## API Reference

| Method | Endpoint | Description |
|---|---|---|
| POST | `/predict` | Get fraud prediction |
| POST | `/predict_with_explanation` | Get prediction with SHAP/LIME explanation |
| GET | `/health` | Health check |
| GET | `/model_info` | Model metadata |
Example request:

```bash
curl -X POST http://localhost:3000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [...]}'
```

Example response:

```json
{
  "prediction": 1,
  "probability": 0.87,
  "is_fraud": true,
  "explanation": {
    "top_features": [
      {"feature": "V14", "contribution": 0.23},
      {"feature": "V4", "contribution": 0.18}
    ]
  }
}
```

## Dashboard

The Streamlit dashboard provides:
- Overview: Model performance metrics and confusion matrix
- Predictions: Real-time fraud scoring interface
- Explanations: Interactive SHAP and LIME visualizations
- Monitoring: Data drift detection and prediction distribution
- Analytics: Historical model performance trends
## Testing

Run the test suite:

```bash
pytest tests/ -v
```

Run with coverage:

```bash
pytest tests/ --cov=src --cov-report=html
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/new-feature`)
3. Commit your changes (`git commit -m 'Add new feature'`)
4. Push to the branch (`git push origin feature/new-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- Credit Card Fraud Detection Dataset from Kaggle
- ZenML, MLflow, DVC, and BentoML teams for excellent MLOps tools
- SHAP and LIME authors for XAI libraries

