120 changes: 77 additions & 43 deletions README.md
@@ -1,41 +1,91 @@
# Intelli-Credit - Intelligent Corporate Underwriting
# 🏦 Intelli-Credit - Intelligent Corporate Underwriting

> An autonomous AI Credit Officer designed to simulate how Tier-1 bank credit committees operate.
> An autonomous AI Credit Officer platform designed to automate Tier-1 bank credit committee operations, leveraging Multi-Agent architecture, specialized Financial LLMs, and explainable risk scoring.

Intelli-Credit is an end-to-end B2B credit decisioning platform. It ingests structured and unstructured borrower financial data, conducts autonomous web-scale due diligence, computes an explainable composite risk score using Machine Learning ensembles, simulates stress tests evaluating RAROC capital impact, and automatically generates a structured, downloadable Credit Appraisal Memo (CAM) in PDF format.
Intelli-Credit is an end-to-end B2B credit decisioning engine. It ingests complex financial documents, conducts autonomous due diligence via specialized agents, computes risk scores using machine learning ensembles, and generates professional Credit Appraisal Memos (CAM).

## 🔥 Key Differentiators
---

1. **"AI Credit Officer" Persona & LLM Integration**: The system isn't just a traditional ML pipeline. It uses **Google Gemini** to extract unstructured data, read financial PDFs, and automatically author conversational narrative summaries for risk and compliance.
2. **Web-Scale Research Simulation**: Includes NLP sentiment analysis (using FinBERT), regulatory filings intelligence, and ESG scores.
3. **Modular Decision Studio**: A dynamic workflow engine that allows credit risk managers to visually construct underwriting logic, configure dynamic scoring rules, and trigger external webhooks interactively.
4. **Capital Impact (RAROC) Simulation**: Elevates from basic scoring to bank portfolio management by assessing Risk-Weighted Assets (RWA) and tier capital requirements.
5. **SHAP-Based Explainability**: Avoids "black box" models. The top contributing risk drivers are extracted for every decision.
## 🏗 Project Architecture

## 🏗 System Architecture
Intelli-Credit is built with a modern, decoupled architecture designed for high performance and scalability.

The project consists of a Python FastAPI backend acting as the Machine Learning, LLM, and pipeline orchestration layer, paired with a modern Next.js frontend featuring real-time state synchronization, drag-and-drop workflow canvases, and Firebase authentication.
- **Frontend**: A highly interactive **Next.js 16** (App Router) application built with **React 19** and **Tailwind CSS 4**. It features a "Decision Studio" built on **XYFlow** for visual policy orchestration.
- **Backend**: A high-performance **FastAPI** service powered by **Python 3.11+**, utilizing **Async SQLAlchemy** for non-blocking database operations and **Google Gemini 1.5 Flash** for document intelligence.
- **AI Ecosystem**: Employs a multi-tier AI strategy using **Camel-AI** for multi-agent workflows, **Mem0** for persistent agent memory, and **XGBoost/SHAP** for transparent credit scoring.
- **Data Layer**: Integrates with **Databricks SQL Warehouse** for enterprise-grade data ingestion and **FAISS** for vector search capabilities.

### Core Modules
* **Ingestion Engine**: Parses financial PDFs, Bureau JSONs, and Bank Statement CSVs (using Gemini Vision & regex).
* **Dynamic Scorer & Rules Engine**: Evaluates nested risk rules built via the Decision Studio UI.
* **LLM Research Agent**: Leverages a LangChain-powered agent to perform RAG-based Vector Search and external intelligence aggregation.
* **Risk Synthesis**: Combines ML Probability of Default (Gradient Boosting), qualitative LLM summaries, and macro-economic factors.
---

## 🚀 Quick Start (Local Development)
## 📂 Project Structure

```bash
intelli-credit/
├── frontend/ # Next.js 16 + React 19 Frontend
│ ├── src/
│ │ ├── app/ # App Router pages and layouts
│ │ ├── components/ # Reusable UI components (Tailwind 4)
│ │ ├── store/ # Zustand state management
│ │ └── hooks/ # Custom React hooks
├── backend/ # FastAPI + Python Backend
│ ├── modules/ # Core logic: Ingestion, Scoring, Agents
│ ├── routers/ # API endpoints (V2 Async supported)
│ ├── schemas/ # Pydantic data models
│ ├── database/ # SQLAlchemy models and migrations
│ ├── security/ # Firebase Auth integration
│ └── training/ # ML model training scripts
├── docker-compose.yml # Container orchestration
└── architecture.md # Technical Deep-Dive
```

---

## 🔥 Key Features

### 1. 🧠 Intelligent Ingestion Engine
Uses **Gemini 1.5 Flash** & **OCR (Tesseract/pdfplumber)** to extract structured financial data from messy, scanned Indian corporate PDFs, including:
- Schedule III Balance Sheets & Profit/Loss statements.
- GST Filings (GSTR-1, 3B) linked to **Databricks**.
- Bank Statements with automated transaction categorization.

### 2. 🎨 Decision Studio (Visual Policy Engine)
A drag-and-drop canvas powered by **XYFlow** that allows risk managers to:
- Build nested credit policies without writing code.
- Trigger external webhooks and data integrations.
- Define dynamic rules using a secure Python AST execution engine.
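As an illustration of what an AST-based rule evaluator can look like, here is a minimal sketch. It is not the project's engine: the function name `evaluate_rule`, the feature names, and the whitelist of allowed syntax are all assumptions made for this example. The idea is to parse the rule with Python's `ast` module and walk only an explicit set of node types, so that calls, attribute access, and imports are rejected rather than executed.

```python
import ast
import operator

# Hypothetical whitelist: only these operators are allowed in a rule.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_CMP_OPS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def evaluate_rule(expression: str, features: dict):
    """Evaluate a rule against a feature dict without ever calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body, features)


def _eval(node, features):
    # Numeric and boolean literals only.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, bool)):
        return node.value
    # Bare names are looked up in the feature dict, never in globals.
    if isinstance(node, ast.Name):
        if node.id not in features:
            raise ValueError(f"unknown feature: {node.id}")
        return features[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, features), _eval(node.right, features))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.Not)):
        value = _eval(node.operand, features)
        return -value if isinstance(node.op, ast.USub) else not value
    if isinstance(node, ast.BoolOp):
        values = [_eval(v, features) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        # Supports chained comparisons such as 0 < x <= 1.
        left = _eval(node.left, features)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _CMP_OPS:
                raise ValueError(f"disallowed comparison: {type(op).__name__}")
            right = _eval(comparator, features)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    # Calls, attribute access, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


if __name__ == "__main__":
    sample = {"dscr": 1.4, "current_ratio": 1.2}
    print(evaluate_rule("dscr >= 1.25 and current_ratio > 1", sample))
```

A rule such as `__import__('os').system(...)` parses to an `ast.Call` node, which is not on the whitelist, so the evaluator raises `ValueError` instead of running it.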

### 3. 🤖 Multi-Agent Due Diligence
Deploys a swarm of autonomous agents using **Camel-AI** and **Mem0**:
- **Searcher Agent**: Conducts web-scale adverse media and regulatory searches.
- **Analyst Agent**: Synthesizes financial ratios and macro-economic factors.
- **Summarizer Agent**: Authors high-quality narrative commentary for the CAM.

### 4. 📊 Explainable Risk Scoring (SHAP)
Avoids "black box" decisions by providing full transparency:
- **XGBoost Ensembles**: Predicts probability of default with high accuracy.
- **SHAP Interpretability**: Visualizes exactly which factors (e.g., DSCR, Current Ratio) drove the final decision.
- **RAROC Simulation**: Estimates Risk-Adjusted Return on Capital and capital impact.
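To make the RAROC bullet concrete, here is a small worked sketch. It uses a common textbook formulation (expected loss = PD × LGD × EAD, and RAROC = risk-adjusted net income divided by economic capital); the README does not say which formula the project uses, so the function names and all figures below are illustrative assumptions.

```python
def expected_loss(pd_: float, lgd: float, ead: float) -> float:
    """Expected loss = probability of default x loss given default x exposure at default."""
    return pd_ * lgd * ead


def raroc(
    revenue: float,
    operating_cost: float,
    pd_: float,
    lgd: float,
    ead: float,
    economic_capital: float,
) -> float:
    """Textbook RAROC: (revenue - costs - expected loss) / economic capital."""
    if economic_capital <= 0:
        raise ValueError("economic_capital must be positive")
    net_income = revenue - operating_cost - expected_loss(pd_, lgd, ead)
    return net_income / economic_capital


if __name__ == "__main__":
    # Hypothetical loan: 10M exposure, 2% PD, 45% LGD, 8% capital allocation.
    value = raroc(
        revenue=300_000,
        operating_cost=50_000,
        pd_=0.02,
        lgd=0.45,
        ead=10_000_000,
        economic_capital=800_000,
    )
    print(f"RAROC: {value:.1%}")
```

With these inputs the expected loss is about 90,000, leaving roughly 160,000 of risk-adjusted income on 800,000 of capital, i.e. a RAROC of about 20%.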

---

## 🛠 Tech Stack

- **Frontend**: Next.js 16, React 19, Tailwind CSS 4, XYFlow, Zustand, Recharts, Framer Motion.
- **Backend**: FastAPI, SQLAlchemy 2.0, Pydantic, ReportLab, Celery (Optional).
- **AI/ML**: Google Gemini 1.5 Flash, XGBoost, SHAP, Camel-AI, Mem0, FinBERT (Sentiment).
- **Data & Auth**: PostgreSQL/SQLite, Databricks, Firebase Auth (Identity Platform).

---

## 🚀 Quick Start

### 1. Backend Setup
```bash
cd backend
python -m venv venv
venv\Scripts\activate # On Windows
source venv/bin/activate # venv\Scripts\activate on Windows
pip install -r requirements.txt
```
*Note: A `.env` file is required in the backend containing your `GEMINI_API_KEY`, `POSTGRES_USER`, and database strings for Alembic migrations.*

Run the FastAPI server:
```bash
uvicorn main:app --reload --port 8000
uvicorn main:app --reload
```

### 2. Frontend Setup
@@ -44,24 +94,8 @@ cd frontend
npm install
npm run dev
```
*Note: Ensure your `.env.local` contains valid Firebase configuration keys (`NEXT_PUBLIC_FIREBASE_API_KEY`, etc.) for user authentication to function.*

Access the platform at `http://localhost:3000`.

## 🧠 Using the Platform

1. **Authenticate**: Use the Firebase login page to sign in to the dashboard.
2. **Upload & Ingest**: Go to the New Proposal flow. Upload a financial PDF or Bureau data. The system uses Gemini Vision for intelligent OCR.
3. **Build Workflows**: Use the **Decision Studio** to visually drag and drop Risk Policies and Decision Nodes.
4. **Review the Output**:
- Observe the final decision (APPROVE / CONDITIONAL / REJECT).
- Review the Stress Test simulator and SHAP charts.
- Check the Governance Audit Trail.
- Click **"Generate CAM"** to receive the final professionally formatted Credit Appraisal Memo PDF.

## 🛠 Tech Stack
---

- **Machine Learning & AI**: Scikit-Learn (Gradient Boosting), SHAP, HuggingFace (`ProsusAI/finbert`), Google Gemini API, LangChain, FAISS (Vector DB)
- **Backend API**: Python 3.11, FastAPI, Uvicorn, PostgreSQL (with asyncpg & Alembic), ReportLab
- **Frontend App**: Next.js (App Router), React 18, TailwindCSS, Recharts, React Flow (Nodes), Zustand (State Management), Firebase Auth
- **Infra**: Context-driven REST APIs, Webhooks, Docker (Optional)
## 📜 Documentation
For a deeper dive into the system design, check out [architecture.md](architecture.md).
12 changes: 9 additions & 3 deletions backend/async_database.py
@@ -19,15 +19,21 @@

 ASYNC_DATABASE_URL = os.getenv(
     "ASYNC_DATABASE_URL",
-    "postgresql+asyncpg://postgres:postgres@localhost:5432/intelli_credit",
+    "sqlite+aiosqlite:///./intelli_credit_async.db",
 )

+# Detect if we should use SQLite (default or explicit)
+is_sqlite = ASYNC_DATABASE_URL.startswith("sqlite")
+
 async_engine = create_async_engine(
     ASYNC_DATABASE_URL,
     echo=False,
     future=True,
-    pool_size=10,
-    max_overflow=20,
+    # Pool arguments only for real DBs (Postgres)
+    **({
+        "pool_size": 10,
+        "max_overflow": 20,
+    } if not is_sqlite else {})
 )

 AsyncSessionLocal = async_sessionmaker(
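The conditional pool arguments in `backend/async_database.py` could also be factored into a small helper, which keeps the engine call flat and makes the branch easy to unit test. The sketch below is only a suggestion: `pool_kwargs` is a hypothetical name, and the pool sizes are simply the values from the diff.

```python
def pool_kwargs(database_url: str) -> dict:
    """Return connection-pool keyword arguments appropriate for the URL.

    Hypothetical helper: SQLite URLs get no pool sizing arguments,
    any other backend gets the sizes used in the diff.
    """
    if database_url.startswith("sqlite"):
        return {}
    return {"pool_size": 10, "max_overflow": 20}


if __name__ == "__main__":
    print(pool_kwargs("sqlite+aiosqlite:///./intelli_credit_async.db"))
    print(pool_kwargs("postgresql+asyncpg://postgres:postgres@localhost:5432/intelli_credit"))

# Intended use (not executed here, requires SQLAlchemy):
#   async_engine = create_async_engine(url, echo=False, **pool_kwargs(url))
```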
112 changes: 112 additions & 0 deletions backend/async_models.py
@@ -17,6 +17,9 @@
     Integer,
     String,
     Text,
+    Float,
+    Boolean,
+    JSON,
     UniqueConstraint,
 )
 from sqlalchemy.dialects.postgresql import JSONB, UUID as PG_UUID
@@ -210,3 +213,112 @@ class AuditLog(AsyncBase):

     def __repr__(self) -> str:
         return f"<AuditLog {self.action} on {self.entity_type}/{self.entity_id}>"
+
+
+class AnalysisSession(AsyncBase):
+    """Stores the state of a document extraction and risk analysis session."""
+    __tablename__ = "analysis_sessions"
+
+    id: Mapped[str] = mapped_column(String(128), primary_key=True)
+    tenant_id: Mapped[str] = mapped_column(String(128), index=True)
+    status: Mapped[str] = mapped_column(String(64), default="INITIATED")
+    raw_extracts: Mapped[dict] = mapped_column(JSON, default=dict)
+    features: Mapped[dict] = mapped_column(JSON, default=dict)
+    results: Mapped[dict] = mapped_column(JSON, default=dict)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)
+
+
+class WorkflowDefinition(AsyncBase):
+    __tablename__ = "workflow_definitions"
+
+    id: Mapped[str] = mapped_column(String(128), primary_key=True)
+    name: Mapped[str] = mapped_column(String(255), default="Untitled Workflow")
+    status: Mapped[str] = mapped_column(String(32), default="draft")
+    definition_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)
+
+    nodes: Mapped[list["WorkflowNodeDefinition"]] = relationship(back_populates="workflow", cascade="all, delete-orphan")
+    edges: Mapped[list["WorkflowEdgeDefinition"]] = relationship(back_populates="workflow", cascade="all, delete-orphan")
+
+
+class WorkflowNodeDefinition(AsyncBase):
+    __tablename__ = "workflow_node_definitions"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
+    workflow_id: Mapped[str] = mapped_column(String(128), ForeignKey("workflow_definitions.id", ondelete="CASCADE"), index=True)
+    node_id: Mapped[str] = mapped_column(String(128))
+    node_type: Mapped[str] = mapped_column(String(64))
+    label: Mapped[str | None] = mapped_column(String(255))
+    position_x: Mapped[float] = mapped_column(Float, default=0)
+    position_y: Mapped[float] = mapped_column(Float, default=0)
+    config_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    execution_config_json: Mapped[dict] = mapped_column(JSON, default=dict)
+
+    workflow: Mapped["WorkflowDefinition"] = relationship(back_populates="nodes")
+
+
+class WorkflowEdgeDefinition(AsyncBase):
+    __tablename__ = "workflow_edge_definitions"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
+    workflow_id: Mapped[str] = mapped_column(String(128), ForeignKey("workflow_definitions.id", ondelete="CASCADE"), index=True)
+    edge_id: Mapped[str] = mapped_column(String(128))
+    source_node_id: Mapped[str] = mapped_column(String(128))
+    target_node_id: Mapped[str] = mapped_column(String(128))
+    source_handle: Mapped[str | None] = mapped_column(String(64))
+    target_handle: Mapped[str | None] = mapped_column(String(64))
+    edge_type: Mapped[str | None] = mapped_column(String(64))
+    config_json: Mapped[dict] = mapped_column(JSON, default=dict)
+
+    workflow: Mapped["WorkflowDefinition"] = relationship(back_populates="edges")
+
+
+class ExecutionRun(AsyncBase):
+    __tablename__ = "execution_runs"
+
+    id: Mapped[str] = mapped_column(String(128), primary_key=True)
+    workflow_id: Mapped[str | None] = mapped_column(String(128), ForeignKey("workflow_definitions.id", ondelete="SET NULL"), nullable=True)
+    status: Mapped[str] = mapped_column(String(32), default="queued")
+    initial_payload_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    final_payload_json: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    error_message: Mapped[str | None] = mapped_column(Text)
+    tokens_consumed: Mapped[int] = mapped_column(Integer, default=0)
+    started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
+    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
+    duration_ms: Mapped[int | None] = mapped_column(Integer)
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)
+
+    workflow: Mapped["WorkflowDefinition | None"] = relationship()
+
+
+class NodeExecutionLog(AsyncBase):
+    __tablename__ = "node_execution_logs"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
+    execution_id: Mapped[str] = mapped_column(String(128), ForeignKey("execution_runs.id", ondelete="CASCADE"), index=True)
+    workflow_id: Mapped[str | None] = mapped_column(String(128))
+    node_id: Mapped[str] = mapped_column(String(128))
+    node_type: Mapped[str] = mapped_column(String(64))
+    event_type: Mapped[str] = mapped_column(String(64))
+    status: Mapped[str] = mapped_column(String(32))
+    attempt: Mapped[int] = mapped_column(Integer, default=1)
+    input_payload_json: Mapped[dict | None] = mapped_column(JSON)
+    output_payload_json: Mapped[dict | None] = mapped_column(JSON)
+    source_edges_json: Mapped[list | None] = mapped_column(JSON)
+    error_message: Mapped[str | None] = mapped_column(Text)
+    started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
+    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
+    duration_ms: Mapped[int | None] = mapped_column(Integer)
+
+
+class DeadLetterExecution(AsyncBase):
+    __tablename__ = "dead_letter_executions"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
+    execution_id: Mapped[str] = mapped_column(String(128), ForeignKey("execution_runs.id", ondelete="CASCADE"), unique=True)
+    workflow_id: Mapped[str | None] = mapped_column(String(128))
+    failure_stage: Mapped[str] = mapped_column(String(64), default="workflow")
+    reason: Mapped[str] = mapped_column(Text)
+    payload_json: Mapped[dict | None] = mapped_column(JSON)
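One thing worth checking with these models: they rely on `ondelete="CASCADE"` and `ondelete="SET NULL"` foreign keys, while this PR switches the default database to SQLite. SQLite only enforces foreign-key actions when `PRAGMA foreign_keys = ON` has been issued on the connection. The sketch below uses the standard-library `sqlite3` module and a trimmed-down, hand-written schema (an assumption for illustration, not the schema SQLAlchemy would emit) to show the behaviour the models appear to expect once the pragma is enabled.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with simplified versions of three tables."""
    conn = sqlite3.connect(":memory:")
    # Without this pragma SQLite parses ON DELETE clauses but does not act on them.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE workflow_definitions (id TEXT PRIMARY KEY);
        CREATE TABLE workflow_node_definitions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            workflow_id TEXT REFERENCES workflow_definitions(id) ON DELETE CASCADE
        );
        CREATE TABLE execution_runs (
            id TEXT PRIMARY KEY,
            workflow_id TEXT REFERENCES workflow_definitions(id) ON DELETE SET NULL
        );
        INSERT INTO workflow_definitions VALUES ('wf-1');
        INSERT INTO workflow_node_definitions (workflow_id) VALUES ('wf-1');
        INSERT INTO execution_runs VALUES ('run-1', 'wf-1');
        """
    )
    return conn


def delete_workflow(conn: sqlite3.Connection, workflow_id: str):
    """Delete a workflow and report what happened to its dependent rows."""
    conn.execute("DELETE FROM workflow_definitions WHERE id = ?", (workflow_id,))
    remaining_nodes = conn.execute(
        "SELECT COUNT(*) FROM workflow_node_definitions"
    ).fetchone()[0]
    run_workflow_id = conn.execute(
        "SELECT workflow_id FROM execution_runs WHERE id = 'run-1'"
    ).fetchone()[0]
    return remaining_nodes, run_workflow_id


if __name__ == "__main__":
    print(delete_workflow(build_demo(), "wf-1"))
```

With the pragma on, deleting the workflow removes its node rows and nulls out `execution_runs.workflow_id`. In a SQLAlchemy setup the pragma is usually issued from a `connect` event listener on the engine; whether this project already does so is not visible in the diff.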