This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Risk Network Analyzer v2 - A full-stack application for visualizing risk register data as interactive force-directed graphs with semantic clustering. Uses HDBSCAN for automatic cluster discovery and neural embeddings for semantic similarity detection.
docker-compose up --build # Production (http://localhost:3000)
docker-compose -f docker-compose.dev.yml up --build # Backend only with hot reloadcd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000cd frontend
npm install
npm run dev # Dev server at http://localhost:5173
npm run build # TypeScript check + production build
npm run lint # ESLint
npm run type-check # TypeScript onlyFor local development with hot reload on both ends:
- Start backend services:
docker-compose -f docker-compose.dev.yml up - In another terminal:
cd frontend && npm run dev - Frontend at http://localhost:5173, API at http://localhost:8000/docs
The RiskAnalyzer in analyzer.py orchestrates a 4-step pipeline:
- NLPService (
nlp.py) - Generates 384-dim embeddings viaall-MiniLM-L6-v2, extracts cluster keywords via TF-IDF - ClusteringService (
clustering.py) - UMAP dimension reduction → HDBSCAN clustering (falls back to K-means if poor results) - LayoutService (
layout.py) - NetworkX spring layout with virtual cluster nodes as attraction points - Response assembly with similarity edges and cluster metadata
The graph uses a two-phase layout:
- Backend (
layout.py): NetworkX spring layout computes initial positions, uses virtual cluster centroid nodes to pull cluster members together - Frontend (
hooks/useForceSimulation.ts): d3-force runs client-side physics for 3 seconds to refine positions, then auto-stops
- Zustand store (
store/useStore.ts) - Holds nodes, edges, clusters, display/force/analysis settings - useForceSimulation hook - d3-force simulation with configurable repulsion, springs, damping
- GraphCanvas - deck.gl ScatterplotLayer (nodes) + LineLayer (edges) with WebGL rendering (handles 50K+ nodes)
POST /api/v1/upload-csv- Main endpoint: CSV file → analysis responsePOST /api/v1/analyze- JSON risks → analysis responsePOST /api/v1/similarity-matrix- Returns full n×n similarity matrix (useful for debugging similarity issues)GET /api/v1/health- Health checkGET /docs- FastAPI auto-generated OpenAPI documentation
Two edge types exist in the graph:
similarity- Connects semantically similar risks (cosine similarity > threshold)membership- Connects risks to their cluster centroid nodes
- Singleton NLPService - Model loaded once at startup via
@app.on_event("startup"), pre-downloaded during Docker build - HDBSCAN fallback - Falls back to K-means if HDBSCAN produces <2 clusters or >50% noise points
- CSV column aliases - Routes accept multiple column name variants (e.g., "risk id", "riskid", "risk_no" all map to
id). Seeroutes.pyfor the full alias mapping. - Force simulation auto-stop - Client-side simulation runs for 3s then stops to prevent CPU drain
- Cluster labels - Generated from top 3 TF-IDF keywords per cluster
- L2 normalization - Embeddings are L2-normalized, enabling cosine similarity via simple dot product
Settings in backend/app/config.py are loaded from environment variables (see .env.example):
EMBEDDING_MODEL- Sentence-transformer model (default:all-MiniLM-L6-v2)MIN_CLUSTER_SIZE- HDBSCAN minimum cluster size (default: 3)SIMILARITY_THRESHOLD- Edge creation threshold (default: 0.4)CORS_ORIGINS- Allowed origins for CORS (defaults include localhost:3000, localhost:5173)
No automated tests are currently configured. Manual testing workflow:
- Use the demo data button in the UI
- Upload CSV files to test the full pipeline
- Use
/api/v1/similarity-matrixto debug embedding/similarity issues - Check
/docsfor interactive API testing