Healthcare provider data accuracy remains a critical challenge in the medical industry, with significant financial and operational impacts:
Financial Impact
- $2 billion lost annually due to incorrect or outdated provider data
- 10-15% error rates in provider information across healthcare systems
- Average 2-3 days manual validation time per provider record
Operational Challenges
- Manual cross-referencing required across multiple databases (NPI Registry, State Medical Boards, Google Places API)
- Inconsistent data formats between different authoritative sources
- No standardized trust scoring or quality metrics
- Difficulty in detecting duplicate or conflicting provider records
- Poor scalability when validating thousands of providers
- Lack of automated conflict resolution mechanisms
Compliance Risks
- Regulatory compliance issues from outdated credentials
- Provider credentialing delays impacting operations
- Incomplete audit trails for data validation processes
LampStack is an autonomous platform that validates healthcare provider data through a multi-agent AI architecture orchestrated by LangGraph. The system processes both structured (CSV) and unstructured (PDF, images) data formats to ensure comprehensive validation across multiple authoritative sources.
Core Capabilities
The platform implements a four-layer validation pipeline:
- Data Ingestion - Automated scraping from NPI Registry, State Medical Board databases, and Google Places API
- Cross-Validation - Systematic verification of provider credentials across all collected sources
- Data Enrichment - Intelligent gap-filling for missing information and data normalization
- Trust Scoring - Weighted calculation generating A-F grades with actionable recommendations
Key Differentiators
- Processing speed: 30-45 seconds per provider (vs 2-3 days manual validation)
- Multi-source verification: Simultaneous validation across 3+ authoritative databases
- Real-time monitoring: WebSocket-based progress tracking and notifications
- Human-in-the-loop: Feedback integration for continuous model improvement
- Vector similarity search: Semantic matching for duplicate detection using Milvus
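The idea behind embedding-based duplicate detection can be shown without Milvus. Two provider records whose embedding vectors point in almost the same direction are flagged as likely duplicates. This is a brute-force sketch that assumes the embeddings already exist. The 0.95 threshold and the `find_duplicates` helper are hypothetical. Milvus performs the equivalent search with approximate nearest-neighbour indexes, which is what keeps it practical at thousands of records.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.95  # hypothetical cut-off
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above the threshold.

    Brute-force O(n^2) comparison for illustration only; a vector index replaces this.
    """
    ids = sorted(embeddings)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine_similarity(embeddings[a], embeddings[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs

records = {
    "provider-1": [0.9, 0.1, 0.0],
    "provider-2": [0.88, 0.12, 0.01],
    "provider-3": [0.0, 0.2, 0.95],
}
print(find_duplicates(records))
```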
The system employs specialized AI agents, each responsible for a distinct validation stage:
Ingestion Agent
- Scrapes National Provider Identifier (NPI) Registry for federal provider data
- Queries State Medical Board APIs for license verification
- Retrieves contact information from Google Places API
- Processes unstructured documents using Mistral AI OCR (Pixtral-12B model)
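Before calling the NPI Registry, an ingestion step can reject malformed identifiers locally. An NPI has 10 digits, and the last one is a Luhn check digit calculated as if the number were prefixed with 80840. The sketch below assumes the public NPPES API at the `NPI_REGISTRY_API_URL` shown in the configuration, using its `version` and `number` query parameters. `is_valid_npi` and `build_npi_query_url` are illustrative names, not functions from the LampStack codebase.

```python
from urllib.parse import urlencode

NPI_REGISTRY_API_URL = "https://npiregistry.cms.hhs.gov/api"  # value from agent-service/.env

def is_valid_npi(npi: str) -> bool:
    """Check the 10-digit format and the Luhn check digit (computed with the 80840 prefix)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 0
    # Walk right to left over "80840" + NPI, doubling every second digit (Luhn)
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def build_npi_query_url(npi: str, base_url: str = NPI_REGISTRY_API_URL) -> str:
    """Build a lookup URL for a single NPI against the NPPES API (version 2.1)."""
    if not is_valid_npi(npi):
        raise ValueError(f"malformed NPI: {npi!r}")
    return f"{base_url}/?{urlencode({'version': '2.1', 'number': npi})}"

print(build_npi_query_url("1234567893"))
# prints: https://npiregistry.cms.hhs.gov/api/?version=2.1&number=1234567893
```

A malformed number never reaches the network, which saves an API round trip and keeps obvious typos out of the discrepancy report.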
Validation Agent
- Cross-references provider names across all data sources
- Verifies license numbers and expiration dates
- Validates contact information (phone, email, address)
- Flags discrepancies and conflicts with severity ratings
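Name cross-referencing with severity flags could look roughly like the following. This is a minimal sketch that assumes each source has already produced a display name. The token-set comparison, the ignored-title list, and the `medium`/`high` severity rule are hypothetical choices, not LampStack's actual logic.

```python
import re
from dataclasses import dataclass

# Hypothetical honorifics/credentials ignored when comparing names
IGNORED_TOKENS = {"dr", "md", "mr", "mrs", "ms"}

def name_tokens(name: str) -> frozenset[str]:
    """Lower-case ASCII word set, so 'SMITH, JOHN A' and 'Dr. John A. Smith, MD' match."""
    words = re.findall(r"[a-z]+", name.lower())
    return frozenset(w for w in words if w not in IGNORED_TOKENS)

@dataclass
class Discrepancy:
    field: str
    sources: tuple[str, str]
    severity: str  # "medium" = partial overlap, "high" = no shared tokens

def cross_check_names(names_by_source: dict[str, str]) -> list[Discrepancy]:
    """Compare the provider name reported by every pair of sources."""
    found = []
    items = sorted(names_by_source.items())
    for i, (src_a, name_a) in enumerate(items):
        for src_b, name_b in items[i + 1:]:
            a, b = name_tokens(name_a), name_tokens(name_b)
            if a == b:
                continue
            # Partial overlap (e.g. a missing middle initial) is less severe than none
            severity = "medium" if a & b else "high"
            found.append(Discrepancy("name", (src_a, src_b), severity))
    return found

names = {
    "npi_registry": "SMITH, JOHN A",
    "state_board": "Dr. John A. Smith, MD",
    "google_places": "John Smith",
}
for d in cross_check_names(names):
    print(d)
```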
Enrichment Agent
- Fills missing fields using most reliable source data
- Calculates data completeness percentage
- Normalizes inconsistent data formats (addresses, phone numbers)
- Generates vector embeddings for semantic search
Scoring Agent
- Applies weighted trust score algorithm
- Assigns letter grades (A-F) based on validation results
- Generates specific recommendations for data improvement
- Updates PostgreSQL database with validation results
LangGraph manages the sequential execution of agents through a state machine:
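```python
from langgraph.graph import StateGraph, END

# ValidationState and the four agent functions are defined elsewhere (not shown here)
workflow = StateGraph(ValidationState)
workflow.add_node("ingestion", ingestion_agent)
workflow.add_node("validation", validation_agent)
workflow.add_node("enrichment", enrichment_agent)
workflow.add_node("scoring", scoring_agent)

# Linear pipeline: ingestion -> validation -> enrichment -> scoring -> END
workflow.set_entry_point("ingestion")
workflow.add_edge("ingestion", "validation")
workflow.add_edge("validation", "enrichment")
workflow.add_edge("enrichment", "scoring")
workflow.add_edge("scoring", END)

graph = workflow.compile()
```
Trust Score Calculation
Trust Score = (NPI_Match_Score × 0.4) + (License_Validity_Score × 0.4) + (Data_Completeness × 0.2)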
Where:
- NPI_Match_Score: 100 if verified in NPI Registry, 0 otherwise
- License_Validity_Score: 100 if active, 50 if inactive, 0 if invalid/expired
- Data_Completeness: (Number_of_Filled_Fields / Total_Required_Fields) × 100
Grade Assignment:
A: 90-100 (Excellent - all sources verified)
B: 80-89 (Good - minor discrepancies)
C: 70-79 (Acceptable - some missing data)
D: 60-69 (Poor - significant gaps)
F: <60 (Failed - major conflicts or missing critical data)
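The formula and grade bands translate directly into code. In this sketch, `trust_score` and `letter_grade` are illustrative names. Because completeness can make the score fractional, scores between bands (for example 89.5) are assumed to fall to the lower grade, since the table only lists integer ranges.

```python
def trust_score(npi_verified: bool, license_status: str, completeness: float) -> float:
    """Weighted score from the formula above; completeness is a 0-100 percentage."""
    npi_match = 100 if npi_verified else 0
    # "active" -> 100, "inactive" -> 50, anything else (invalid/expired) -> 0
    license_validity = {"active": 100, "inactive": 50}.get(license_status, 0)
    return npi_match * 0.4 + license_validity * 0.4 + completeness * 0.2

def letter_grade(score: float) -> str:
    """Map a 0-100 score to A-F; fractional scores fall to the lower band."""
    for threshold, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return grade
    return "F"

# Verified NPI, active license, 5 of 6 required fields filled
score = trust_score(True, "active", 5 / 6 * 100)
print(round(score, 2), letter_grade(score))  # prints: 96.67 A
```

Because the NPI and license checks carry 80% of the weight, a provider missing either one tops out at 60 (grade D) even with complete data, while a verified NPI with an inactive license and complete data still reaches 80 (grade B).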
Data Flow
- User uploads a CSV or PDF file containing provider data via the React frontend
- Java backend parses the file and stores the records in PostgreSQL
- Backend creates a validation job and triggers the Python agent service via HTTP
- Python agents execute in sequence (LangGraph orchestration):
  - Ingestion Agent scrapes external APIs
  - Validation Agent cross-checks data
  - Enrichment Agent fills gaps
  - Scoring Agent calculates the trust score
- Python service stores vector embeddings in Milvus for semantic search
- Python service sends progress updates to the Java backend via HTTP callbacks
- Java backend broadcasts real-time updates to the frontend via WebSocket
- Frontend displays validation results and trust scores
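The Python-to-Java progress callback can be sketched with the standard library alone. The endpoint path `/api/validation/jobs/{job_id}/progress` and the JSON payload shape are hypothetical; only the base URL comes from `JAVA_SERVICE_URL` in the agent-service configuration. `build_progress_request` and `send_progress` are illustrative helper names.

```python
import json
import urllib.request

JAVA_SERVICE_URL = "http://localhost:8080"  # JAVA_SERVICE_URL in agent-service/.env

def build_progress_request(job_id: str, stage: str, progress: int,
                           base_url: str = JAVA_SERVICE_URL) -> urllib.request.Request:
    """Build a JSON POST to a hypothetical progress-callback endpoint."""
    payload = {"jobId": job_id, "stage": stage, "progress": progress}  # hypothetical shape
    return urllib.request.Request(
        f"{base_url}/api/validation/jobs/{job_id}/progress",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_progress(job_id: str, stage: str, progress: int, timeout: float = 5.0) -> int:
    """Send the callback and return the HTTP status (urlopen raises HTTPError on 4xx/5xx)."""
    request = build_progress_request(job_id, stage, progress)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The agent service could call, for instance, `send_progress(job_id, "enrichment", 75)` after each LangGraph node finishes; the Java backend would then relay the update over WebSocket. Separating request construction from sending keeps the payload easy to test without a running backend.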
Prerequisites
- Java Development Kit (JDK) 17 or higher
- Python 3.11 or higher
- Node.js 18 or higher
- Docker and Docker Compose
- Maven 3.8+ (or use included Maven wrapper)
- Git
Installation
```bash
git clone https://github.com/CroWzblooD/LampStack.git
cd LampStack
docker-compose up -d
```
This starts the PostgreSQL, Milvus, etcd, and MinIO containers. Verify that they are running:
```bash
docker-compose ps
```
Expected services:
- postgres on port 5432
- milvus-standalone on port 19530
- etcd on port 2379
- minio on ports 9000/9001
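As a quick supplement to `docker-compose ps`, a short script can confirm that each expected port accepts TCP connections. This is a sketch; `check_ports` is a hypothetical helper, and the port list is taken from the expected services above. An open port only shows that something is listening, not that the service has finished initializing (Milvus, for example, can take a while after the container starts).

```python
import socket

# Ports from the expected `docker-compose ps` output
SERVICES = {
    "postgres": 5432,
    "milvus-standalone": 19530,
    "etcd": 2379,
    "minio:9000": 9000,
    "minio:9001": 9001,
}

def check_ports(services: dict[str, int], host: str = "localhost",
                timeout: float = 1.0) -> dict[str, bool]:
    """Return {service: True} for every port that accepts a TCP connection."""
    status = {}
    for name, port in services.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:  # refused, unreachable, or timed out
            status[name] = False
    return status

if __name__ == "__main__":
    for name, up in check_ports(SERVICES).items():
        print(f"{name:<18} {'up' if up else 'DOWN'}")
```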
Start the Java backend:
```bash
cd server
# Build the project
./mvnw clean install
# Run the application
./mvnw spring-boot:run
```
The backend will start on http://localhost:8080.
In a new terminal, start the Python agent service from the repository root:
```bash
cd agent-service
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the service
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```
The agent service will start on http://localhost:8001.
In another terminal, start the React frontend from the repository root:
```bash
cd client
# Install dependencies
npm install
# Start development server
npm run dev
```
The frontend will start on http://localhost:5173.
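The server configuration expects `JWT_SECRET` to be a 256-bit secret rather than the placeholder string. One way to generate it is Python's `secrets` module (`openssl rand -base64 32` is an equivalent command-line option). This sketch assumes a base64-encoded value is acceptable; whether the backend decodes it or uses the string bytes directly is not stated, but in either case the 44-character output carries 256 bits of randomness. `generate_jwt_secret` is a hypothetical helper name.

```python
import base64
import secrets

def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of CSPRNG output (32 bytes = 256 bits), base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")

if __name__ == "__main__":
    print(f"JWT_SECRET={generate_jwt_secret()}")
```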
Configuration
Create server/.env:
```
# Database
SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/healthcare_validation
SPRING_DATASOURCE_USERNAME=postgres
SPRING_DATASOURCE_PASSWORD=yourpassword
# Security
JWT_SECRET=your-256-bit-secret-key-here
JWT_EXPIRATION=86400000
# Agent Service
PYTHON_AGENT_SERVICE_URL=http://localhost:8001
# Server
SERVER_PORT=8080
```
Alternatively, edit server/src/main/resources/application.yml:
```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/healthcare_validation
    username: postgres
    password: yourpassword
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: false

app:
  jwt:
    secret: your-secret-key
    expiration: 86400000
  python:
    agent-service:
      url: http://localhost:8001
```
Create agent-service/.env:
```
# Java Backend
JAVA_SERVICE_URL=http://localhost:8080
# AI Services
MISTRAL_API_KEY=your-mistral-api-key-here
# Vector Database
MILVUS_HOST=localhost
MILVUS_PORT=19530
# PostgreSQL
DATABASE_URL=postgresql://postgres:yourpassword@localhost:5432/healthcare_validation
# External APIs
NPI_REGISTRY_API_URL=https://npiregistry.cms.hhs.gov/api
GOOGLE_PLACES_API_KEY=your-google-places-api-key-here
```
Create client/.env.local:
```
VITE_API_URL=http://localhost:8080/api
VITE_WS_URL=ws://localhost:8080/ws
```
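A small startup check can catch placeholder values left in agent-service/.env before the agents make failing API calls. This sketch assumes the plain `KEY=VALUE` format shown above; the parser is deliberately minimal (no multi-line values or `export` prefixes), and `parse_env`, `unset_keys`, and `check_env_file` are hypothetical helpers. The python-dotenv package is the usual off-the-shelf alternative for loading the file itself.

```python
from pathlib import Path

# Keys listed in the agent-service/.env template
REQUIRED_KEYS = (
    "JAVA_SERVICE_URL", "MISTRAL_API_KEY", "MILVUS_HOST", "MILVUS_PORT",
    "DATABASE_URL", "NPI_REGISTRY_API_URL", "GOOGLE_PLACES_API_KEY",
)

def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and # comments, strips surrounding quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

def unset_keys(values: dict[str, str], required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Keys that are missing, empty, or still hold a 'your-...' placeholder."""
    return [k for k in required if not values.get(k, "") or values[k].startswith("your-")]

def check_env_file(path: str = "agent-service/.env") -> list[str]:
    """Read the file and report which required keys still need real values."""
    return unset_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```

Run against the unedited template, the check reports `MISTRAL_API_KEY` and `GOOGLE_PLACES_API_KEY` as still unset.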