An intelligent radiological assistance system that combines YOLOv11-L object detection with comprehensive MLOps infrastructure to help radiologists identify and localize chest X-ray abnormalities with enhanced accuracy and efficiency.
This system provides AI-assisted chest X-ray analysis for radiology departments, offering both pathology classification and precise localization through bounding boxes. Built with the VinDr-CXR dataset and validated by multiple radiologists, the system enhances diagnostic workflow while maintaining radiologists as the ultimate decision-makers.
- Reduced Missed Pathologies: Identifies and highlights potential abnormalities that might be overlooked during routine reads
- Improved Efficiency: Pre-highlights regions of concern, allowing radiologists to focus attention on suspicious areas
- Second Opinion Support: Provides automated consultation that can confirm findings or prompt reconsideration
- Scalable Infrastructure: Cloud-native architecture with automated CI/CD pipelines
The system follows a microservices architecture with the following components:
```
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│   Data Pipeline   │   │  Model Training   │   │   Model Serving   │
│                   │   │                   │   │                   │
│ • ETL Processing  │   │ • YOLOv11-L       │   │ • Triton Server   │
│ • Data Validation │   │ • Distributed     │   │ • ONNX Runtime    │
│ • Quality Checks  │   │   Training        │   │ • TensorRT        │
└───────────────────┘   └───────────────────┘   └───────────────────┘
          │                       │                       │
          └───────────────────────┼───────────────────────┘
                                  │
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│    Monitoring     │   │   Orchestration   │   │      Storage      │
│                   │   │                   │   │                   │
│ • Prometheus      │   │ • Kubernetes      │   │ • MinIO           │
│ • Grafana         │   │ • ArgoCD          │   │ • PostgreSQL      │
│ • Alert Manager   │   │ • Argo Workflows  │   │ • Swift Storage   │
└───────────────────┘   └───────────────────┘   └───────────────────┘
```
- Multi-class Detection: Identifies 14+ chest abnormalities including pneumothorax, consolidation, and pleural effusion
- Precise Localization: Provides bounding boxes for abnormality regions
- Confidence Scoring: Outputs a confidence level for each detection (see the inference sketch after this feature list)
- High Performance: Optimized for both accuracy and inference speed
- Distributed Training: Multi-GPU training with Ray for faster model development
- Model Versioning: MLflow integration for experiment tracking and model registry
- Automated Deployment: GitOps-driven rollout through staging, canary, and production environments
- Real-time Monitoring: Comprehensive monitoring of model performance and system health
- Data Pipeline: Automated ETL processes with data quality validation
- Multiple Serving Options: CPU, GPU, and edge deployment configurations
- Model Optimization: ONNX conversion, quantization, and TensorRT acceleration
- Dynamic Batching: Optimized throughput for varying loads
- Fault Tolerance: Automatic recovery and failover mechanisms
- Advanced Performance Tuning: Comprehensive Triton server configurations for maximum efficiency
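For local experimentation, consuming these detections through the Ultralytics API looks roughly like the sketch below. The checkpoint filename is a placeholder, and the thresholds mirror the serving configuration shown later in this README.

```python
from ultralytics import YOLO

# Hypothetical fine-tuned checkpoint; substitute your trained weights.
model = YOLO("yolov11l-chest-xray.pt")
results = model.predict("chest_xray.jpg", conf=0.25, iou=0.45)

for box in results[0].boxes:
    label = results[0].names[int(box.cls)]   # e.g. "pneumothorax"
    score = float(box.conf)                  # detection confidence
    x1, y1, x2, y2 = box.xyxyn[0].tolist()   # normalized corner coordinates
    print(f"{label}: {score:.2f} at [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]")
```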
| Metric | Target | Achieved |
|---|---|---|
| Inference Latency (Single) | <200ms | ~150ms |
| Batch Throughput | 30 FPS | 45+ FPS |
| Model Accuracy (mAP@0.5) | >0.75 | 0.82 |
| Concurrent Requests | 10+ | 25+ |
| Uptime | 99.9% | 99.95% |
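A quick client-side way to sanity-check the latency targets above; the endpoint and response fields follow the API example later in this README.

```python
import time

import requests

# Time a single-image prediction end to end and compare against the
# server-reported inference_time from the JSON response.
with open("chest_xray.jpg", "rb") as f:
    start = time.perf_counter()
    response = requests.post("http://your-ip/predict", files={"file": f})
    elapsed_ms = (time.perf_counter() - start) * 1000

server_ms = response.json()["inference_time"] * 1000
print(f"end-to-end: {elapsed_ms:.0f} ms, server-side inference: {server_ms:.0f} ms")
```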
Performance experiments achieved the following results across multiple deployment scenarios:
- Custom TensorRT V2: 157.8 infer/sec throughput with optimized memory allocation
- Custom TensorRT V1: Ultra-low latency (19.3ms) with dynamic batching
- Default TensorRT: 129.0 infer/sec for balanced performance
- ONNX Runtime: 78-79 infer/sec with CUDA acceleration
- OpenVINO Nano: 14.6 infer/sec - 15x faster than standard CPU backends
- Edge Deployment: Optimized for resource-constrained environments
- Memory Efficiency: Reduced model size with compact output format
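To reproduce the ONNX Runtime numbers locally, a session can be opened with CUDA preferred and CPU as fallback. A minimal sketch, assuming the exported model file and a 1024x1024 input (matching model_config.yaml below):

```python
import numpy as np
import onnxruntime as ort

# CUDAExecutionProvider is used when available; otherwise ORT falls back to CPU.
session = ort.InferenceSession(
    "yolov11l.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: dummy})
print(session.get_providers(), [o.shape for o in outputs])
```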
Detailed performance analysis, configuration files, and optimization insights are available in the Triton Configuration Repository.
- Model: YOLOv11-L (Ultralytics)
- Training: PyTorch with distributed training via Ray
- Serving: NVIDIA Triton Inference Server
- Optimization: ONNX Runtime, TensorRT, OpenVINO
- Orchestration: Kubernetes, ArgoCD, Argo Workflows
- Storage: MinIO (object), PostgreSQL (metadata), Swift (datasets)
- Monitoring: Prometheus, Grafana, Alert Manager
- MLOps: MLflow, Label Studio
- IaC: Terraform, Ansible
- CI/CD: GitOps with ArgoCD
- Containerization: Docker, Kubernetes
- Cloud: Multi-cloud compatible (tested on Chameleon Cloud)
- Kubernetes cluster (1.24+)
- NVIDIA GPUs (optional, for GPU acceleration)
- 250GB+ storage for datasets
- Docker and Terraform installed
- Clone the Repository

```bash
git clone https://github.com/your-org/chest-xray-detection.git
cd chest-xray-detection
```

- Set Up Infrastructure
```bash
# Install Terraform
mkdir -p ~/.local/bin
wget https://releases.hashicorp.com/terraform/1.10.5/terraform_1.10.5_linux_amd64.zip
unzip terraform_1.10.5_linux_amd64.zip
mv terraform ~/.local/bin

# Create cloud resources
cd iac/tf/kvm
export TF_VAR_suffix=your-project
export TF_VAR_key=your-ssh-key
terraform init
terraform apply -auto-approve
```

- Configure Kubernetes
```bash
# Set up Kubernetes cluster
cd ./iac/ansible
ansible-playbook -i inventory.yml pre_k8s/pre_k8s_configure.yml
cd k8s/kubespray
ansible-playbook -i ../inventory/mycluster --become --become-user=root ./cluster.yml
cd ../..
ansible-playbook -i inventory.yml post_k8s/post_k8s_configure.yml
```

- Deploy ML Platform
```bash
# Deploy core services
ansible-playbook -i inventory.yml argocd/argocd_add_platform.yml

# Set up environments
ansible-playbook -i inventory.yml argocd/workflow_build_init.yml
ansible-playbook -i inventory.yml argocd/argocd_add_staging.yml
ansible-playbook -i inventory.yml argocd/argocd_add_canary.yml
ansible-playbook -i inventory.yml argocd/argocd_add_prod.yml
ansible-playbook -i inventory.yml argocd/workflow_templates_apply.yml
```

- Offline ETL Pipeline
```bash
cd deployment/docker
docker-compose -f docker-compose-etl.yaml up
```

- Online Data Simulation
```bash
cd ./data-pipeline/data-simulation
docker-compose -f docker-compose-data-simulation.yaml up -d
```

- Production Pipeline
```bash
cd ./deployment/docker
docker-compose -f docker-compose-production.yaml up
```

```bash
# Start distributed training
python model_train/train_yolo.py --distributed --gpus 4
```
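The internals of train_yolo.py are not shown here; one plausible wiring of its `--distributed` mode uses Ray Train's TorchTrainer, sketched below with a tiny stand-in model in place of YOLOv11-L.

```python
import torch
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_data_loader, prepare_model

def train_loop_per_worker(config):
    # Each Ray worker joins the torch process group; prepare_model wraps the
    # module in DistributedDataParallel and moves it to the worker's device.
    model = prepare_model(nn.Conv2d(3, 16, 3))  # stand-in for the detector
    data = torch.utils.data.TensorDataset(torch.randn(64, 3, 64, 64))
    loader = prepare_data_loader(torch.utils.data.DataLoader(data, batch_size=8))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    for _ in range(config["epochs"]):
        for (batch,) in loader:
            loss = model(batch).pow(2).mean()  # dummy loss for illustration
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 0.01, "epochs": 1},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # mirrors --gpus 4
)
trainer.fit()
```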
The system automatically deploys trained models through the GitOps pipeline. Access the API endpoints:

- Production: `http://your-ip/predict`
- Staging: `http://your-ip:8081/predict`
- Canary: `http://your-ip:8080/predict`
```bash
# Single image prediction
curl -X POST "http://your-ip/predict" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@chest_xray.jpg"
```

Response:

```json
{
  "predictions": [
    {
      "class": "pneumothorax",
      "confidence": 0.89,
      "bbox": [0.23, 0.15, 0.45, 0.32]
    }
  ],
  "inference_time": 0.145
}
```

```python
import requests

# Batch prediction
files = [('files', open(f'image_{i}.jpg', 'rb')) for i in range(5)]
response = requests.post('http://your-ip/predict/batch', files=files)
results = response.json()
```
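Since the bbox values in the response appear to be normalized [x_min, y_min, x_max, y_max] coordinates (an assumption based on the sample above), overlaying them on the original image requires scaling by the image dimensions:

```python
import requests
from PIL import Image, ImageDraw

# Fetch predictions for one image, then draw the boxes back onto it.
with open("chest_xray.jpg", "rb") as f:
    result = requests.post("http://your-ip/predict", files={"file": f}).json()

image = Image.open("chest_xray.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
w, h = image.size
for det in result["predictions"]:
    x1, y1, x2, y2 = det["bbox"]  # assumed normalized corner coordinates
    draw.rectangle((x1 * w, y1 * h, x2 * w, y2 * h), outline="red", width=3)
    label = f'{det["class"]} {det["confidence"]:.2f}'
    draw.text((x1 * w, max(0, y1 * h - 12)), label, fill="red")
image.save("chest_xray_annotated.jpg")
```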
- Grafana: `http://your-ip:3000` (admin/admin)
- MLflow: `http://your-ip:8000`
- MinIO: `http://your-ip:9001`
- Label Studio: `http://your-ip:5000`
- Data Dashboard: `http://your-ip:8501`
- Model performance (accuracy, precision, recall)
- Inference latency and throughput
- System resource utilization
- Data quality and drift detection
- Error rates and alert conditions
- Real-time Model Performance Tracking: Continuous accuracy monitoring with automated alerts
- Data Drift Detection: Statistical tests that detect distribution shifts in incoming data (see the sketch after this list)
- Concept Drift Monitoring: Tracks changes in the relationship between features and predictions
- Model Decay Detection: Identifies when model performance degrades over time
- A/B Testing Framework: Compare model versions in production environments
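As a toy illustration of the drift checks referenced above, a two-sample Kolmogorov-Smirnov test can compare a summary statistic (for example, mean pixel intensity) between a training-time reference window and recent production traffic. The data and alert threshold below are placeholders:

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder samples standing in for logged per-image statistics.
reference = np.random.normal(0.48, 0.05, size=1000)   # training-time window
production = np.random.normal(0.55, 0.05, size=1000)  # recent traffic

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # alert threshold is a tunable placeholder
    print(f"possible data drift: KS={statistic:.3f}, p={p_value:.2e}")
```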
Our monitoring system supports multiple deployment scenarios with specialized branches:
- Production Branch: Full-scale monitoring with comprehensive alerting
- Staging Branch: Pre-production validation with synthetic data testing
- Canary Branch: Gradual rollout monitoring with performance comparison
- Data Drift Branch: Specialized monitoring for data quality and distribution analysis
Additional serving and monitoring configuration is available in the Serving Monitoring Repository.
The system implements an automated CI/CD pipeline:
- Code Changes → Git repository
- ArgoCD Detection → Automatic sync
- Staging Deployment → Automated testing
- Canary Release → Gradual traffic shift
- Production Rollout → Full deployment
- Training: Distributed training with hyperparameter optimization
- Validation: Automated testing on held-out datasets
- Staging: Deployment to staging environment for integration testing
- Canary: Limited production traffic for performance validation
- Production: Full deployment with monitoring
- Retraining: Automatic retraining based on performance degradation
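On the registry side, promotion through this lifecycle could be driven via MLflow roughly as follows. The model name, run ID, and stage names are illustrative; the tracking URI matches the MLflow endpoint listed above.

```python
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://your-ip:8000")
client = MlflowClient()

# Register the artifact logged by a training run (run ID is a placeholder).
version = mlflow.register_model("runs:/<run_id>/model", "yolov11l-chest-xray")

# Promote the version once automated validation passes.
client.transition_model_version_stage(
    name="yolov11l-chest-xray",
    version=version.version,
    stage="Staging",
)
```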
```bash
# Model serving configuration
MODEL_PATH=/models/yolov11l.onnx
BATCH_SIZE=8
MAX_WORKERS=4
DEVICE=cuda  # or cpu

# Monitoring configuration
PROMETHEUS_PORT=9090
GRAFANA_PORT=3000
ALERT_WEBHOOK_URL=your-webhook-url

# Storage configuration
MINIO_ACCESS_KEY=your-access-key
MINIO_SECRET_KEY=your-secret-key
SWIFT_CONTAINER=your-container
```

```yaml
# model_config.yaml
model:
  name: yolov11l-chest-xray
  version: "1.0.0"
  input_size: [1024, 1024]
  classes: 14
  confidence_threshold: 0.25
  iou_threshold: 0.45

serving:
  max_batch_size: 16
  preferred_batch_size: 8
  max_queue_delay_microseconds: 100
```

```bash
# Unit tests
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# Load testing using locust
locust -f tests/load/test_api.py --host=http://your-ip

# Model validation
python tests/model/validate_model.py --model-path /path/to/model
```
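A unit test in the spirit of tests/unit/ might validate the prediction schema; the helper below is hypothetical and simply illustrates the pattern:

```python
import pytest

def is_valid_bbox(bbox):
    # Hypothetical helper: normalized [x1, y1, x2, y2] with x1 < x2 and y1 < y2.
    x1, y1, x2, y2 = bbox
    return 0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0

@pytest.mark.parametrize("bbox,expected", [
    ([0.23, 0.15, 0.45, 0.32], True),   # the box from the API example
    ([0.45, 0.15, 0.23, 0.32], False),  # x1 > x2
    ([0.0, 0.0, 1.2, 0.5], False),      # outside the normalized range
])
def test_bbox_validation(bbox, expected):
    assert is_valid_bbox(bbox) is expected
```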
- ONNX Conversion: Reduces model size and improves portability
- Quantization: INT8 quantization for CPU deployment
- TensorRT: GPU acceleration for NVIDIA hardware
- Pruning: Removes unnecessary model parameters
- Dynamic Batching: Automatically batches requests for improved throughput
- Multi-instance Serving: Parallel model instances for concurrent processing
- Caching: Redis-based result caching for repeat queries
- Load Balancing: Distributes requests across multiple instances
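The ONNX conversion and INT8 quantization steps above might look like this in practice; file names are illustrative, and the production pipeline may use different tooling such as TensorRT's own converter.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic
from ultralytics import YOLO

# Export the trained detector to ONNX; the ONNX file is also the usual
# starting point for TensorRT engine building.
YOLO("yolov11l-chest-xray.pt").export(format="onnx", imgsz=1024, dynamic=True)

# Dynamic INT8 quantization of the exported model for CPU serving.
quantize_dynamic(
    "yolov11l-chest-xray.onnx",
    "yolov11l-chest-xray-int8.onnx",
    weight_type=QuantType.QInt8,
)
```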
- Data Privacy: PHI data handling compliance (HIPAA considerations)
- Access Control: Role-based access control (RBAC) for all services
- Audit Logging: Comprehensive logging for all system interactions
- Secure Communication: TLS encryption for all API endpoints
- Vulnerability Scanning: Regular security scans of container images
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Set up development environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
pre-commit install
```

This project is licensed under the MIT License - see the LICENSE file for details.
- VinBigData for the chest X-ray dataset
- Ultralytics for the YOLOv11 implementation
- NVIDIA for Triton Inference Server
- Chameleon Cloud for infrastructure support
- YOLO-Triton-Configs Repository: Comprehensive Triton server configurations, performance benchmarks, and optimization insights for maximum inference efficiency across GPU and CPU deployments.
- Serving Monitoring Repository: Production-ready monitoring solutions with advanced data drift detection, model performance tracking, and multi-branch deployment monitoring capabilities.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki
- Edge deployment optimization for mobile devices
- Integration with PACS systems
- Multi-language support for international deployment
- Advanced explainability features (GradCAM, SHAP)
- Federated learning capabilities
- Integration with electronic health records (EHR)