Health Query Chatbot (Adaptive Multi-LLM)

Production-grade AI health query service with adaptive routing, NVIDIA LLM ensemble reasoning, and Gemini final aggregation.

Architecture

Classifier (fast NVIDIA model) labels query type, complexity, risk, and routing needs.
Router decides between fast response, ensemble reasoning, or emergency bypass.
Ensemble runs multiple NVIDIA models in parallel and validates strict JSON outputs.
Gemini Judge aggregates only overlapping, safe conclusions and adds a medical disclaimer.
Safety enforces strict medical constraints and emergency escalation.
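The routing step described above can be sketched in a few lines. This is a minimal illustration, not the code in router/decision.py: the `Classification` fields, the label values, and the thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FAST = "fast"            # single fast model answers directly
    ENSEMBLE = "ensemble"    # several models run in parallel, then the judge aggregates
    EMERGENCY = "emergency"  # no model call, immediate escalation


@dataclass
class Classification:
    # Hypothetical label set; the real classifier output may use other fields.
    query_type: str
    complexity: str  # assumed values: "low", "medium", "high"
    risk: str        # assumed values: "low", "medium", "high", "emergency"


def decide_route(c: Classification) -> Route:
    """Pick a processing path from classifier labels (illustrative thresholds)."""
    if c.risk == "emergency":
        return Route.EMERGENCY
    if c.risk == "high" or c.complexity in ("medium", "high"):
        return Route.ENSEMBLE
    return Route.FAST
```

Checking the emergency label first keeps the bypass independent of every other label, so a misjudged complexity value cannot send an emergency query to the models.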

Project Structure

health_ai/
+-- app.py
+-- router/
|   +-- classifier.py
|   +-- decision.py
+-- models/
|   +-- nvidia_client.py
|   +-- nemotron.py
|   +-- mistral.py
|   +-- qwen.py
+-- ensemble/
|   +-- runner.py
|   +-- validator.py
|   +-- normalizer.py
+-- judge/
|   +-- gemini.py
+-- safety/
|   +-- rules.py
+-- schemas/
|   +-- ensemble_response.json
+-- utils/
|   +-- env.py
|   +-- logger.py
+-- .env.example
+-- requirements.txt
+-- README.md

Setup

Create and activate a virtual environment.
Install dependencies:

pip install -r health_ai/requirements.txt

Create health_ai/.env from health_ai/.env.example and set your keys. Set a single NVIDIA_API_KEY for all models, or override it per model with NVIDIA_API_KEY_<NAME> (for example NVIDIA_API_KEY_QWEN).

Run

uvicorn health_ai.app:app --host 0.0.0.0 --port 8000 --reload

API

GET /health
POST /query

Example:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query":"I have a headache and mild nausea for two days"}'
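The same request can be sent from Python using only the standard library. This is a sketch under two assumptions: the server is running locally on port 8000 as in the Run section, and the response body is JSON (the response fields are not documented here, so the example returns the parsed body as-is). The helper names `build_query_request` and `send_query` are made up for illustration.

```python
import json
import urllib.request


def build_query_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query: str, timeout_s: float = 30.0) -> dict:
    """Send the query and parse the body, assuming the service replies with JSON."""
    request = build_query_request(query)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


if __name__ == "__main__":
    print(send_query("I have a headache and mild nausea for two days"))
```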

Safety Guarantees

No diagnoses, prescriptions, or medication names.
Emergency queries bypass models and return immediate escalation.
All responses include a medical disclaimer.
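As a rough sketch of how the emergency bypass and the disclaimer guarantee fit together: the emergency check runs before any model call, and the disclaimer is appended on every path. The term list, the wording of both messages, and the function names are placeholders invented for this example; they are not the rules in safety/rules.py.

```python
from typing import Callable

# Placeholder wording and terms, for illustration only.
DISCLAIMER = "This is general information, not medical advice."
ESCALATION = "This may be an emergency. Contact your local emergency services now."
EMERGENCY_TERMS = ("chest pain", "can't breathe", "unconscious")


def is_emergency(query: str) -> bool:
    """Naive substring match; a real rule set would need to be far more careful."""
    text = query.lower()
    return any(term in text for term in EMERGENCY_TERMS)


def answer(query: str, generate: Callable[[str], str]) -> str:
    """Return an escalation or a generated reply, always ending with the disclaimer."""
    if is_emergency(query):
        text = ESCALATION  # bypass: the model callable is never invoked
    else:
        text = generate(query)
    return f"{text}\n\n{DISCLAIMER}"
```

Passing the model call in as `generate` makes the bypass easy to verify in a test: for an emergency query the callable must not be called at all.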

Environment Variables

NVIDIA_API_KEY=...
NVIDIA_API_KEY_CLASSIFIER=...
NVIDIA_API_KEY_FAST=...
NVIDIA_API_KEY_NEMOTRON=...
NVIDIA_API_KEY_MISTRAL=...
NVIDIA_API_KEY_QWEN=...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
NVIDIA_TIMEOUT_S=20
GEMINI_API_KEY=...
GEMINI_MODEL=gemini-1.5-pro
LOG_LEVEL=INFO
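The per-model override of NVIDIA_API_KEY can be read as a simple fallback lookup. The sketch below assumes the suffix is the upper-cased model name, which matches the variable names listed above; `resolve_nvidia_key` is a hypothetical helper, not a function from utils/env.py.

```python
import os
from typing import Mapping, Optional


def resolve_nvidia_key(model: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer NVIDIA_API_KEY_<MODEL>, then fall back to the shared NVIDIA_API_KEY."""
    env = os.environ if env is None else env
    key = env.get(f"NVIDIA_API_KEY_{model.upper()}") or env.get("NVIDIA_API_KEY")
    if not key:
        # Fail at startup rather than on the first model call.
        raise RuntimeError(f"No NVIDIA API key configured for model '{model}'")
    return key
```

For example, `resolve_nvidia_key("qwen")` reads NVIDIA_API_KEY_QWEN if it is set and non-empty, and NVIDIA_API_KEY otherwise.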

Notes

Ensemble outputs are strictly validated against schemas/ensemble_response.json.
If any model flags high severity, the system escalates immediately.
Gemini aggregation only uses overlapping, safe content.
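The three notes above can be illustrated together: strict parsing of each model output, escalation when any output reports high severity, and keeping only content that every model agrees on. The field names (`severity`, `advice`) and allowed values are assumptions for this sketch, since the contents of schemas/ensemble_response.json are not shown here, and the set intersection stands in for the Gemini aggregation step rather than reproducing it.

```python
import json
from typing import Dict, List

# Assumed shape of one model output; the real schema may differ.
REQUIRED_FIELDS = {"severity": str, "advice": list}
SEVERITIES = {"low", "medium", "high"}


def parse_strict(raw: str) -> dict:
    """Parse one model output, rejecting anything outside the assumed shape."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        raise ValueError("output must contain exactly the required fields")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data[name], expected):
            raise ValueError(f"field '{name}' has the wrong type")
    if data["severity"] not in SEVERITIES:
        raise ValueError("unknown severity value")
    if not all(isinstance(item, str) for item in data["advice"]):
        raise ValueError("advice items must be strings")
    return data


def combine(raw_outputs: List[str]) -> Dict[str, object]:
    """Escalate on any high severity; otherwise keep advice shared by all models."""
    if not raw_outputs:
        raise ValueError("at least one model output is required")
    parsed = [parse_strict(raw) for raw in raw_outputs]
    if any(p["severity"] == "high" for p in parsed):
        return {"escalate": True, "advice": []}
    common = set(parsed[0]["advice"]).intersection(*(p["advice"] for p in parsed[1:]))
    return {"escalate": False, "advice": sorted(common)}
```

Exact string matching is the simplest possible notion of overlap; a judge model can recognise agreement between differently worded conclusions, which this sketch cannot.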

Security

No API keys are stored in code.
Use environment variables or a secrets manager in production.
