Open-source platform for testing healthcare AI agents with adversarial multi-turn conversations and automated grading.
Preclinical simulates realistic adversarial patient interactions against your healthcare AI agent, captures transcripts, and grades outcomes against safety rubrics. Self-hosted with Docker Compose.
- Docker Desktop (or Docker Engine + Docker Compose)
- An OPENAI_API_KEY or ANTHROPIC_API_KEY (see .env.example)
- A BROWSER_USE_API_KEY for browser-based testing with Browser Use Cloud
git clone https://github.com/Mentat-Lab/preclinical.git
cd preclinical
make setup # copies .env.example + starts services
# Edit .env and set OPENAI_API_KEY=sk-... and BROWSER_USE_API_KEY=...
Open http://localhost:3000 to access the UI.
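Services can take a moment to come up after make setup. The sketch below polls the UI address until it answers. It is an illustration only: the function names (http_status, wait_for_ui), the retry count, and the delay are hypothetical and not part of Preclinical; make status remains the project's own health check.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_for_ui(
    url: str = "http://localhost:3000",  # UI address from the quick start
    attempts: int = 30,                  # assumed retry budget
    delay: float = 2.0,                  # assumed pause between tries (seconds)
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it returns a 2xx/3xx status or attempts run out."""
    for attempt in range(attempts):
        if 200 <= fetch(url) < 400:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_ui() with no arguments checks http://localhost:3000; if it returns False, make logs is the next place to look. The fetch parameter exists only so the polling logic can be exercised without a running stack.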
make up # start services
make down # stop everything
make restart # down + up (picks up .env changes)
make logs # tail logs
make status # check health
make clean # remove volumes, restart fresh
make nuke # destroy everything + rebuild from scratch
Default (OpenAI) -- requires OPENAI_API_KEY in .env.
Browser testing (chatgpt.com, claude.ai, etc.) uses Browser Use Cloud. Set BROWSER_USE_API_KEY in .env and reuse Browser Use profiles for repeated runs on the same domain.
pip install preclinical
preclinical run <agent-id> --creative --watch
/plugin marketplace add Mentat-Lab/preclinical
/plugin install preclinical@preclinical
Provides 8 slash commands: /preclinical:setup, /preclinical:run, /preclinical:benchmark, /preclinical:diagnose, and more. Includes a SessionStart health check and cold-start setup wizard. If you clone the repo, the plugin loads automatically.
npx skills add Mentat-Lab/preclinical
Same capabilities as the plugin, for non-Claude Code AI assistants.
openai (HTTP) | vapi (REST) | livekit (WebRTC) | pipecat (Daily/LiveKit) | elevenlabs (Voice) | browser (Browser Use Cloud)
Requires a running PostgreSQL and valid DATABASE_URL.
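Before starting the server locally, it can help to sanity-check the shape of DATABASE_URL. The following is a minimal sketch using only the Python standard library; check_database_url is a hypothetical helper, not something Preclinical provides, and it validates only the URL's form, not whether PostgreSQL is actually reachable.

```python
from urllib.parse import urlparse


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a PostgreSQL connection URL."""
    problems = []
    parsed = urlparse(url)
    # libpq-style URLs use either the "postgres" or "postgresql" scheme.
    if parsed.scheme not in ("postgres", "postgresql"):
        problems.append(f"unexpected scheme {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("missing host")
    # The database name is the URL path without its leading slash.
    if not parsed.path.strip("/"):
        problems.append("missing database name")
    return problems
```

An empty list means the URL looks well-formed. For example, a made-up value such as postgresql://user:pw@localhost:5432/preclinical passes, while postgresql://localhost is reported as missing a database name.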
cd server && npm install && npm run dev # API server (port 8000)
cd frontend && npm install && npm run dev # UI (port 3000, proxies to :8000)
cd tests && npm install && npm test # Tests
preclinical/
├── server/ # Hono API, LangGraph workers, provider integrations
│ ├── src/routes/ # Domain-split route modules (agent, scenario, run)
│ ├── src/graphs/ # LangGraph StateGraphs (tester, grader)
│ ├── src/providers/ # Provider implementations (openai, vapi, livekit, pipecat, elevenlabs, browser)
│ └── src/workers/ # Scenario runner + voice transports
├── frontend/ # Vite + React UI
├── cli/ # Python CLI and SDK (PyPI: preclinical)
├── plugins/preclinical/ # Claude Code plugin (slash commands, hooks, skills)
├── skills/ # Agent skills for AI coding assistants (skills.sh)
├── tests/ # API and E2E tests
├── target-agents/ # Local provider mock/target agents
└── docs-site/ # Documentation (MkDocs Material)
See .env.example for all environment variables. Key settings:
- OPENAI_API_KEY -- OpenAI (or compatible) API key
- ANTHROPIC_API_KEY -- for Claude models
- TESTER_MODEL / GRADER_MODEL -- LLM models for patient simulation and grading (default: gpt-4o-mini)
- BROWSER_USE_API_KEY -- Browser Use Cloud API key for browser-based testing
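The settings above can be summarized in a small pre-flight check. This is a sketch under stated assumptions, not how the Preclinical server reads its configuration: resolve_settings is a hypothetical helper, and treating an empty value as unset is an assumption. The gpt-4o-mini default and the "OpenAI or Anthropic key" requirement come from this README.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "gpt-4o-mini"  # default stated in this README


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize key settings from an environment-like mapping."""
    # Either provider key satisfies the LLM prerequisite.
    has_llm_key = bool(env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY"))
    return {
        # Assumption: an empty or missing value falls back to the default.
        "tester_model": env.get("TESTER_MODEL") or DEFAULT_MODEL,
        "grader_model": env.get("GRADER_MODEL") or DEFAULT_MODEL,
        "llm_key_present": has_llm_key,
        # Browser-based testing needs a Browser Use Cloud key.
        "browser_testing_enabled": bool(env.get("BROWSER_USE_API_KEY")),
    }
```

Passing a plain dict instead of os.environ makes it easy to try different combinations, for example resolve_settings({"OPENAI_API_KEY": "sk-test"}) to see which defaults would apply.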
Full documentation: Architecture, CI/CD Integration, Integrations
git pull && make restart
See CONTRIBUTING.md.
Apache-2.0 -- see LICENSE.
