Preclinical

Open-source platform for testing healthcare AI agents with adversarial multi-turn conversations and automated grading.

License: Apache-2.0

Preclinical simulates realistic adversarial patient interactions against your healthcare AI agent, captures transcripts, and grades outcomes against safety rubrics. Self-hosted with Docker Compose.

[Figure: How Preclinical Works]

Quick Start

Prerequisites

  • Docker Desktop (or Docker Engine + Docker Compose)
  • An OPENAI_API_KEY or ANTHROPIC_API_KEY (see .env.example)
  • A BROWSER_USE_API_KEY for browser-based testing with Browser Use Cloud

Setup

git clone https://github.com/Mentat-Lab/preclinical.git
cd preclinical
make setup          # copies .env.example + starts services
# Edit .env and set OPENAI_API_KEY=sk-... and BROWSER_USE_API_KEY=...
# Then run `make restart` so the services pick up the .env changes

Open http://localhost:3000 to access the UI.
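
If you script the setup (for example in CI), it can help to wait until the UI responds before moving on. The following is a minimal sketch, not something shipped with Preclinical: `wait_for_url` is a hypothetical helper, and the default of 60 attempts, the 2-second interval, and the 5-second request timeout are arbitrary assumptions. It relies only on `curl` and the UI address given above.

```shell
# Hypothetical helper (not part of the repository): poll a URL until it
# returns a successful HTTP status, or give up after a number of attempts.
wait_for_url() {
  local url="$1"
  local attempts="${2:-60}"   # assumed default: 60 tries
  local delay="${3:-2}"       # assumed default: 2 seconds between tries
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure, -s: no progress output,
    # --max-time: do not let a single request hang indefinitely
    if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

A possible use is `make up && wait_for_url http://localhost:3000`, which returns non-zero if the UI never becomes reachable.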

Daily Workflow

make up             # start services
make down           # stop everything
make restart        # down + up (picks up .env changes)
make logs           # tail logs
make status         # check health
make clean          # remove volumes, restart fresh
make nuke           # destroy everything + rebuild from scratch

Runtime Modes

Default (OpenAI) -- requires OPENAI_API_KEY in .env.

Browser testing (chatgpt.com, claude.ai, etc.) uses Browser Use Cloud. Set BROWSER_USE_API_KEY in .env and reuse Browser Use profiles for repeated runs on the same domain.

CLI & SDK

Python CLI

pip install preclinical
preclinical run <agent-id> --creative --watch

Claude Code Plugin

/plugin marketplace add Mentat-Lab/preclinical
/plugin install preclinical@preclinical

Provides 8 slash commands: /preclinical:setup, /preclinical:run, /preclinical:benchmark, /preclinical:diagnose, and more. Includes a SessionStart health check and cold-start setup wizard. If you clone the repo, the plugin loads automatically.

Agent Skills (Cursor, Windsurf, Copilot, Cline, and more)

npx skills add Mentat-Lab/preclinical

Same capabilities as the plugin, for AI coding assistants other than Claude Code.

Supported Providers

openai (HTTP) | vapi (REST) | livekit (WebRTC) | pipecat (Daily/LiveKit) | elevenlabs (Voice) | browser (Browser Use Cloud)

Local Development (Without Docker)

Requires a running PostgreSQL instance and a valid DATABASE_URL.

# Run each command from the repository root, in its own terminal
cd server && npm install && npm run dev      # API server (port 8000)
cd frontend && npm install && npm run dev    # UI (port 3000, proxies to :8000)
cd tests && npm install && npm test          # Tests
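
Because this mode depends on an external PostgreSQL instance, a quick pre-flight check before starting the API server may save some debugging. The sketch below is an illustration only: `check_database` is a hypothetical function, not part of the repository. It uses `pg_isready`, which ships with the PostgreSQL client tools and is assumed to be installed.

```shell
# Hypothetical pre-flight check (not part of the repository): confirm that
# DATABASE_URL is set and that PostgreSQL accepts connections.
check_database() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi
  # -d accepts a connection URI; -q suppresses output, so the exit status
  # alone carries the result.
  if ! pg_isready -q -d "$DATABASE_URL"; then
    echo "PostgreSQL is not accepting connections" >&2
    return 1
  fi
  echo "database reachable"
}
```

Note that `pg_isready` only tests whether the server accepts connections at the given host and port; it does not verify credentials or that the database itself exists.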

Project Structure

preclinical/
├── server/               # Hono API, LangGraph workers, provider integrations
│   ├── src/routes/       #   Domain-split route modules (agent, scenario, run)
│   ├── src/graphs/       #   LangGraph StateGraphs (tester, grader)
│   ├── src/providers/    #   Provider implementations (openai, vapi, livekit, pipecat, elevenlabs, browser)
│   └── src/workers/      #   Scenario runner + voice transports
├── frontend/             # Vite + React UI
├── cli/                  # Python CLI and SDK (PyPI: preclinical)
├── plugins/preclinical/  # Claude Code plugin (slash commands, hooks, skills)
├── skills/               # Agent skills for AI coding assistants (skills.sh)
├── tests/                # API and E2E tests
├── target-agents/        # Local provider mock/target agents
└── docs-site/            # Documentation (MkDocs Material)

Configuration

See .env.example for all environment variables. Key settings:

  • OPENAI_API_KEY -- OpenAI (or compatible) API key
  • ANTHROPIC_API_KEY -- for Claude models
  • TESTER_MODEL / GRADER_MODEL -- models used for patient simulation and grading, respectively (default: gpt-4o-mini)
  • BROWSER_USE_API_KEY -- Browser Use Cloud API key for browser-based testing
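
To confirm that a `.env` file carries the keys listed above, something along these lines could be used. This is a rough sketch under stated assumptions: `env_has_key` and `check_env` are hypothetical helpers, not part of Preclinical, and the check treats any text after `=` as a value (so `KEY=""` would still pass). It follows the Prerequisites section in requiring one of the two LLM keys and treating the Browser Use key as needed only for browser-based testing.

```shell
# Hypothetical helpers (not part of the repository).

# env_has_key FILE KEY: succeed if FILE has an uncommented KEY=value line.
env_has_key() {
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?$2=.+" "$1"
}

# check_env [FILE]: verify the keys named in this README (default: .env).
check_env() {
  local file="${1:-.env}"
  local status=0
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # At least one LLM provider key is needed.
  if ! env_has_key "$file" OPENAI_API_KEY &&
     ! env_has_key "$file" ANTHROPIC_API_KEY; then
    echo "missing: OPENAI_API_KEY or ANTHROPIC_API_KEY" >&2
    status=1
  fi
  # Only needed for browser-based testing, so this is a note, not an error.
  if ! env_has_key "$file" BROWSER_USE_API_KEY; then
    echo "note: BROWSER_USE_API_KEY is not set" >&2
  fi
  return "$status"
}
```

Running `check_env` from the repository root prints any findings to stderr and returns non-zero only when no LLM key is present.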

Documentation

Full documentation: Architecture, CI/CD Integration, Integrations

Updating

git pull && make restart

Contributing

See CONTRIBUTING.md.

License

Apache-2.0 -- see LICENSE.