An AI system that reasons about molecular design — mutating, filtering, and docking drug candidates autonomously.
AutoRx is a fully autonomous drug-design agent that bridges LLM orchestration and rigid bioinformatics tooling. Built with LLM-assisted engineering, it uses a large language model (GPT-4o, Qwen, or any OpenAI-compatible endpoint) as the reasoning engine that orchestrates an 11-step pipeline of deterministic cheminformatics tools.
Given a protein target (PDB ID) and a seed molecule (SMILES), AutoRx will autonomously:
- Fetch and clean the 3-D protein structure from RCSB PDB.
- Identify the active site from a co-crystallised ligand.
- Mutate the seed molecule using SELFIES to generate novel chemical space.
- Triage candidates through 2-D drug-likeness filters.
- Evaluate Oral Bioavailability (Lipinski/Veber/ESOL).
- Predict Gut Permeability (Caco-2 / Egan Egg models).
- Assess Metabolic Stability (CYP450 inhibition risk).
- Screen for toxic structural alerts (PAINS, BRENK).
- Run deep-learning toxicity predictions via a Tox21 neural network.
- Dock the survivors using AutoDock Vina (physics-based).
- Rescore binding affinities using GNINA (CNN-based).
If candidates fail strict pharmacokinetic safety checks, the agent self-corrects, looping back to re-mutate the seed and retry the pipeline.
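The self-correction behaviour described above can be sketched as a bounded retry loop. This is a minimal illustration, not AutoRx's actual orchestration code: `design_loop`, `make_toy_mutator`, and `toy_safety` are hypothetical names, the candidates are placeholder strings rather than real SMILES, and the round limit is an assumption added so the loop always terminates.

```python
from typing import Callable, Dict, List, Tuple


def design_loop(
    seed: str,
    mutate: Callable[[str, int], List[str]],
    passes_safety: Callable[[str], bool],
    batch_size: int = 20,
    max_rounds: int = 5,
) -> List[str]:
    """Re-mutate the seed until some candidate passes, or give up."""
    for _ in range(max_rounds):
        candidates = mutate(seed, batch_size)
        survivors = [c for c in candidates if passes_safety(c)]
        if survivors:
            return survivors  # hand these on to docking
    return []  # every round failed; the caller decides what to do next


def make_toy_mutator() -> Tuple[Callable[[str, int], List[str]], Dict[str, int]]:
    """Stand-in for the SELFIES mutation step (hypothetical)."""
    calls = {"n": 0}

    def mutate(seed: str, n: int) -> List[str]:
        calls["n"] += 1
        # Placeholder labels, not real SMILES strings.
        return [f"{seed}.v{calls['n']}_{i}" for i in range(n)]

    return mutate, calls


def toy_safety(candidate: str) -> bool:
    """Stand-in filter: only even-numbered third-round variants pass."""
    return ".v3_" in candidate and int(candidate.rsplit("_", 1)[1]) % 2 == 0


mutate, calls = make_toy_mutator()
survivors = design_loop("SEED", mutate, toy_safety, batch_size=4)
print(survivors, "after", calls["n"], "rounds")
```

Capping the number of rounds is a design choice for this sketch; without it, a seed whose variants never pass the filters would loop forever.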
flowchart TD
A["👤 User Prompt"] --> B["🧠 LLM Reasoning Engine"]
B --> C["1 · Fetch & Prep PDB"]
C --> D["2 · Find Binding Site"]
D --> E["3 · Mutate Seed (SELFIES)"]
E --> F["4 · 2-D Drug-Likeness Filter"]
F --> G{"All Failed?"}
G -- Yes --> E
G -- No --> FA["5 · ADME Profiler (Oral)"]
FA --> FB["6 · Caco-2 Predictor (Gut)"]
FB --> FC["7 · Metabolic Profiler (CYP450)"]
FC --> H["8 · Toxicology Scanner (PAINS)"]
H --> I["9 · Deep Tox Oracle (Tox21 NN)"]
I --> J{"Safe Candidates?"}
J -- No --> E
J -- Yes --> K["10 · AutoDock Vina (Docking)"]
K --> L["11 · GNINA (CNN Rescoring)"]
L --> M["📊 Ranked Results Table"]
M --> B
style A fill:#1a1a2e,stroke:#e94560,color:#fff
style B fill:#16213e,stroke:#0f3460,color:#fff
style M fill:#1a1a2e,stroke:#e94560,color:#fff
| Step | Tool | Purpose |
|---|---|---|
| 1 | fetch_and_prep_pdb.py | Download PDB, strip waters & ligands → clean receptor |
| 2 | find_binding_site.py | Locate co-crystallised ligand → extract grid coordinates |
| 3 | mutate_molecule.py | SELFIES-based random mutations → novel candidate SMILES |
| 4 | filter_molecules.py | Lipinski, QED, SA-score, optional CNS/BBB rules |
| 5 | adme_profiler.py | Oral bioavailability: Lipinski Ro5, Veber's rules, ESOL LogS |
| 6 | caco2_predictor.py | Gut permeability: Egan Egg + passive-diffusion heuristics |
| 7 | metabolic_profiler.py | CYP450 (3A4/2D6) inhibition and DDI risk assessment |
| 8 | toxicology_scanner.py | PAINS, BRENK, NIH structural-alert detection |
| 9 | deep_tox_oracle.py | Multi-endpoint Tox21 neural network predictions |
| 10 | run_autodock_vina.py | Physics-based 3-D molecular docking |
| 11 | gnina_docking.py | CNN-scored docking (deep learning re-scoring) |
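The rule-based checks named in steps 5 and 6 can be illustrated with a few lines of plain Python. The sketch below assumes the descriptors have already been computed (for example with RDKit) and uses the commonly published cutoffs: Lipinski's rule of five, Veber's limits on rotatable bonds and polar surface area, Delaney's ESOL regression, and a rectangular approximation of the Egan region. The `Descriptors` class, the function names, and the example values are hypothetical, and the thresholds inside the AutoRx scripts may differ.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float           # g/mol
    logp: float                 # octanol-water partition coefficient
    h_donors: int               # hydrogen-bond donors
    h_acceptors: int            # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float                 # topological polar surface area, in square angstroms
    aromatic_proportion: float  # aromatic heavy atoms / all heavy atoms


def lipinski_violations(d: Descriptors) -> int:
    """Count rule-of-five violations (0 or 1 is usually tolerated)."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber: at most 10 rotatable bonds and a polar surface area of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def passes_egan(d: Descriptors) -> bool:
    """Rectangular approximation of the Egan absorption region (assumed cutoffs)."""
    return d.tpsa <= 131.6 and d.logp <= 5.88


def esol_log_s(d: Descriptors) -> float:
    """Delaney's ESOL estimate of aqueous solubility, log10(mol/L)."""
    return (
        0.16
        - 0.63 * d.logp
        - 0.0062 * d.mol_weight
        + 0.066 * d.rotatable_bonds
        - 0.74 * d.aromatic_proportion
    )


# Illustrative descriptor values, not measurements of any real compound.
example = Descriptors(
    mol_weight=300.0, logp=2.0, h_donors=1, h_acceptors=3,
    rotatable_bonds=4, tpsa=60.0, aromatic_proportion=0.5,
)
print(lipinski_violations(example), passes_veber(example),
      passes_egan(example), round(esol_log_s(example), 3))
```

Keeping each rule as a small pure function makes the cutoffs easy to audit and to swap for stricter or looser variants.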
| Category | Technology | Role |
|---|---|---|
| LLM Orchestration | OpenAI API | Reasoning engine, tool dispatch, self-correction |
| Cheminformatics | RDKit & SELFIES | Descriptors, SMILES mutation, PAINS/CYP450 heuristics |
| Pharmacokinetics | Delaney ESOL, Egan Egg | Solubility and Caco-2 permeability estimation |
| Deep Learning (Tox) | PyTorch | Multi-endpoint Tox21 toxicity neural network |
| 3-D Docking | AutoDock Vina & GNINA | Physics-based affinity and CNN-based pose rescoring |
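To make the "tool dispatch" role concrete, here is a minimal sketch of how a tool call returned by an OpenAI-compatible endpoint (a function name plus a JSON string of arguments) might be routed to a Python function. It makes no network calls. The registry, the `tool` decorator, the function signatures, and the returned file name are all assumptions for illustration; only the script names come from the pipeline table, and the real AutoRx dispatcher may be structured differently.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by the name the LLM uses to request them.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function under its own name (hypothetical helper)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def fetch_and_prep_pdb(pdb_id: str) -> Dict[str, Any]:
    # Stand-in: the real step downloads and cleans the structure.
    return {"receptor_file": f"{pdb_id}_clean.pdb"}


@tool
def mutate_molecule(seed_smiles: str, n_variants: int = 20) -> Dict[str, Any]:
    # Stand-in: the real step generates SELFIES-based variants.
    return {"seed": seed_smiles, "requested": n_variants}


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string to feed back to the LLM."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        result = TOOLS[name](**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report instead of crashing,
        # so the reasoning engine can correct its own call.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(dispatch("fetch_and_prep_pdb", '{"pdb_id": "7NN0"}'))
print(dispatch("run_md", "{}"))
```

Returning errors as JSON rather than raising is what lets an agent retry with corrected arguments, which fits the self-correction role listed in the table.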
# Clone the repository
git clone https://github.com/stavrosmetalikis/AutoRx.git
cd AutoRx
# Create and activate the environment
conda create -n autorx python=3.10 -y
conda activate autorx
# Install core dependencies
conda install -c conda-forge rdkit autodock-vina openbabel biopython -y
pip install -r requirements.txt
# Set the API key and run the agent
export OPENAI_API_KEY="sk-..."
python agent_orchestrator.py \
"Design a selective, orally bioavailable inhibitor for the SARS-CoV-2 helicase (PDB 7NN0). Use this SMILES as the seed: c1ccc(c2nc3ccccc3[nH]2)cc1."
Top candidate (2-vinylbenzimidazole) docked in the active site of the SARS-CoV-2 helicase (PDB: 7NN0).
The autonomous agent successfully mutated the seed, passed 20 variants through ADME and Caco-2 Gut Permeability models, evaluated CYP3A4 metabolic risks, and docked the survivors.
| Rank | SMILES | Vina Affinity | GNINA CNN Score | Risk Profile |
|---|---|---|---|---|
| 1 | C=CC1=NC2=CC=CC=C2[NH1]1 | -5.319 kcal/mol | 0.7087 | ✅ High Permeability, Mod Tox |
| 2 | C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1 | -6.306 kcal/mol | 0.6253 | ✅ High Permeability, Mod Tox |
Agent Conclusion: The top candidate (C=CC1=NC2=CC=CC=C2[NH1]1) achieves excellent CNN-scores (0.7087) with highly favorable ADME properties (Low MW, High Caco-2 permeability, strong QED). Recommended for further structural optimization to mitigate mild CYP3A4 heme-binding flags caused by the unsubstituted imidazole scaffold.
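In the table above, the rank order follows the GNINA CNN score rather than the Vina affinity: the rank-2 molecule has the more negative (stronger) Vina value. The short sketch below reproduces both orderings from the two reported rows. The `Candidate` type is hypothetical, and whether AutoRx always ranks by CNN score is an assumption drawn from this single example.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    smiles: str
    vina_kcal_mol: float  # more negative means stronger predicted binding
    cnn_score: float      # higher means more confidence in the pose


# Values copied from the results table above.
results = [
    Candidate("C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1", -6.306, 0.6253),
    Candidate("C=CC1=NC2=CC=CC=C2[NH1]1", -5.319, 0.7087),
]

by_cnn = sorted(results, key=lambda c: c.cnn_score, reverse=True)
by_vina = sorted(results, key=lambda c: c.vina_kcal_mol)

print("Top by CNN score:", by_cnn[0].smiles)
print("Top by Vina affinity:", by_vina[0].smiles)
```

Because the two scores disagree here, it is worth reading both columns before choosing a lead.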
- Blood-Brain Barrier (BBB) Penetration: Integrate BBB models for neurological targets.
- Molecular Dynamics (MD): Post-docking MD simulations with OpenMM to validate stability over time.
- Multi-Objective Optimisation: Pareto-front optimisation balancing affinity, selectivity, toxicity, and synthesisability.
- Retrosynthetic Planning: Integrate AiZynthFinder to verify proposed molecules are easily synthesisable.
- The Tox21 dataset used for training the Deep Tox Oracle is provided by the NIH/EPA Tox21 program.
- The Synthetic Accessibility (SA) Score module (sascorer.py) is © Novartis Institutes for BioMedical Research.
This project is licensed under the GNU General Public License v3.0 (GPLv3) — see the LICENSE file for details. This ensures the project remains open-source and freely available for the scientific community.