🧬 AutoRx

Autonomous LLM Agent for De Novo Drug Design

An AI system that reasons about molecular design — mutating, filtering, and docking drug candidates autonomously.

📖 Overview

AutoRx is a fully autonomous drug-design agent that couples LLM orchestration with deterministic cheminformatics tooling. Built with LLM-assisted engineering, it uses a large language model (GPT-4o, Qwen, or any model served from an OpenAI-compatible endpoint) as the reasoning engine that drives an 11-step pipeline of deterministic tools.

Given a protein target (PDB ID) and a seed molecule (SMILES), AutoRx will autonomously:

1. Fetch and clean the 3-D protein structure from RCSB PDB.
2. Identify the active site from a co-crystallised ligand.
3. Mutate the seed molecule using SELFIES to generate novel chemical space.
4. Triage candidates through 2-D drug-likeness filters.
5. Evaluate Oral Bioavailability (Lipinski/Veber/ESOL).
6. Predict Gut Permeability (Caco-2 / Egan Egg models).
7. Assess Metabolic Stability (CYP450 inhibition risk).
8. Screen for toxic structural alerts (PAINS, BRENK).
9. Run deep-learning toxicity predictions via a Tox21 neural network.
10. Dock the survivors using AutoDock Vina (physics-based).
11. Rescore binding affinities using GNINA (CNN-based).
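
The oral-bioavailability triage in the list above can be illustrated with the published Lipinski and Veber thresholds. This is a minimal sketch, not the code in `adme_profiler.py`: the `Descriptors` class, the function names, and the tolerance of one Lipinski violation are assumptions, and the descriptor values are illustrative numbers rather than properties of any real molecule. In practice the descriptors would be computed with RDKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed 2-D descriptors (hypothetical container for this sketch)."""
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber's rules: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def passes_oral_filter(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed policy: tolerate up to `max_violations` Lipinski violations."""
    return lipinski_violations(d) <= max_violations and passes_veber(d)


# Illustrative descriptor sets, not measured values.
small = Descriptors(180.0, 1.5, 1, 3, 2, 50.0)
greasy = Descriptors(650.0, 6.2, 2, 8, 12, 95.0)

print(passes_oral_filter(small), passes_oral_filter(greasy))
```

Keeping the rule checks separate from descriptor calculation makes the thresholds easy to audit and to tighten for a given target class.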

If every candidate fails the 2-D drug-likeness filter, or none survives the pharmacokinetic and toxicity checks, the agent self-corrects: it loops back to step 3, re-mutates the seed, and retries the pipeline.
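
The self-correction behaviour described above amounts to a bounded retry loop. The sketch below shows that control flow only, under stated assumptions: `design_loop`, the stub mutator, the stub safety check, and the `max_rounds` cap are all hypothetical and stand in for the real tools and for whatever stopping rule the agent actually applies.

```python
from typing import Callable, List, Tuple


def design_loop(
    seed: str,
    mutate: Callable[[str, int], List[str]],
    is_safe: Callable[[str], bool],
    n_variants: int = 20,
    max_rounds: int = 5,
) -> Tuple[int, List[str]]:
    """Re-mutate the seed until at least one variant passes, or give up."""
    for round_no in range(1, max_rounds + 1):
        variants = mutate(seed, n_variants)
        survivors = [v for v in variants if is_safe(v)]
        if survivors:
            return round_no, survivors
    return max_rounds, []


def make_stub_mutator() -> Callable[[str, int], List[str]]:
    """Hypothetical stand-in for the mutation tool: labels variants by round."""
    calls = {"n": 0}

    def mutate(seed: str, n: int) -> List[str]:
        calls["n"] += 1
        return [f"{seed}|round{calls['n']}|v{i}" for i in range(n)]

    return mutate


def stub_is_safe(variant: str) -> bool:
    """Hypothetical stand-in for the ADME and toxicity checks."""
    return "|round3|" in variant and variant.endswith("|v0")


rounds, survivors = design_loop("SEED", make_stub_mutator(), stub_is_safe, n_variants=4)
print(rounds, survivors)
```

A hard cap on rounds keeps a run from looping indefinitely when the seed scaffold itself is the source of the failures.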

🏗️ Architecture — The Agentic Loop

flowchart TD
    A["👤 User Prompt"] --> B["🧠 LLM Reasoning Engine"]
    B --> C["1 · Fetch & Prep PDB"]
    C --> D["2 · Find Binding Site"]
    D --> E["3 · Mutate Seed (SELFIES)"]
    E --> F["4 · 2-D Drug-Likeness Filter"]
    F --> G{"All Failed?"}
    G -- Yes --> E
    G -- No --> FA["5 · ADME Profiler (Oral)"]
    FA --> FB["6 · Caco-2 Predictor (Gut)"]
    FB --> FC["7 · Metabolic Profiler (CYP450)"]
    FC --> H["8 · Toxicology Scanner (PAINS)"]
    H --> I["9 · Deep Tox Oracle (Tox21 NN)"]
    I --> J{"Safe Candidates?"}
    J -- No --> E
    J -- Yes --> K["10 · AutoDock Vina (Docking)"]
    K --> L["11 · GNINA (CNN Rescoring)"]
    L --> M["📊 Ranked Results Table"]
    M --> B

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#16213e,stroke:#0f3460,color:#fff
    style M fill:#1a1a2e,stroke:#e94560,color:#fff
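
In the flowchart, the LLM reasoning engine chooses which tool to run next. One common way to wire this up is a name-to-function registry that receives the tool name and JSON-encoded arguments emitted by the model. The sketch below assumes that design; the function signatures and return values are placeholders and are not taken from the AutoRx source.

```python
import json


def fetch_and_prep_pdb(pdb_id: str) -> dict:
    """Placeholder for the real tool; only echoes a hypothetical output path."""
    return {"receptor": f"{pdb_id.lower()}_clean.pdb"}


def mutate_molecule(smiles: str, n: int = 20) -> dict:
    """Placeholder for the real tool; only echoes its inputs."""
    return {"seed": smiles, "requested": n}


# Registry mapping tool names (as the LLM would emit them) to callables.
TOOLS = {
    "fetch_and_prep_pdb": fetch_and_prep_pdb,
    "mutate_molecule": mutate_molecule,
}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string to feed back to the LLM."""
    if tool_name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        kwargs = json.loads(arguments_json)
        result = TOOLS[tool_name](**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names are reported, not raised.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(dispatch("fetch_and_prep_pdb", '{"pdb_id": "7NN0"}'))
print(dispatch("dock_everything", "{}"))
```

Returning errors as JSON instead of raising lets the model read the failure and issue a corrected call, which is what makes self-correction possible at the tool level.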

Standard Operating Procedure

Step	Tool	Purpose
1	`fetch_and_prep_pdb.py`	Download PDB, strip waters & ligands → clean receptor
2	`find_binding_site.py`	Locate co-crystallised ligand → extract grid coordinates
3	`mutate_molecule.py`	SELFIES-based random mutations → novel candidate SMILES
4	`filter_molecules.py`	Lipinski, QED, SA-score, optional CNS/BBB rules
5	`adme_profiler.py`	Oral bioavailability: Lipinski Ro5, Veber's rules, ESOL LogS
6	`caco2_predictor.py`	Gut permeability: Egan Egg + passive-diffusion heuristics
7	`metabolic_profiler.py`	CYP450 (3A4/2D6) inhibition and DDI risk assessment
8	`toxicology_scanner.py`	PAINS, BRENK, NIH structural-alert detection
9	`deep_tox_oracle.py`	Multi-endpoint Tox21 neural network predictions
10	`run_autodock_vina.py`	Physics-based 3-D molecular docking
11	`gnina_docking.py`	CNN-scored docking (deep learning re-scoring)
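
Step 2 turns the co-crystallised ligand into grid coordinates for docking. A minimal sketch of one way to do this follows: it takes the ligand's atom coordinates, builds the axis-aligned bounding box, and pads it. The function name, the 4 Å padding, and the choice of bounding-box midpoint (rather than, say, the atom centroid) are assumptions; `find_binding_site.py` may do this differently.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def docking_box(ligand_coords: Iterable[Point], padding: float = 4.0) -> Tuple[Point, Point]:
    """Return (center, size) of a padded box enclosing the ligand atoms."""
    coords = list(ligand_coords)
    if not coords:
        raise ValueError("no ligand atoms supplied")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    # Midpoint of the bounding box along each axis.
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Box edge lengths, padded on both sides.
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Two made-up atom positions, in angstroms.
center, size = docking_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
print(center, size)
```

The resulting values map directly onto AutoDock Vina's `center_x`/`center_y`/`center_z` and `size_x`/`size_y`/`size_z` settings.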

🔧 Tools & Technologies

Category	Technology	Role
LLM Orchestration	OpenAI API	Reasoning engine, tool dispatch, self-correction
Cheminformatics	RDKit & SELFIES	Descriptors, SELFIES-based mutation, PAINS/CYP450 heuristics
Pharmacokinetics	Delaney ESOL, Egan Egg	Solubility and Caco-2 permeability estimation
Deep Learning (Tox)	PyTorch	Multi-endpoint Tox21 toxicity neural network
3-D Docking	AutoDock Vina & GNINA	Physics-based affinity and CNN-based pose rescoring
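
For reference, the Delaney ESOL model listed in the table estimates aqueous solubility from four descriptors with a linear equation. The sketch below uses the coefficients from Delaney's original publication; whether AutoRx uses these exact coefficients or a refitted variant is not stated in this README, so treat the function as illustrative. The input values in the example are made up.

```python
def esol_log_s(
    clogp: float,
    mol_weight: float,
    rotatable_bonds: int,
    aromatic_proportion: float,
) -> float:
    """Estimate log10 aqueous solubility (mol/L) with the original ESOL fit.

    aromatic_proportion is the fraction of heavy atoms that are aromatic.
    """
    return (
        0.16
        - 0.63 * clogp
        - 0.0062 * mol_weight
        + 0.066 * rotatable_bonds
        - 0.74 * aromatic_proportion
    )


# Made-up descriptor values, for illustration only.
log_s = esol_log_s(clogp=2.0, mol_weight=200.0, rotatable_bonds=2, aromatic_proportion=0.5)
print(round(log_s, 3))
```

More negative values mean lower predicted solubility; higher lipophilicity, molecular weight, and aromatic content all push the estimate down.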

⚙️ Installation & Setup

1. Clone the Repository and Create the Conda Environment

git clone https://github.com/stavrosmetalikis/AutoRx.git
cd AutoRx

conda create -n autorx python=3.10 -y
conda activate autorx

# Install core dependencies (requirements.txt is read from the repository root)
conda install -c conda-forge rdkit autodock-vina openbabel biopython -y
pip install -r requirements.txt

2. Configure API Key

export OPENAI_API_KEY="sk-..."
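
A script can fail fast when the key exported above is missing instead of erroring midway through a run. The following is a hedged sketch of such a check: `load_llm_config` and the `AUTORX_MODEL` variable are hypothetical names, and reading `OPENAI_BASE_URL` to point at an OpenAI-compatible endpoint is an assumption about how the orchestrator could be configured, not a documented AutoRx setting.

```python
import os
from typing import Mapping, Optional


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect LLM settings from the environment, failing early if the key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the agent")
    return {
        "api_key": api_key,
        # Assumed: optional override for an OpenAI-compatible endpoint.
        "base_url": env.get("OPENAI_BASE_URL") or None,
        # Hypothetical variable; defaults to a model named in the overview.
        "model": env.get("AUTORX_MODEL", "gpt-4o"),
    }


# Passing a dict instead of os.environ keeps the example self-contained.
config = load_llm_config({"OPENAI_API_KEY": "sk-test"})
print(config["model"], config["base_url"])
```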

📊 Example: SARS-CoV-2 Helicase Inhibitor Optimisation

python agent_orchestrator.py \
  "Design a selective, orally bioavailable inhibitor for the SARS-CoV-2 helicase (PDB 7NN0). Use this SMILES as the seed: c1ccc(c2nc3ccccc3[nH]2)cc1."

Top candidate (2-vinylbenzimidazole) docked in the active site of the SARS-CoV-2 helicase (PDB: 7NN0).

Agent Output Summary:

The agent mutated the seed, ran 20 variants through the ADME and Caco-2 gut-permeability models, evaluated CYP3A4 metabolic risk, and docked the survivors.

Rank	SMILES	Vina Affinity	GNINA CNN Score	Risk Profile
1	`C=CC1=NC2=CC=CC=C2[NH1]1`	-5.319 kcal/mol	0.7087	✅ High Permeability, Mod Tox
2	`C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1`	-6.306 kcal/mol	0.6253	✅ High Permeability, Mod Tox
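
Judging from the two rows shown, the table appears to be ordered by GNINA CNN score rather than by Vina affinity; this is an inference from the data, not a documented rule. The short sketch below sorts the same two results both ways to make the difference explicit. The dictionary keys are hypothetical.

```python
# The two rows from the results table above.
results = [
    {"smiles": "C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1", "vina": -6.306, "cnn": 0.6253},
    {"smiles": "C=CC1=NC2=CC=CC=C2[NH1]1", "vina": -5.319, "cnn": 0.7087},
]

# Higher CNN score is better; more negative Vina affinity is better.
by_cnn = sorted(results, key=lambda r: r["cnn"], reverse=True)
by_vina = sorted(results, key=lambda r: r["vina"])

print("Top by CNN score:    ", by_cnn[0]["smiles"])
print("Top by Vina affinity:", by_vina[0]["smiles"])
```

The two scoring functions disagree on the best candidate here, so the choice of ranking key changes which molecule is reported first.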

Agent Conclusion: The top candidate (C=CC1=NC2=CC=CC=C2[NH1]1) has the highest GNINA CNN score (0.7087) and a favourable ADME profile (low MW, high Caco-2 permeability, strong QED), although its Vina affinity (-5.319 kcal/mol) is weaker than that of the rank-2 candidate (-6.306 kcal/mol). It is recommended for further structural optimisation to mitigate the mild CYP3A4 heme-binding flags caused by the unsubstituted imidazole scaffold.

🔮 Future Scope

Blood-Brain Barrier (BBB) Penetration: Integrate BBB models for neurological targets.
Molecular Dynamics (MD): Post-docking MD simulations with OpenMM to validate stability over time.
Multi-Objective Optimisation: Pareto-front optimisation balancing affinity, selectivity, toxicity, and synthesisability.
Retrosynthetic Planning: Integrate AiZynthFinder to verify proposed molecules are easily synthesisable.

🙏 Acknowledgments

The Tox21 dataset used for training the Deep Tox Oracle is provided by the NIH/EPA Tox21 program.

📜 License

This project is licensed under the GNU General Public License v3.0 (GPLv3) — see the LICENSE file for details. This ensures the project remains open-source and freely available for the scientific community.
