stavrosmetalikis/AutoRX

🧬 AutoRx

Autonomous LLM Agent for De Novo Drug Design

Python 3.10+ · RDKit · AutoDock Vina · GNINA · License: GPL v3

An AI system that reasons about molecular design — mutating, filtering, and docking drug candidates autonomously.


📖 Overview

AutoRx is a fully autonomous drug-design agent that pairs LLM orchestration with deterministic cheminformatics. The project was built using LLM-assisted engineering. At run time, a large language model (GPT-4o, Qwen, or any model behind an OpenAI-compatible endpoint) acts as the reasoning engine and orchestrates an 11-step pipeline of deterministic cheminformatics tools.
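How a reasoning engine hands work to deterministic tools can be sketched as a small registry plus a dispatcher. This is an illustrative sketch, not AutoRx's actual code: the `TOOLS` registry, the `tool` decorator, the `dispatch` helper, and the stub `filter_molecules` function are all hypothetical. The only thing assumed about the LLM side is the general shape of an OpenAI-style tool call, that is, a tool name plus JSON-encoded arguments.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the LLM can request it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def filter_molecules(smiles_list: list, max_mw: float = 500.0) -> dict:
    # Stub for illustration: a real tool would compute descriptors with RDKit.
    return {"received": len(smiles_list), "max_mw": max_mw}


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string to feed back to the LLM."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps({"result": TOOLS[name](**kwargs)})
    except (TypeError, ValueError) as exc:
        # Report bad arguments to the model instead of crashing the loop.
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON rather than raising lets the model see what went wrong and retry with corrected arguments, which is one plausible way to support the self-correction behaviour described in this README.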

Given a protein target (PDB ID) and a seed molecule (SMILES), AutoRx will autonomously:

  1. Fetch and clean the 3-D protein structure from RCSB PDB.
  2. Identify the active site from a co-crystallised ligand.
  3. Mutate the seed molecule using SELFIES to generate novel chemical space.
  4. Triage candidates through 2-D drug-likeness filters.
  5. Evaluate Oral Bioavailability (Lipinski/Veber/ESOL).
  6. Predict Gut Permeability (Caco-2 / Egan Egg models).
  7. Assess Metabolic Stability (CYP450 inhibition risk).
  8. Screen for toxic structural alerts (PAINS, BRENK).
  9. Run deep-learning toxicity predictions via a Tox21 neural network.
  10. Dock the survivors using AutoDock Vina (physics-based).
  11. Rescore binding affinities using GNINA (CNN-based).
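The rule-based checks named in steps 4 and 5 reduce to threshold comparisons on a handful of descriptors. The sketch below uses the commonly published cut-offs for Lipinski's Rule of Five (MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10) and Veber's rules (rotatable bonds ≤ 10, TPSA ≤ 140 Å²). The `Descriptors` class and function names are hypothetical, the descriptor values are assumed to be precomputed (for example with RDKit), and the exact thresholds AutoRx applies may differ.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float        # molecular weight, g/mol
    logp: float      # octanol-water partition coefficient
    hbd: int         # hydrogen-bond donors
    hba: int         # hydrogen-bond acceptors
    rot_bonds: int   # rotatable bonds
    tpsa: float      # topological polar surface area, Å^2


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations (0 to 4)."""
    return sum([d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_veber(d: Descriptors) -> bool:
    """Veber: at most 10 rotatable bonds and TPSA of at most 140 Å^2."""
    return d.rot_bonds <= 10 and d.tpsa <= 140


def orally_plausible(d: Descriptors, max_lipinski_violations: int = 1) -> bool:
    # Tolerating one Lipinski violation is a common convention, assumed here.
    return lipinski_violations(d) <= max_lipinski_violations and passes_veber(d)
```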

If candidates fail strict pharmacokinetic safety checks, the agent self-corrects, looping back to re-mutate the seed and retry the pipeline.
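The self-correction behaviour can be pictured as a bounded retry loop around the mutate and filter stages. This is a minimal sketch under stated assumptions: `design_loop`, its parameters, and the `mutate` and `is_safe` callables are hypothetical stand-ins for the real tools, and in AutoRx the decision to loop back is made by the LLM rather than by a fixed `for` loop.

```python
import random
from typing import Callable, List


def design_loop(
    seed: str,
    mutate: Callable[[str, int, random.Random], List[str]],
    is_safe: Callable[[str], bool],
    max_rounds: int = 5,
    batch_size: int = 20,
    rng_seed: int = 0,
) -> List[str]:
    """Re-mutate the seed until some candidates pass, or give up."""
    rng = random.Random(rng_seed)
    for _ in range(max_rounds):
        candidates = mutate(seed, batch_size, rng)
        survivors = [c for c in candidates if is_safe(c)]
        if survivors:
            return survivors  # these would move on to docking
        # Nothing passed: loop back and generate a fresh batch from the seed.
    return []
```

Capping the number of rounds keeps a seed that never yields safe candidates from consuming unbounded compute or API calls.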


🏗️ Architecture — The Agentic Loop

flowchart TD
    A["👤 User Prompt"] --> B["🧠 LLM Reasoning Engine"]
    B --> C["1 · Fetch & Prep PDB"]
    C --> D["2 · Find Binding Site"]
    D --> E["3 · Mutate Seed (SELFIES)"]
    E --> F["4 · 2-D Drug-Likeness Filter"]
    F --> G{"All Failed?"}
    G -- Yes --> E
    G -- No --> FA["5 · ADME Profiler (Oral)"]
    FA --> FB["6 · Caco-2 Predictor (Gut)"]
    FB --> FC["7 · Metabolic Profiler (CYP450)"]
    FC --> H["8 · Toxicology Scanner (PAINS)"]
    H --> I["9 · Deep Tox Oracle (Tox21 NN)"]
    I --> J{"Safe Candidates?"}
    J -- No --> E
    J -- Yes --> K["10 · AutoDock Vina (Docking)"]
    K --> L["11 · GNINA (CNN Rescoring)"]
    L --> M["📊 Ranked Results Table"]
    M --> B

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#16213e,stroke:#0f3460,color:#fff
    style M fill:#1a1a2e,stroke:#e94560,color:#fff

Standard Operating Procedure

| Step | Tool | Purpose |
|------|------|---------|
| 1 | `fetch_and_prep_pdb.py` | Download PDB, strip waters & ligands → clean receptor |
| 2 | `find_binding_site.py` | Locate co-crystallised ligand → extract grid coordinates |
| 3 | `mutate_molecule.py` | SELFIES-based random mutations → novel candidate SMILES |
| 4 | `filter_molecules.py` | Lipinski, QED, SA-score, optional CNS/BBB rules |
| 5 | `adme_profiler.py` | Oral bioavailability: Lipinski Ro5, Veber's rules, ESOL LogS |
| 6 | `caco2_predictor.py` | Gut permeability: Egan Egg + passive-diffusion heuristics |
| 7 | `metabolic_profiler.py` | CYP450 (3A4/2D6) inhibition and DDI risk assessment |
| 8 | `toxicology_scanner.py` | PAINS, BRENK, NIH structural-alert detection |
| 9 | `deep_tox_oracle.py` | Multi-endpoint Tox21 neural network predictions |
| 10 | `run_autodock_vina.py` | Physics-based 3-D molecular docking |
| 11 | `gnina_docking.py` | CNN-scored docking (deep learning re-scoring) |
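Step 2 turns a co-crystallised ligand into docking grid coordinates. One simple way to do that, sketched below, is to take the bounding box of the ligand's `HETATM` atoms, use its midpoint as the grid centre, and pad each side. The function name `ligand_box`, the 5 Å default padding, and the bounding-box approach itself are assumptions for illustration; `find_binding_site.py` may work differently. The column slices follow the fixed-width PDB format.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def ligand_box(
    pdb_lines: Iterable[str], resname: str, padding: float = 5.0
) -> Tuple[Vec3, Vec3]:
    """Return (center, size) of a padded box around HETATM atoms of `resname`."""
    coords = []
    for line in pdb_lines:
        # PDB is fixed-width: residue name in columns 18-20, x/y/z in 31-54.
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no HETATM records found for residue {resname!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    # Centre is the midpoint of the bounding box, not the atom centroid.
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size
```

The resulting centre and size, both in Å, map directly onto the centre and box-size inputs that docking programs such as AutoDock Vina expect.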

🔧 Tools & Technologies

| Category | Technology | Role |
|----------|------------|------|
| LLM Orchestration | OpenAI API | Reasoning engine, tool dispatch, self-correction |
| Cheminformatics | RDKit & SELFIES | Descriptors, SMILES mutation, PAINS/CYP450 heuristics |
| Pharmacokinetics | Delaney ESOL, Egan Egg | Solubility and Caco-2 permeability estimation |
| Deep Learning (Tox) | PyTorch | Multi-endpoint Tox21 toxicity neural network |
| 3-D Docking | AutoDock Vina & GNINA | Physics-based affinity and CNN-based pose rescoring |
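Both pharmacokinetic models listed above are simple closed-form rules once the descriptors are known. The sketch below shows Delaney's published ESOL regression and a rectangular simplification of the Egan absorption region (TPSA ≤ 131.6 Å², logP ≤ 5.88) that is commonly quoted in place of the original ellipse. Function names are hypothetical, descriptor values are assumed to be precomputed, and AutoRx's own modules may use refitted coefficients or the full ellipse.

```python
def esol_log_s(
    clogp: float, mw: float, rot_bonds: int, aromatic_proportion: float
) -> float:
    """Delaney's ESOL estimate of log10 aqueous solubility (mol/L).

    aromatic_proportion is aromatic heavy atoms divided by all heavy atoms.
    """
    return (
        0.16
        - 0.63 * clogp
        - 0.0062 * mw
        + 0.066 * rot_bonds
        - 0.74 * aromatic_proportion
    )


def in_egan_box(logp: float, tpsa: float) -> bool:
    """Rectangular approximation of the Egan good-absorption region."""
    return logp <= 5.88 and tpsa <= 131.6
```

For example, `esol_log_s(2.0, 100.0, 1, 0.5)` evaluates to about -2.02, meaning a predicted solubility of roughly 10^-2 mol/L for those illustrative descriptor values.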

⚙️ Installation & Setup

1. Clone the Repository and Create the Conda Environment

git clone https://github.com/stavrosmetalikis/AutoRx.git
cd AutoRx

conda create -n autorx python=3.10 -y
conda activate autorx

# Install core dependencies (run from the repository root so requirements.txt is found)
conda install -c conda-forge rdkit autodock-vina openbabel biopython -y
pip install -r requirements.txt

2. Configure API Key

export OPENAI_API_KEY="sk-..."

📊 Example: SARS-CoV-2 Helicase Inhibitor Optimisation

python agent_orchestrator.py \
  "Design a selective, orally bioavailable inhibitor for the SARS-CoV-2 helicase (PDB 7NN0). Use this SMILES as the seed: c1ccc(c2nc3ccccc3[nH]2)cc1."

Top candidate (2-vinylbenzimidazole) docked in the active site of the SARS-CoV-2 helicase (PDB: 7NN0).

Agent Output Summary:

The autonomous agent successfully mutated the seed, passed 20 variants through ADME and Caco-2 Gut Permeability models, evaluated CYP3A4 metabolic risks, and docked the survivors.

| Rank | SMILES | Vina Affinity | GNINA CNN Score | Risk Profile |
|------|--------|---------------|-----------------|--------------|
| 1 | `C=CC1=NC2=CC=CC=C2[NH1]1` | -5.319 kcal/mol | 0.7087 | ✅ High Permeability, Mod Tox |
| 2 | `C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1` | -6.306 kcal/mol | 0.6253 | ✅ High Permeability, Mod Tox |

Agent Conclusion: The top candidate (C=CC1=NC2=CC=CC=C2[NH1]1) achieves excellent CNN-scores (0.7087) with highly favorable ADME properties (Low MW, High Caco-2 permeability, strong QED). Recommended for further structural optimization to mitigate mild CYP3A4 heme-binding flags caused by the unsubstituted imidazole scaffold.
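The order of the results table matches a sort on the GNINA CNN score rather than on Vina affinity: the rank 2 candidate has the more negative Vina score. The sketch below reproduces both orderings from the two reported rows. The dictionary keys are hypothetical, and the claim that AutoRx ranks by CNN score is an inference from the table, not something the README states.

```python
# The two rows reported in the results table above.
candidates = [
    {
        "smiles": "C1=CC=C(C2=NC3=CC=CC=C3[NH1]2)C1",
        "vina_kcal_mol": -6.306,
        "cnn_score": 0.6253,
    },
    {
        "smiles": "C=CC1=NC2=CC=CC=C2[NH1]1",
        "vina_kcal_mol": -5.319,
        "cnn_score": 0.7087,
    },
]

# Higher CNN score first: reproduces the table's ranking.
by_cnn = sorted(candidates, key=lambda c: c["cnn_score"], reverse=True)

# More negative Vina affinity first: gives the opposite order here.
by_vina = sorted(candidates, key=lambda c: c["vina_kcal_mol"])
```

When the two scoring functions disagree like this, it is worth checking which one a downstream decision actually relies on.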


🔮 Future Scope

  • Blood-Brain Barrier (BBB) Penetration: Integrate BBB models for neurological targets.
  • Molecular Dynamics (MD): Post-docking MD simulations with OpenMM to validate stability over time.
  • Multi-Objective Optimisation: Pareto-front optimisation balancing affinity, selectivity, toxicity, and synthesisability.
  • Retrosynthetic Planning: Integrate AiZynthFinder to verify proposed molecules are easily synthesisable.

🙏 Acknowledgments

  • The Tox21 dataset used for training the Deep Tox Oracle is provided by the NIH/EPA Tox21 program.
  • The Synthetic Accessibility (SA) Score module (sascorer.py) is © Novartis Institutes for BioMedical Research.

📜 License

This project is licensed under the GNU General Public License v3.0 (GPLv3) — see the LICENSE file for details. This ensures the project remains open-source and freely available for the scientific community.
