This is the official codebase of the paper: "Interpretable PROTAC degradation prediction with structure-informed deep ternary attention framework"
This study introduces PROTAC-STAN, a structure-informed deep ternary attention network (STAN) framework for interpretable PROTAC degradation prediction. It's the first study to coherently model the three-body interactions of PROTAC therapeutics with a tailored deep learning architecture. PROTAC-STAN represents PROTAC molecules across atom, molecule, and property hierarchies and incorporates structure information for POIs and E3 ligases using a protein language model infused with structural data. Furthermore, it simulates interactions among three entities via a novel ternary attention network tailored for the PROTAC system, providing unprecedented insights into the degradation mechanism.
The original data can be accessed at PROTAC-DB.
We enrich PROTAC-DB 2.0 with degradation information and construct a refined PROTAC dataset named PROTAC-fine. The data are stored in the data/PROTAC-fine folder.
.
├── config_demo.toml
├── data
│ └── demo
└── demo.ipynb
.
├── config.toml
├── data
├── data_loader.py
├── data.py
├── inference.py
├── main.py
├── model.py
├── saved_models
└── tan.py
.
├── data
│ ├── custom
├── esm_embed
│ ├── get_embed_s.py
│ ├── model
│ └── README.md
└── prepare_data.ipynb
We provide a running demo of PROTAC-STAN through the Jupyter notebook demo.ipynb. Note that it is based on a small demo dataset from PROTAC-fine. The demo takes only about 5 minutes to complete the whole pipeline. To run PROTAC-STAN on the full dataset, we advise GPU RAM >= 8GB and CPU RAM >= 16GB.
PROTAC-STAN has been tested on Linux operating systems (Ubuntu 20.04.1).
Python Dependencies:
- Python (version >= 3.11.5)
- PyTorch (version >= 2.1.0)
- RDKit (version >= 2023.9.2)
- pyg (version >= 2.5.1)
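The minimum versions above can be checked programmatically once the environment is set up. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `rdkit`, and `torch_geometric` (the names used by the pip commands in this README).

```python
import itertools
import sys
from importlib import metadata

# Minimum versions from the dependency list above.
# Keys are assumed distribution names, not names taken from the repository code.
MINIMUMS = {
    "torch": (2, 1, 0),
    "rdkit": (2023, 9, 2),
    "torch_geometric": (2, 5, 1),
}


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple such as (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.1' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_environment():
    """Return a dict mapping each requirement to 'ok', 'too old', or 'missing'."""
    report = {"python": "ok" if sys.version_info[:3] >= (3, 11, 5) else "too old"}
    for name, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


for requirement, status in check_environment().items():
    print(f"{requirement}: {status}")
```

Any entry reported as `missing` or `too old` points to the corresponding step of the installation guide.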
Installation typically takes about 10 minutes on a standard desktop computer, depending on your network speed.
- Create Conda environment
conda create -n protac-stan python=3.11.5
conda activate protac-stan
- Install PyTorch
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==1.26.4
- Install other essential packages
pip install torch_geometric==2.5.1
pip install rdkit==2023.9.2
pip install pandas==2.1.1
pip install toml==0.10.2
pip install wandb
# [Optional] Install torch-scatter to accelerate pyg computing
wget https://data.pyg.org/whl/torch-2.1.0%2Bcu118/torch_scatter-2.1.2%2Bpt21cu118-cp311-cp311-linux_x86_64.whl
pip install torch_scatter-2.1.2+pt21cu118-cp311-cp311-linux_x86_64.whl
- Download repository
git clone https://github.com/PROTACs/PROTAC-STAN.git
cd PROTAC-STAN
Tip
See protac-stan.yml for the full requirements.
We have prepared the PROTAC-fine dataset in the directory data/PROTAC-fine.
To train the PROTAC-STAN model from scratch, run the following script:
python main.py
Evaluation results of PROTAC-STAN and baselines on test set considering data leakage:
Use inference.py to leverage PROTAC-STAN as a powerful tool for interpretable PROTAC degradation prediction.
- Prepare your custom data following prepare_data.ipynb
- Predict your data:
# Usage: python inference.py [-h] [--root ROOT] [--name NAME] [--save_att]
python inference.py --root 'data/custom' --name 'custom'
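When several custom datasets need predictions, the command above can be scripted. The following is a minimal sketch built only on the documented usage line (`--root`, `--name`, `--save_att`). The helper names `build_inference_command` and `run_inference` are hypothetical, and the script is assumed to be launched from the repository root so that inference.py is found.

```python
import subprocess
import sys


def build_inference_command(root, name, save_att=False):
    """Build the argument list for one inference.py call."""
    command = [sys.executable, "inference.py", "--root", root, "--name", name]
    if save_att:
        command.append("--save_att")
    return command


def run_inference(datasets, save_att=False):
    """Run inference.py once per (root, name) pair, stopping on the first failure."""
    for root, name in datasets:
        subprocess.run(build_inference_command(root, name, save_att), check=True)


# Show the command that would be executed for the example dataset above.
print(" ".join(build_inference_command("data/custom", "custom")))
```

Calling `run_inference([("data/custom", "custom")])` then executes the same command as the shell example; passing the arguments as a list avoids shell quoting problems with unusual paths.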
You may use the --save_att argument to save attention maps for further analysis. Here are our examples:
Tip
For visualization, you may use Python packages such as matplotlib and RDKit, or visualization software such as Maestro and PyMOL.
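A simple first step in analysing an attention map is to rank its highest-scoring entries before visualizing them. The sketch below is illustrative only: this README does not describe the file format or shape of the saved maps, so the code assumes a map has already been loaded as a 2D nested list of scores. The helper `top_attention_pairs` and the toy values are hypothetical.

```python
import heapq


def top_attention_pairs(attention, k=3):
    """Return the k highest-scoring entries as (row, column, score) tuples."""
    entries = (
        (score, row_index, col_index)
        for row_index, row in enumerate(attention)
        for col_index, score in enumerate(row)
    )
    best = heapq.nlargest(k, entries)
    return [(row_index, col_index, score) for score, row_index, col_index in best]


# Toy 2 x 3 map with made-up scores, standing in for a loaded attention map.
toy_map = [
    [0.10, 0.70, 0.20],
    [0.05, 0.30, 0.90],
]
print(top_attention_pairs(toy_map, k=2))  # [(1, 2, 0.9), (0, 1, 0.7)]
```

The returned indices can then be mapped back to atoms or residues and highlighted in the tools listed above.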
@misc{chen2024Interpretable,
title = {Interpretable {{PROTAC}} Degradation Prediction with Structure-Informed Deep Ternary Attention Framework},
author = {Chen, Zhenglu and Gu, Chunbin and Tan, Shuoyan and Wang, Xiaorui and Li, Yuquan and He, Mutian and Lu, Ruiqiang and Sun, Shijia and Hsieh, Chang-Yu and Yao, Xiaojun and Liu, Huanxiang and Heng, Pheng-Ann},
year = {2024},
primaryclass = {New Results},
pages = {2024.11.05.622005},
doi = {10.1101/2024.11.05.622005},
urldate = {2024-11-09},
archiveprefix = {bioRxiv},
chapter = {New Results}
}