PROTAC-STAN

This is the official codebase of the paper: "Interpretable PROTAC degradation prediction with structure-informed deep ternary attention framework"

Overview

This study introduces PROTAC-STAN, a structure-informed deep ternary attention network (STAN) framework for interpretable PROTAC degradation prediction. It's the first study to coherently model the three-body interactions of PROTAC therapeutics with a tailored deep learning architecture. PROTAC-STAN represents PROTAC molecules across atom, molecule, and property hierarchies and incorporates structure information for POIs and E3 ligases using a protein language model infused with structural data. Furthermore, it simulates interactions among three entities via a novel ternary attention network tailored for the PROTAC system, providing unprecedented insights into the degradation mechanism.

Datasets

PROTAC-DB

The original data can be accessed at PROTAC-DB.

PROTAC-fine

We enrich PROTAC-DB 2.0 with additional degradation information to construct a refined PROTAC dataset named PROTAC-fine. The data are stored in the data/PROTAC-fine folder.

Directory instructions

Demo

.
├── config_demo.toml
├── data
│   └── demo
└── demo.ipynb

Training and inference

.
├── config.toml
├── data
├── data_loader.py
├── data.py
├── inference.py
├── main.py
├── model.py
├── saved_models
└── tan.py

Custom data preparation

.
├── data
│   └── custom
├── esm_embed
│   ├── get_embed_s.py
│   ├── model
│   └── README.md
└── prepare_data.ipynb

Demo

We provide a running demo of PROTAC-STAN as a Jupyter notebook, demo.ipynb. It uses a small demo subset of PROTAC-fine and takes about 5 minutes to run the whole pipeline. To run PROTAC-STAN on the full dataset, we advise at least 8 GB of GPU memory and at least 16 GB of system RAM.

System requirements

PROTAC-STAN has been tested on Linux operating systems (Ubuntu 20.04.1).

Python Dependencies:

Python (version >= 3.11.5)
PyTorch (version >= 2.1.0)
RDKit (version >= 2023.9.2)
pyg (version >= 2.5.1)

Installation guide

Installation normally takes about 10 minutes on a standard desktop computer, depending on your network speed.

Create Conda environment

conda create -n protac-stan python=3.11.5
conda activate protac-stan

Install PyTorch

conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install numpy==1.26.4

Install other essential packages

pip install torch_geometric==2.5.1
pip install rdkit==2023.9.2
pip install pandas==2.1.1
pip install toml==0.10.2
pip install wandb
# [Optional] Install torch-scatter to accelerate pyg computing
wget https://data.pyg.org/whl/torch-2.1.0%2Bcu118/torch_scatter-2.1.2%2Bpt21cu118-cp311-cp311-linux_x86_64.whl
pip install torch_scatter-2.1.2+pt21cu118-cp311-cp311-linux_x86_64.whl

Download repository

git clone https://github.com/PROTACs/PROTAC-STAN.git
cd PROTAC-STAN

Tip

See protac-stan.yml for the full requirements.
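After installation, a quick check of the installed package versions can catch a mismatched environment early. The following is a minimal sketch, not part of the repository: it compares installed versions against the minimums listed under "Python Dependencies". The distribution names (torch, rdkit, torch_geometric) are taken from the install commands above, and the helper names (parse_version, meets_minimum, check) are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions from the "Python Dependencies" list in this README.
# Distribution names follow the pip/conda commands above (an assumption).
MINIMUMS = {
    "torch": "2.1.0",
    "rdkit": "2023.9.2",
    "torch_geometric": "2.5.1",
}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); local tags and suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check(minimums, lookup=metadata.version):
    """Return {package: status}, where status is 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check(MINIMUMS).items():
        print(f"{name}: {status}")
```

Run it inside the activated protac-stan environment; any line that does not read "ok" points to a package worth reinstalling.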

Training to reproduce results

We have prepared the PROTAC-fine dataset in the data/PROTAC-fine directory.

To train the PROTAC-STAN model from scratch, run the following script:

python main.py

Evaluation results of PROTAC-STAN and baselines on test set considering data leakage:

Inference on your data

inference.py uses PROTAC-STAN to perform interpretable PROTAC degradation prediction on your own data.

Prepare your custom data following prepare_data.ipynb
Predict your data:

# Usage: python inference.py [-h] [--root ROOT] [--name NAME] [--save_att]
python inference.py --root 'data/custom' --name 'custom'
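The usage line above implies a small command-line interface. As a rough sketch of how such an interface could be declared with argparse (this is not the code in inference.py), the parser below reproduces the three documented flags. The default values and help strings are assumptions that mirror the example command.

```python
import argparse


def build_parser():
    """Build a parser matching the documented flags: --root, --name, --save_att."""
    parser = argparse.ArgumentParser(prog="inference.py")
    # Defaults are guesses based on the example command, not repository values.
    parser.add_argument("--root", default="data/custom",
                        help="folder that holds the prepared dataset")
    parser.add_argument("--name", default="custom",
                        help="name of the dataset inside the root folder")
    # A store_true flag is False unless it appears on the command line.
    parser.add_argument("--save_att", action="store_true",
                        help="also save attention maps for later analysis")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"root={args.root} name={args.name} save_att={args.save_att}")
```

With this layout, --save_att takes no value: adding it to the command switches attention saving on, and omitting it leaves it off.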

You may use the --save_att argument to save attention maps for further analysis. Here are our examples:

Tip

For visualization, you may use Python packages such as matplotlib and RDKit, or visualization software such as Maestro and PyMOL.
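As one possible starting point for such an analysis, the sketch below ranks the strongest entries of a 2D attention map and draws it as a heatmap with matplotlib. This README does not state the on-disk format of the saved maps, so the code assumes the map has already been loaded as a 2D nested list (or array) of weights. The function names and the output file name are hypothetical.

```python
import heapq


def top_attention_pairs(att, k=5):
    """Return the k highest-weight entries of a 2D map as (row, col, weight)."""
    entries = (
        (weight, row, col)
        for row, values in enumerate(att)
        for col, weight in enumerate(values)
    )
    return [(row, col, weight) for weight, row, col in heapq.nlargest(k, entries)]


def plot_attention(att, path="attention.png"):
    """Save a heatmap of a 2D attention map; matplotlib is imported lazily."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(att, cmap="viridis", aspect="auto")
    ax.set_xlabel("column index")
    ax.set_ylabel("row index")
    fig.colorbar(image, ax=ax, label="attention weight")
    fig.savefig(path, dpi=200, bbox_inches="tight")
    plt.close(fig)


if __name__ == "__main__":
    # Toy values for illustration only, not model output.
    toy = [[0.1, 0.7, 0.05], [0.3, 0.2, 0.6]]
    for row, col, weight in top_attention_pairs(toy, k=3):
        print(f"row {row}, col {col}: {weight:.2f}")
```

The ranked pairs give indices that can then be mapped back to atoms or residues in RDKit, PyMOL, or Maestro, depending on what the rows and columns of the saved map represent.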

3D and 2D attention map visualization

Molecule and complex visualization

Citation

@misc{chen2024Interpretable,
  title = {Interpretable {{PROTAC}} Degradation Prediction with Structure-Informed Deep Ternary Attention Framework},
  author = {Chen, Zhenglu and Gu, Chunbin and Tan, Shuoyan and Wang, Xiaorui and Li, Yuquan and He, Mutian and Lu, Ruiqiang and Sun, Shijia and Hsieh, Chang-Yu and Yao, Xiaojun and Liu, Huanxiang and Heng, Pheng-Ann},
  year = {2024},
  primaryclass = {New Results},
  pages = {2024.11.05.622005},
  doi = {10.1101/2024.11.05.622005},
  urldate = {2024-11-09},
  archiveprefix = {bioRxiv},
  chapter = {New Results}
}
