ALDELE:

A deep-learning-based multiple-toolkit (DeTool) approach that takes enzymes and substrates as inputs for biocatalytic tasks.

Introduction

This repository contains the PyTorch implementation of the ALDELE framework.
ALDELE is a deep learning framework with two-phase attention and a pairwise module to explicitly learn non-covalent local interactions between enzymes and substrates for biocatalytic purposes. To make predictions, it works on two-dimensional (2D) molecular graphs and physicochemical properties of compounds, and on target protein sequences with an amino acid evolutionary matrix.
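To make the idea of a pairwise module more concrete, here is a minimal, illustrative sketch of pairwise interaction scoring between substrate atoms and enzyme residues. It is not the ALDELE implementation: the function name `pairwise_interactions`, the dot-product-plus-sigmoid scoring, and the toy feature vectors are all assumptions chosen for illustration, and plain Python lists stand in for PyTorch tensors.

```python
import math

def pairwise_interactions(atom_feats, residue_feats):
    """Return a matrix P where P[i][j] = sigmoid(atom_i . residue_j).

    Hypothetical scoring rule for illustration only; the real model
    learns its own atom and residue representations.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        [sigmoid(sum(a * r for a, r in zip(atom, res))) for res in residue_feats]
        for atom in atom_feats
    ]

# Toy inputs: 2 atoms and 3 residues, each embedded in 2 dimensions.
atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 0.0], [0.0, -2.0]]
scores = pairwise_interactions(atoms, residues)
```

Each entry of `scores` lies between 0 and 1 and can be read as a soft indicator of whether a given atom and residue interact; the matrix has one row per atom and one column per residue.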

Framework

System Requirements

The source code was developed in Python 3.10. The required Python dependencies are given below.

numpy==1.23.1
packaging==21.3
pickleshare==0.7.5
py3Dmol==2.0.3
python-json-logger==2.0.7
rdkit==2023.3.2
scikit-learn==1.1.1
scipy==1.8.1
torch==1.12.0
torch-geometric==2.3.1
uri-template==1.3.0
urllib3==2.0.3
useful-rdkit-utils==0.2.7
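
Once an environment is set up, a small check like the one below can confirm that installed packages match the pins above. This is a sketch rather than part of the repository: the `PINNED` dictionary copies only a subset of the list, and the helper name `check_pins` is hypothetical. It relies on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the requirements list above (subset shown for brevity).
PINNED = {
    "numpy": "1.23.1",
    "scikit-learn": "1.1.1",
    "scipy": "1.8.1",
    "torch": "1.12.0",
    "torch-geometric": "2.3.1",
    "rdkit": "2023.3.2",
}

def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or mismatched packages."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.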

Installation Guide

Clone this GitHub repo and set up a new conda environment, or create a new project in PyCharm with Python 3.10.

# create a new conda environment
$ conda create --name ALDELE python=3.10
$ conda activate ALDELE

# install required python dependencies
$ pip install pandas
$ pip install numpy==1.23.1
$ pip install torch
$ pip install scikit-learn

# clone the source code of ALDELE and go into the model folder

Datasets

The datasets folder contains all original experimental data used in ALDELE.

Data construction

The code/Dataset_construction folder contains guidance on the data preprocessing procedure. Please check Data_construction_protocol.doc for details of each step.

Run ALDELE on Our Data to Reproduce Results

To train ALDELE, we provide the dataset and a bash submission script with hyperparameters, submit.sh, in model/CPI_model. The setting parameter in submit.sh can be set from 1 to 6 to select one of the 6 toolkit combinations described in the paper.

Component combinations:

model1: toolkit2 + toolkit3

model2: toolkit2 + toolkit4

model3: toolkit2 + toolkit3 + toolkit4

model4: toolkit1 + toolkit2 + toolkit3

model5: toolkit1 + toolkit2 + toolkit3 + toolkit4

model6: full version, 2 sequence-based features + 2 ligand-based features + structure-based features
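
For reference, the combinations above can be written as a lookup table. The sketch below assumes that setting value n in submit.sh corresponds to modeln; the names `COMBINATIONS` and `describe` are hypothetical and do not come from the repository.

```python
# Setting value (1 to 6) -> components, transcribed from the list above.
COMBINATIONS = {
    1: ("toolkit2", "toolkit3"),
    2: ("toolkit2", "toolkit4"),
    3: ("toolkit2", "toolkit3", "toolkit4"),
    4: ("toolkit1", "toolkit2", "toolkit3"),
    5: ("toolkit1", "toolkit2", "toolkit3", "toolkit4"),
    # model6 is described by feature type rather than by toolkit number.
    6: (
        "2 sequence-based features",
        "2 ligand-based features",
        "structure-based features",
    ),
}

def describe(setting: int) -> str:
    """Return a one-line description of the chosen combination."""
    if setting not in COMBINATIONS:
        raise ValueError(f"setting must be 1 to 6, got {setting}")
    return f"model{setting}: " + " + ".join(COMBINATIONS[setting])
```

For example, `describe(3)` returns the string for model3, and any value outside 1 to 6 raises a `ValueError`.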

The pathdir needs to be modified to match the path where you saved the datasets. Then train the model with the following command:

$ bash submit.sh

The output results and the trained model can be found in the /output_modelx folders, where x is the selected setting (1 to 6).
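
If you prefer to launch training from Python, for example from a notebook or a larger pipeline, a thin wrapper like the following may help. It is only a sketch: the function `run_training` and its `dry_run` flag are hypothetical, and the directory you pass in depends on where you cloned the repository.

```python
import subprocess
from pathlib import Path

def run_training(model_dir, script="submit.sh", dry_run=False):
    """Run the submission script from inside model_dir and return the command used."""
    model_dir = Path(model_dir)
    script_path = model_dir / script
    if not script_path.is_file():
        raise FileNotFoundError(
            f"{script_path} not found; check the path to the cloned repo"
        )
    cmd = ["bash", script]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, cwd=model_dir, check=True)
    return cmd
```

Calling `run_training("model/CPI_model", dry_run=True)` from the appropriate parent directory only verifies that the script exists, without starting a training run.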

Acknowledgements

This implementation is inspired by and partially based on earlier works [1] and [2].

References

[1] Tsubaki, Masashi, Kentaro Tomii, and Jun Sese. "Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences." Bioinformatics 35.2 (2019): 309-318.
[2] Li, Feiran, et al. "Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction." Nature Catalysis 5.8 (2022): 662-672.
