Overview

Array Type Meta-Predictor Based Classification

The Array type meta-predictor based classier was develped for prediction of Harmful agents.

Figure 1: Design of meta-predictor for multi-tasking and data workflow.

Requirements:

The following necessary packages should be installed in to process, generate fingerprint, train and test your model.

KNIME analytics platform (> 3.5)
python >3.0
Keras
Tensorflow
Scikit Learn
Jupyter Notebook
R / R-Studio

Procedures:

1. Data Structure:

Collection of chemical data from ChEMBL (Version 24) data for five different targets (i.e. nicotinic acetylcholine receptors (nAChR), muscarinic acetylcholine receptors (mAChR), vesicular acetylcholine receptors (VAChT), acetylcholinesterase (AChE) and butyrylcholinesterase (BUChE) using ChEMBL provided SQL file.
Negative data: We used the Decoy Finder to collect the negative data for each targets, in order to keep the balance between the active and inactive classes.

External data: In order to evaluate the performance of classifer, we used the external dataset from Chemical Warfare Agents (CWA) and New Psychoactive Substance (NPS).

Table 1: The Dataset Composition

Targets	Collected Data	Train (Active)	Train (Inactive)	Test (Active)	Test (Inactive)	Total
nAChR	3175	637	637	272	272	1818
mAChR	24223	2431	2431	1041	1041	6944
BuChE	2414	484	484	207	207	1382
AChE	8212	1086	1086	464	464	3098
VAChT	231	106	106	45	45	302
CWA	95					95
NPS	3126					3126

2. Feature Generation:

We used the CDK Descriptor Kit for calculation of different Fingerprints (i.e. ECFP0, ECFP2, ECFP4, ECFP6, FCFP0, FCFP2, FCFP4, FCFP6) with fixed bitvector length (length:1024).

All the input data along with calculated features are available under the data folder.

3. Model training:

To build classification models, we used the four machine learning methods (i.e. Random Forest, Decision Tree, Support Vector Machine, k-Nearest Neighbor)
- We have used R for machine learning classification and a R code for one target (i.e. nAChR) has been provided under the script folder. The same code was used for learning for other targets. We build 10 models (M1-M10) for each ML classifier and for each targets. This result in 4 ML method * 5 targets * 10 models (M1-M10) = 200 models.

Figure 2: The statistical performance based on AUC in external and internal validation for respective targets.

4. CNN training:

To build the array-type meta-predictor based classifier, we used the CNN architecture with multiple layer of network for learning, validation and testing purposes.
- After training, testing and validating the ML classifer for different targets, the external set as third set was used to evaluate the ML performance. The predicted values for each targets, for each ML method and for different models (M1-M10) on external sets were used as input feature matrix with different shapes (i.e. CNN, CNN-3D, CNN-3D Reshaped) for array-type meta-predictor based CNN classification.
- The array-type meta-predictor CNN code is available under script folder.

Figure 3: The prediction performace of array-type meta predictor for three different shaped array classifiers according to data sampling (Model 01 to 04).

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
data		data
script		script
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Array Type Meta-Predictor Based Classification

Figure 1: Design of meta-predictor for multi-tasking and data workflow.

Requirements:

Procedures:

1. Data Structure:

Table 1: The Dataset Composition

2. Feature Generation:

3. Model training:

Figure 2: The statistical performance based on AUC in external and internal validation for respective targets.

4. CNN training:

Figure 3: The prediction performace of array-type meta predictor for three different shaped array classifiers according to data sampling (Model 01 to 04).

For any queries regarding the above work, kindly contact Prof. Mi-hyun Kim ([email protected]).

About

Releases

Packages

Contributors 2

Languages

college-of-pharmacy-gachon-university/Array_Classifier

Folders and files

Latest commit

History

Repository files navigation

Overview

Array Type Meta-Predictor Based Classification

Figure 1: Design of meta-predictor for multi-tasking and data workflow.

Requirements:

Procedures:

1. Data Structure:

Table 1: The Dataset Composition

2. Feature Generation:

3. Model training:

Figure 2: The statistical performance based on AUC in external and internal validation for respective targets.

4. CNN training:

Figure 3: The prediction performace of array-type meta predictor for three different shaped array classifiers according to data sampling (Model 01 to 04).

For any queries regarding the above work, kindly contact Prof. Mi-hyun Kim ([email protected]).

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages