cschaerfe edited this page Jul 8, 2015 · 3 revisions

Property distribution analysis

PropertyPlotter can be used to analyze the distribution of molecular properties, e.g. binding free energies or scores assigned by docking or rescoring. These properties should be stored as property tags in the input file.

For example, a command like the following generates a plot of the binding free energy distribution of a given data set:

BALL/build/bin/TOOLS/PropertyPlotter -i input.sdf -p1 binding_free_energy -o distribution.png
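PropertyPlotter's internals are not described here. To illustrate what such a distribution plot summarizes, the following minimal Python sketch reads one property tag from SD-format text and counts its values in equal-width bins. The function names `read_property` and `histogram`, the binning convention, and the example records are hypothetical. The parser assumes single-line values and ignores the molecule blocks entirely.

```python
import math
import re
from collections import Counter

# Matches SD data-field headers such as "> <binding_free_energy>" or ">  <score>  (1)".
HEADER = re.compile(r"^>.*?<([^>]+)>")


def read_property(sdf_text, tag):
    """Return the value of `tag` (as float) for every record that defines it."""
    values = []
    for record in sdf_text.split("$$$$"):  # "$$$$" terminates each SD record
        lines = record.splitlines()
        for i, line in enumerate(lines):
            match = HEADER.match(line)
            if match and match.group(1) == tag and i + 1 < len(lines):
                # Assumes a single-line value directly below the header.
                values.append(float(lines[i + 1].strip()))
                break
    return values


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(bin_width * math.floor(v / bin_width) for v in values)
    return dict(sorted(counts.items()))


# Hypothetical, heavily truncated SD records (atom and bond blocks omitted).
example = """mol1
M  END
> <binding_free_energy>
-7.2

$$$$
mol2
M  END
> <binding_free_energy>
-9.8

$$$$
mol3
M  END
> <binding_free_energy>
-7.9

$$$$
"""

print(histogram(read_property(example, "binding_free_energy"), 1.0))
# prints {-10.0: 1, -8.0: 2}
```

A real SD reader (for instance the one in BALL itself) also handles multi-line values and malformed records; this sketch only shows where the plotted numbers come from.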

Similarity analysis

SimilarityAnalyzer compares two given molecule files with each other. To do so, it creates binary, path-based fingerprints for each compound in these files and computes the Tanimoto coefficient for each pair of compounds. The distribution of these Tanimoto coefficients is then plotted, which allows one to assess the chemical similarity of the two specified files.

Especially for QSAR analyses or training-based rescoring approaches, it is advisable to check the similarity between the training and prediction data sets in this way before attempting any predictions.
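The exact fingerprint definition used by SimilarityAnalyzer is not given on this page. The following Python sketch shows a generic hashed, path-based binary fingerprint and the Tanimoto coefficient under stated assumptions. Every simple path of up to `max_len` bonds becomes a string of element symbols, which is hashed into one of `n_bits` positions. The molecule representation (element list plus bond pairs), the defaults, and the CRC32 hashing are illustrative choices, not BALL's. Practical fingerprints usually also encode bond orders and aromaticity.

```python
import zlib


def path_fingerprint(atoms, bonds, max_len=3, n_bits=1024):
    """Hashed fingerprint: the set of on-bits for all simple paths of up to
    `max_len` bonds. `atoms` is a list of element symbols, `bonds` a list of
    (i, j) atom-index pairs."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    bits = set()

    def walk(path):
        labels = [atoms[k] for k in path]
        # A path and its reverse describe the same fragment, so use a canonical order.
        key = "-".join(min(labels, labels[::-1]))
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        bits.add(zlib.crc32(key.encode()) % n_bits)
        if len(path) - 1 < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:  # simple paths only: never revisit an atom
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints: 0.0 by choice


ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propane = path_fingerprint(["C", "C", "C"], [(0, 1), (1, 2)])
print(tanimoto(ethanol, ethanol))  # prints 1.0
print(tanimoto(ethanol, propane))  # between 0 and 1: the fragments C and C-C are shared
```

Storing a fingerprint as a set of on-bit indices keeps the Tanimoto computation a direct translation of its definition. A bit-vector (integer or byte array) representation is faster for large files.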

A command such as the following creates these plots on the command line:

BALL/build/bin/TOOLS/SimilarityAnalyzer -i1 input1.sdf -i2 input2.sdf -o similarity.png
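The distribution being plotted can be sketched as follows. The sketch assumes that the Tanimoto coefficient is computed for every pair consisting of one compound from each file, and that the values are counted in ten equal-width bins. Fingerprints are represented as Python sets of on-bit indices. `similarity_distribution` and the example fingerprints are hypothetical and not part of BALL.

```python
from itertools import product


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_distribution(fps1, fps2, n_bins=10):
    """Histogram and mean of the Tanimoto coefficients of all cross-file pairs."""
    values = [tanimoto(a, b) for a, b in product(fps1, fps2)]
    counts = [0] * n_bins
    for t in values:
        # Bin k covers [k/n_bins, (k+1)/n_bins); t == 1.0 goes into the last bin.
        counts[min(int(t * n_bins), n_bins - 1)] += 1
    return counts, sum(values) / len(values)


# Hypothetical fingerprints: one compound in file 1, three in file 2.
counts, mean = similarity_distribution([{1, 2, 3, 4}], [{1, 2, 3, 4}, {1, 2}, {5, 6}])
print(counts)  # prints [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(mean)    # prints 0.5
```

The mean corresponds to the "average similarity" discussed below. Two separated peaks in `counts` correspond to the distinct clusters described there.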

While the first example plot shown below indicates a moderately high and (nearly) normally distributed similarity between the two files, the average similarity between molecules in the second example is considerably lower. Furthermore, the second plot displays two distinct clusters. This might be explained by one of the files containing a molecular scaffold that does not appear in the other file (along with a second scaffold that does). Hence, these plots also give a quick impression of the homogeneity of chemical files.

Analysis of docking or rescoring results

ScoreAnalyzer can be used to assess how well the chosen docking approach discriminates between binders and non-binders. As input, it requires a molecule file containing compounds docked to the molecular target of interest. The scores assigned by the docking approach should be available as property tags in this file. In addition, a property tag must state whether each compound is actually a binder or a non-binder (as determined experimentally).

ScoreAnalyzer can then compute receiver operating characteristic (ROC) curves or enrichment curves. If experimentally determined binding free energies are available (instead of just binder/non-binder labels), it can also create a scatter plot of the docking scores against the measured binding free energies.
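The data behind such a scatter plot are simply (score, binding free energy) pairs taken from compounds that carry both property tags. The following Python sketch collects these pairs and, as a numerical companion to the plot, computes the Pearson correlation coefficient. The records are represented as dicts of tag name to string value. This representation, the tag names `score` and `dG`, and the correlation summary are assumptions for illustration. They do not describe ScoreAnalyzer's output.

```python
import math


def paired_values(records, score_tag, energy_tag):
    """(score, energy) pairs for records defining both tags; other records are skipped."""
    return [(float(r[score_tag]), float(r[energy_tag]))
            for r in records if score_tag in r and energy_tag in r]


def pearson(pairs):
    """Pearson correlation coefficient of (x, y) pairs (needs >= 2 non-constant pairs)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical property-tag values, as strings read from an SD file.
records = [
    {"score": "-10", "dG": "-9"},
    {"score": "-8", "dG": "-7"},
    {"score": "-6", "dG": "-5"},
    {"score": "-5"},  # no measured binding free energy: skipped
]
pairs = paired_values(records, "score", "dG")
print(len(pairs), pearson(pairs))  # prints 3 1.0
```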

Receiver Operating Characteristic (ROC) curves

The following example ROC plot, for the docking results of HSP90, can be generated with:

BALL/build/bin/TOOLS/ScoreAnalyzer -i dock_output.sdf -b -s score -mode roc -e Class -o roc.png

Here, the binary property tag 'Class' in the input file is assumed to mark each molecule as a non-binder (value '0') or a binder (value '1').
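How a ROC curve follows from the docking scores and the 'Class' labels can be sketched in Python as shown below. Compounds are ranked by score. After each distinct score, the fraction of binders recovered (true positive rate) is plotted against the fraction of non-binders recovered (false positive rate). The sketch assumes that lower docking scores are better, which holds for many energy-like scores but not all. Set `lower_is_better=False` otherwise. `roc_curve`, `auc`, and the example values are hypothetical and do not reproduce ScoreAnalyzer's implementation.

```python
def roc_curve(scores, labels, lower_is_better=True):
    """ROC points (false positive rate, true positive rate), best threshold first.

    labels: 1 = binder, 0 = non-binder. Needs at least one of each.
    Compounds with tied scores enter the curve together.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda p: p[0],
                    reverse=not lower_is_better)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last compound sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical docking scores (lower = better) and 'Class' values.
scores = [-10.0, -9.0, -8.0, -7.0]
labels = [1, 0, 1, 0]
points = roc_curve(scores, labels)
print(points)       # prints [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # prints 0.75
```

Grouping tied scores matters. If tied compounds were added one at a time, the curve and its area would depend on the arbitrary order of the input file. An AUC of 0.5 corresponds to random ranking, and 1.0 to all binders ranked before all non-binders.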

Enrichment plot

An enrichment plot for the same data set can then be generated with:

BALL/build/bin/TOOLS/ScoreAnalyzer -i dock_output.sdf -b -s score -mode enrichment -e Class -o enrichment.png
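A common definition of an enrichment curve plots the fraction of all binders found against the fraction of the score-ranked compound list screened so far. A related single number is the enrichment factor: the binder rate in the top fraction of the ranking divided by the overall binder rate. The Python sketch below follows these textbook definitions. It assumes that lower scores are better and breaks ties by input order. ScoreAnalyzer's exact definition may differ, and the helper names and values are hypothetical.

```python
def rank(scores, labels, lower_is_better=True):
    """Labels ordered from best- to worst-scored compound (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=not lower_is_better)
    return [label for _, label in ranked]


def enrichment_curve(scores, labels, lower_is_better=True):
    """Points (fraction of ranked list screened, fraction of all binders found)."""
    ranked = rank(scores, labels, lower_is_better)
    total_binders = sum(ranked)
    points = [(0.0, 0.0)]
    found = 0
    for i, label in enumerate(ranked, start=1):
        found += label
        points.append((i / len(ranked), found / total_binders))
    return points


def enrichment_factor(scores, labels, fraction, lower_is_better=True):
    """Binder rate in the top `fraction` of the ranking over the overall binder rate."""
    ranked = rank(scores, labels, lower_is_better)
    top = max(1, round(fraction * len(ranked)))  # size of the screened top list
    return (sum(ranked[:top]) / top) / (sum(ranked) / len(ranked))


scores = [-10.0, -9.0, -8.0, -7.0]  # hypothetical docking scores, lower = better
labels = [1, 0, 1, 0]               # hypothetical 'Class' values
print(enrichment_curve(scores, labels))
# prints [(0.0, 0.0), (0.25, 0.5), (0.5, 0.5), (0.75, 1.0), (1.0, 1.0)]
print(enrichment_factor(scores, labels, 0.25))  # prints 2.0
```

A random ranking gives a curve close to the diagonal and an enrichment factor near 1.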
