CADDSuiteAnalysis
PropertyPlotter can be used to analyze the distribution of molecular properties, e.g. binding free energies or scores assigned by docking or rescoring. These properties should be stored in the input file in property tags.
For example, a command like the following generates a plot of the binding free energy distribution of a given data set:
BALL/build/bin/TOOLS/PropertyPlotter -i input.sdf -p1 binding_free_energy -o distribution.png
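To illustrate what "stored in property tags" means in practice, the following is a minimal Python sketch that collects one property from the data items of an SD file and bins the values into a histogram. It is not PropertyPlotter's implementation. The function names `read_sdf_property` and `histogram` are hypothetical, and the sketch assumes that each value is a single number on the line directly after its `> <tag>` header.

```python
def read_sdf_property(sdf_text, tag):
    """Collect the numeric value of one property tag from every SDF record."""
    values = []
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        # A data header starts with '>' and names the tag in angle brackets.
        if line.startswith(">") and "<" + tag + ">" in line:
            # Assumption: the value is one number on the following line.
            if i + 1 < len(lines) and lines[i + 1].strip():
                values.append(float(lines[i + 1].strip()))
    return values


def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes into the last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    example = "\n".join([
        "> <binding_free_energy>", "-7.5", "", "$$$$",
        "> <binding_free_energy>", "-9.1", "", "$$$$",
    ])
    energies = read_sdf_property(example, "binding_free_energy")
    print(energies, histogram(energies, 2))
```

The bin counts returned by `histogram` are the data that a distribution plot would display.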
SimilarityAnalyzer compares two given molecule files with each other. To do so, it creates binary, pathway-based fingerprints for each compound in these files and computes the Tanimoto coefficient for each pair of compounds. The distribution of those Tanimoto coefficients is then plotted, which makes it possible to assess the chemical similarity of the two specified files.
Especially for QSAR analyses or training-based rescoring approaches, it is advised to investigate the similarity between training and prediction data sets in this way before attempting any predictions.
BALL/build/bin/TOOLS/SimilarityAnalyzer -i1 input1.sdf -i2 input2.sdf -o similarity.png
A command like the one above (or something similar) can be used to create these plots on the command line.
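As a reminder of what is being plotted, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both, divided by the number of bits set in at least one of them. The following Python sketch shows that calculation for fingerprints represented as sets of on-bit indices. The function names are hypothetical, the fingerprints in the example are made up, and pairing every compound of the first file with every compound of the second is an assumption about how the pairs are formed.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pairwise_tanimoto(file1_fps, file2_fps):
    """Coefficients for every pair (compound from file 1, compound from file 2)."""
    return [tanimoto(a, b) for a in file1_fps for b in file2_fps]


if __name__ == "__main__":
    file1 = [{1, 2, 3}, {4, 5}]
    file2 = [{2, 3, 4}, {4, 5}]
    print(pairwise_tanimoto(file1, file2))
```

A histogram of the returned list corresponds to the kind of distribution discussed for the two example plots.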
While the first example of a generated plot shown below indicates moderately high and (nearly) normally distributed similarity between two files, the average similarity between molecules in the second example is significantly lower. Furthermore, the second plot displays two distinct clusters, which might be explained by one of the files containing a molecular scaffold that does not appear in the other file (along with a second scaffold that does appear in the other file). Hence, these plots also make it possible to get a quick impression of the homogeneity of chemical files.
ScoreAnalyzer can be used to assess how well the chosen docking approach discriminates between binders and non-binders. As input, it needs a molecule file containing compounds docked to the molecular target of interest. The scores assigned by the docking approach to each molecule should be available in property tags within this file. Furthermore, information about whether each compound is in reality a binder or a non-binder (as determined by experimental procedures) also has to be stored in a property tag.
ScoreAnalyzer can then compute receiver operating characteristic (ROC) curves or enrichment curves. If the experimentally determined binding free energy is available (instead of just binder/non-binder information), a scatter plot of the score assigned by docking against the actual binding free energy can also be created.
The following example of a ROC plot could be generated for the docking results of HSP90 by
BALL/build/bin/TOOLS/ScoreAnalyzer -i dock_output.sdf -b -s score -mode roc -e Class -o roc.png
In the input file, a binary property tag 'Class' with a value of '0' is thus assumed to indicate that the respective molecule is a non-binder, and '1' is assumed to indicate a binder.
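For readers unfamiliar with ROC curves, the following Python sketch shows how such a curve can be derived from docking scores and 0/1 class labels of the kind described above. It is a generic illustration, not ScoreAnalyzer's code. The function name `roc_curve` and the parameter `lower_is_better` are hypothetical, the assumption that lower (more negative) scores indicate better binding has to be checked against the scoring function in use, and tied scores are not treated specially.

```python
def roc_curve(scores, labels, lower_is_better=True):
    """Return ROC points (false positive rate, true positive rate) and the AUC.

    labels: 1 for binders, 0 for non-binders; both classes must be present.
    """
    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0],
                    reverse=not lower_is_better)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1  # a binder was ranked at this position
        else:
            fp += 1  # a non-binder was ranked at this position
        points.append((fp / n_neg, tp / n_pos))
    # Area under the curve by the trapezoidal rule.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc


if __name__ == "__main__":
    points, auc = roc_curve([-10.0, -9.0, -8.0, -7.0], [1, 0, 1, 0])
    print(points, auc)
```

An AUC close to 1 means binders are ranked ahead of non-binders, while a value near 0.5 corresponds to a random ranking.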
An enrichment plot for the same data set can then easily be generated with
BALL/build/bin/TOOLS/ScoreAnalyzer -i dock_output.sdf -b -s score -mode enrichment -e Class -o enrichment.png
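One common way to define an enrichment curve is to plot the fraction of all binders recovered against the fraction of the score-ranked data set that has been screened. The Python sketch below follows that definition. Whether ScoreAnalyzer uses exactly this definition is an assumption, the function names are hypothetical, and lower scores are again assumed to be better.

```python
def enrichment_curve(scores, labels, lower_is_better=True):
    """Return points (fraction screened, fraction of binders recovered)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0],
                    reverse=not lower_is_better)
    n_total = len(ranked)
    n_binders = sum(label for _, label in ranked)
    found = 0
    curve = [(0.0, 0.0)]
    for rank, (_, label) in enumerate(ranked, start=1):
        found += label  # label is 1 for a binder, 0 for a non-binder
        curve.append((rank / n_total, found / n_binders))
    return curve


def enrichment_factor(curve, fraction):
    """Recovered fraction divided by screened fraction at the first curve
    point that covers at least `fraction` of the data set."""
    for screened, recovered in curve:
        if screened > 0 and screened >= fraction:
            return recovered / screened
    raise ValueError("fraction must be in the interval (0, 1]")


if __name__ == "__main__":
    curve = enrichment_curve([-10.0, -9.0, -8.0, -7.0], [1, 0, 1, 0])
    print(curve, enrichment_factor(curve, 0.25))
```

An enrichment factor above 1 means that the top-ranked part of the data set contains more binders than a random selection of the same size would.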