Please refer to the code manual for complete information.
Contains experimental data for GCN and DMPNN, organized by category.
- /data_record/GCN/GCN_no_data_selection
  Results of hyperparameter optimization using Optuna without data selection.
- /data_record/GCN/GCN_trying
  Results of hyperparameter optimization using different samplers in Optuna.
- /data_record/GCN/Remove_MW_TPSA_outlier
  Results after applying data selection based on MW and TPSA to remove outliers.
- /data_record/DMPNN/No_data_selection
  Results of hyperparameter optimization using Optuna without data selection.
- /data_record/DMPNN/N3_N7_ring_substructures
  Data selection based on compound substructures
  (N3 = three-membered ring, N7 = seven-membered ring).
  See the PPT inside the folder for details.
- /data_record/DMPNN/Even class training
  Training datasets generated by clustering and oversampling minor classes
  to achieve an even class distribution.
  See the PPT inside the folder for details.
- /data_record/DMPNN/remove_wwl_outlier
  Data selection based on the WWL distance matrix, using the IQR method
  to remove outliers. See the PPT inside the folder for details.
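
As a rough illustration of IQR-based outlier removal on a distance matrix, the sketch below flags samples whose mean distance to all other samples falls outside the usual 1.5 x IQR fences. This is only an assumed reading of the procedure: the function name `iqr_outlier_indices`, the use of the per-sample mean distance as the score, and the factor `k=1.5` are illustrative choices, not taken from the project scripts. The PPT in the folder remains the reference for what was actually done.

```python
from statistics import quantiles


def iqr_outlier_indices(distance_matrix, k=1.5):
    """Return indices of samples whose mean distance to the others is an IQR outlier.

    Assumption: each sample is scored by its mean distance to every other
    sample; the real pipeline may score samples differently.
    """
    n = len(distance_matrix)
    # Mean distance from each sample to all other samples (diagonal excluded).
    mean_dist = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(distance_matrix)
    ]
    # Quartiles with linear interpolation over the observed range.
    q1, _, q3 = quantiles(mean_dist, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, d in enumerate(mean_dist) if d < lower or d > upper]


# Toy symmetric distance matrix: the last sample is far from all the others.
toy = [
    [0.0, 1.0, 1.2, 0.9, 9.0],
    [1.0, 0.0, 1.1, 1.0, 9.5],
    [1.2, 1.1, 0.0, 1.3, 9.2],
    [0.9, 1.0, 1.3, 0.0, 9.1],
    [9.0, 9.5, 9.2, 9.1, 0.0],
]
outliers = iqr_outlier_indices(toy)
kept = [i for i in range(len(toy)) if i not in outliers]
```

The indices in `kept` could then be used to subset the training table before fitting the model.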
Fingerprint CSV files generated with padelpy for the datasets used in this study.
Datasets categorized according to different data selection methods.
These files are used directly when running scripts in /code.
- eli_duplcate_hf.csv
  Main file used for model training (baseline, no data selection).
- regression_equal_no_ion_5_smaller.csv
  Original dataset passed down from a senior colleague.
Compound structures visualized using RDKit, ordered according to
the SMILES column in eli_duplcate_hf.csv.
Documentation on how the data in regression_equal_no_ion_5_smaller.csv was collected.