Hello, I tried to bump the version of RDKit in this project and ran into reproducibility issues with the classification results. This is due to some well-known changes in how RDKit generates Morgan fingerprints for molecular graphs with added hydrogens (see [rdkit-discuss](https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg11283.html)). Hydrogens are added in NPClassifier's [`calculate_fingerprint` function](https://github.com/mwang87/NP-Classifier/blob/bd7a66608ff61e393a59088aac7be6dbaa9c319f/Classifier/fingerprint_handler.py#L19).

The classification models were trained under these circumstances, so **the last version of RDKit you can use is `rdkit-pypi==2021.9.4`**. Any later version may give irreproducible classifications!

My test code:

```Python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

smiles = "C[C@]1(C=C[C@H]2C[C@H](CC[C@@H]2[C@H]1C(=O)CCO)CO)O"

# Fingerprint without explicit hydrogens
mol1 = Chem.MolFromSmiles(smiles)
mol_fp1 = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol1, radius=1, bitInfo={}, nBits=2048)
# always passes
assert list(mol_fp1.GetOnBits()) == [29, 80, 142, 222, 473, 494, 622, 650, 787, 807, 848, 926, 1019, 1057, 1060, 1083, 1154, 1274, 1292, 1325, 1516, 1564, 1764, 1873, 1917]

# Fingerprint with explicit hydrogens, as in NPClassifier's calculate_fingerprint
mol2 = Chem.MolFromSmiles(smiles)
mol2 = Chem.AddHs(mol2)
mol_fp2 = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol2, radius=1, bitInfo={}, nBits=2048)
# passes with rdkit-pypi==2021.9.4
# fails with rdkit-pypi==2021.9.5.1 and beyond
assert list(mol_fp2.GetOnBits()) == [2, 88, 107, 114, 449, 650, 664, 695, 788, 807, 836, 866, 906, 955, 1060, 1233, 1380, 1455, 1477, 1652, 1673, 1804, 1871, 1886, 1917, 2003]
```

NB: I use _pip_-based dependency management, which is why I refer to the RDKit artifacts on PyPI. If you use Conda to install RDKit, as in the [Docker image](https://github.com/mwang87/NP-Classifier/blob/bd7a66608ff61e393a59088aac7be6dbaa9c319f/Dockerfile#L14), you'll probably run into incompatibilities with the Boost library at runtime.
A working RDKit version from conda-forge with reproducible classifications is `rdkit=2021.09.5`.
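For pip-based setups, a downstream project could fail loudly instead of silently producing different classifications. The following is a minimal sketch, not part of NPClassifier. It assumes RDKit was installed from the `rdkit-pypi` distribution, and the helper names `matches_known_good` and `warn_if_unpinned` are hypothetical. It only checks for an exact match with `2021.9.4`, the one PyPI release reported above as reproducing the reference fingerprints. The test code above remains the authoritative behavioral check.

```Python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Last rdkit-pypi release reported to reproduce the fingerprints the models were trained on.
KNOWN_GOOD_PYPI = "2021.9.4"


def _as_tuple(v: str) -> Tuple[int, ...]:
    # int("09") == 9, so "2021.09.4" and "2021.9.4" compare equal
    return tuple(int(part) for part in v.split("."))


def matches_known_good(installed: str, known_good: str = KNOWN_GOOD_PYPI) -> bool:
    """Return True only if `installed` is exactly the known-good release."""
    try:
        return _as_tuple(installed) == _as_tuple(known_good)
    except ValueError:
        # Non-numeric parts (e.g. "post1") are treated as unverified
        return False


def warn_if_unpinned(dist_name: str = "rdkit-pypi") -> None:
    """Emit a warning if the installed distribution is not the known-good release."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        warnings.warn(f"{dist_name} not found via pip metadata; cannot verify the RDKit pin")
        return
    if not matches_known_good(installed):
        warnings.warn(
            f"{dist_name}=={installed} may give irreproducible classifications; "
            f"expected {KNOWN_GOOD_PYPI}"
        )
```

Calling `warn_if_unpinned()` once at startup is enough. For a Conda install the pip metadata may not exist under this name, so the sketch only warns that it cannot verify the version.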