Add Support for molecular Dative Bonds #54

Open
Daniel-ChenJH opened this issue Dec 23, 2024 · 0 comments

Hi, I know this repo is no longer actively maintained, but I'm still wondering how to support dative bonds in it.

The RDKit version used by the author, 2019.03.4, does not support dative bonds in Chem.MolFromSmiles(smiles). I therefore edited the code and got this repo working with much newer RDKit versions such as 2024.3.6, and the whole pipeline runs fine under basic settings. Here is my environment:

aimsim_core==2.2.2
aiohappyeyeballs==2.4.4
aiohttp==3.11.10
aiosignal==1.3.2
astartes==1.3.0
asttokens==3.0.0
attrs==24.3.0
chemprop==2.1.0
comm==0.2.2
ConfigArgParse==1.7
contourpy==1.3.1
cycler==0.12.1
debugpy==1.8.11
decorator==5.1.1
descriptastorus==2.8.0
dill==0.3.9
exceptiongroup==1.2.2
executing==2.1.0
filelock==3.16.1
fonttools==4.55.3
frozenlist==1.5.0
fsspec==2024.10.0
idna==3.10
importlib_metadata==8.5.0
ipykernel==6.29.5
ipython==8.30.0
jedi==0.19.2
Jinja2==3.1.4
joblib==1.4.2
jupyter_client==8.6.3
jupyter_core==5.7.2
kiwisolver==1.4.7
lightning==2.4.0
lightning-utilities==0.11.9
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mhfp==1.9.6
mordredcommunity==2.0.6
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.17
nest_asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
packaging==24.2
padelpy==0.1.16
pandas==2.2.3
pandas-flavor==0.6.0
parso==0.8.4
pexpect==4.9.0
pickleshare==0.7.5
pillow==11.0.0
pip==24.2
platformdirs==4.3.6
prompt_toolkit==3.0.48
propcache==0.2.1
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
Pygments==2.18.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytorch-lightning==2.4.0
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
rdkit==2024.3.6
rich==13.9.4
scikit-learn==1.6.0
scipy==1.14.1
setuptools==75.1.0
six==1.17.0
stack_data==0.6.3
sympy==1.13.1
tabulate==0.9.0
threadpoolctl==3.5.0
torch==2.5.1
torchmetrics==1.6.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
triton==3.1.0
typing_extensions==4.12.2
tzdata==2024.2
wcwidth==0.2.13
wheel==0.44.0
xarray==2024.11.0
yarl==1.18.3
zipp==3.21.0

That part is quite easy: all you need is Python 3.11.9 and the library versions listed above. You may also need to change line 17 of hgraph/chemutils.py to the following:

    if mol is not None: Chem.Kekulize(mol, clearAromaticFlags=True)

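In other words, the parsing helper ends up behaving roughly like this (a standalone sketch mirroring the changed line, not a verbatim copy of the repo file):

    from rdkit import Chem

    def get_mol(smiles):
        # return None for SMILES the installed RDKit rejects instead of crashing;
        # clearAromaticFlags=True leaves explicit single/double bonds after kekulization
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            Chem.Kekulize(mol, clearAromaticFlags=True)
        return mol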
When dealing with molecules that contain dative bonds, I added Chem.rdchem.BondType.DATIVE to BOND_LIST in hgraph/mol_graph.py. The get_vocab and preprocessing steps work fine, but in train_generator the training process raises an error at line 284 of hgraph/decoder.py, complaining that the dimensions of cand_vecs and icls_vecs do not match. A simple Python slice works around the problem and the rest of training runs well (I don't know how to fix it properly, so I just use the slice to skip it; the modified line is shown after the vocab below).

    BOND_LIST = [Chem.rdchem.BondType.SINGLE, Chem.rdchem.BondType.DOUBLE, Chem.rdchem.BondType.TRIPLE, Chem.rdchem.BondType.AROMATIC, Chem.rdchem.BondType.DATIVE] 

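As a quick standalone sanity check (not repo code): newer RDKit parses the <-/-> dative-bond syntax directly and reports those bonds as BondType.DATIVE, which is why BOND_LIST needs the extra entry:

    from rdkit import Chem

    # one of the platinum fragments from the vocab below, written with a dative arrow
    smi = "C1=CO[Pt]2<-N(=C1)CCC2"
    mol = Chem.MolFromSmiles(smi)  # parses on rdkit 2024.3.6; fails on 2019.03.4 in my tests
    print(Chem.MolToSmiles(mol))
    print({b.GetBondType() for b in mol.GetBonds()})  # expect BondType.DATIVE in the set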
Here is part of my vocab; you can see there are dative bonds (written with <- and ->) in it.


C1=CN[Pt]NC1 C1=CN[Pt:1]NC1
C1=CN[Pt]NC1 C1=C[NH:1][Pt]NC1
C1=COC=N1 C1=CO[CH:1]=N1
C1=CO[Pt]23<-N(=C1)C=CN->2=CCCO3 C1=CO[Pt]23<-N(=C1)C=[CH:1]N->2=CCCO3
C1=CO[Pt]23<-N(=C1)CCN->2=CCCO3 C1=CO[Pt]23<-N(=C1)CCN->2=[CH:1]CCO3
C1=CO[Pt]23<-N(=C1)CCN->2=CCCO3 C1=CO[Pt]23<-N(=C1)CC[N:1]->2=CCCO3
C1=CO[Pt]23<-N(=C1)CCN->2=CCCO3 C1=CO[Pt]23<-N(=C1)C[CH2:1]N->2=CCCO3
C1=CO[Pt]23<-N(=C1)CCN->2=CCCO3 C1=CO[Pt]23<-N(=C1)C[CH2:1][N:1]->2=[CH:1]CCO3
C1=CO[Pt]2<-N(=C1)CC=C2 C1=CO[Pt:1]2<-N(=C1)CC=C2
C1=CO[Pt]2<-N(=C1)CCC2 C1=CO[Pt]2<-N(=C1)CC[CH2:1]2
C1=CO[Pt]2<-N(=C1)CCC2 C1=CO[Pt]2<-N(=C1)C[CH2:1]C2
C1=CO[Pt]2<-N(=C1)CCC2 C1=CO[Pt]2<-N(=C1)C[CH2:1][CH2:1]2
C1=CO[Pt]CN1 C1=CO[Pt:1]CN1
C1=CSC=C1 C1=CS[CH:1]=C1
C1=CSC=C1 C1=C[CH:1]=CS1
C1=CSC=N1 C1=CS[CH:1]=N1
C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=CC=C1)CC=N->2N3 C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=CC=[CH:1]1)CC=N->2N3
C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=CC=C1)CC=N->2N3 C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=C[CH:1]=C1)CC=N->2N3
C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=CC=C1)CC=N->2N3 C1=C[Pt]23<-N(=C1)CC=N->2N[Pt]12<-N(=C[CH:1]=[CH:1]1)CC=N->2N3
The slice workaround mentioned above, at line 284 of hgraph/decoder.py:

    cand_vecs = torch.cat( [cand_vecs, icls_vecs[:cand_vecs.shape[0]], order_vecs], dim=-1 )

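A quick way to see how far the two tensors disagree is to print their shapes right above the torch.cat (debugging only; variable names as in decoder.py):

    # if the row counts differ, the extra DATIVE entry presumably changes how many
    # candidate/icls rows get built; the slice above simply drops the surplus rows
    if cand_vecs.shape[0] != icls_vecs.shape[0]:
        print("cand_vecs:", tuple(cand_vecs.shape), "icls_vecs:", tuple(icls_vecs.shape))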
However, when I try to use the model trained through the process above to generate new molecules, the code no longer works:

(chem) root@autodl-container-06a3468bc2-0b38bd02:~/autodl-tmp/chem/hgraph2graph# python generate.py --vocab pt_vocab.txt --model ckpt/tmp/model.ckpt.12 --nsample 100 > output.txt
/root/miniconda3/envs/chem/lib/python3.11/site-packages/torch/nn/_reduction.py:51: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/root/autodl-tmp/chem/hgraph2graph/generate.py:44: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  model.load_state_dict(torch.load(args.model)[0])
  0%|                                                                                                                                                                                        | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/chem/hgraph2graph/generate.py", line 52, in <module>
    smiles_list = model.sample(args.batch_size, greedy=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/chem/hgraph2graph/hgraph/hgnn.py", line 42, in sample
    return self.decoder.decode((root_vecs, root_vecs, root_vecs), greedy=greedy, max_decode_step=150)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/chem/hgraph2graph/hgraph/decoder.py", line 410, in decode
    new_atoms, new_bonds, attached = graph_batch.add_mol(bid, ismiles, inter_label, nth_child)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/chem/hgraph2graph/hgraph/inc_graph.py", line 144, in add_mol
    self.add_edge(a1, a2, self.get_mess_feature(bond.GetBeginAtom(), bond_type, nth_child if a2 in attached else 0) ) #only child to father node (in intersection) have non-zero nth_child
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/chem/hgraph2graph/hgraph/inc_graph.py", line 41, in add_edge
    self.agraph[j, self.graph.in_degree(j) - 1] = idx
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 10 is out of bounds for dimension 1 with size 10


I cannot figure out where this IndexError comes from or how to fix it, but I know that datasets without dative bonds never hit this error.
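In case it helps whoever looks into this: the error means the per-node neighbor buffer self.agraph (10 columns here) overflowed for some node during incremental decoding. A throwaway log right above the failing line in add_edge of hgraph/inc_graph.py would at least show which node overflows and whether it is the metal center collecting more than 10 incoming edges (purely a debugging sketch; names copied from the traceback):

    # just before "self.agraph[j, ...] = idx" in add_edge
    col = self.graph.in_degree(j) - 1
    if col >= self.agraph.shape[1]:
        print(f"node {j} overflows agraph: in_degree={col + 1}, width={self.agraph.shape[1]}")
        print("filled neighbor slots:", self.agraph[j].tolist())
    self.agraph[j, col] = idx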

Could anyone help me out? Any suggestions would be really appreciated!
