I ran preprocess.sh and got the following output:
[2020-10-09 10:52:15,369 INFO] Extracting features...
[2020-10-09 10:52:15,371 INFO] * number of source features: 0.
[2020-10-09 10:52:15,371 INFO] * number of target features: 0.
[2020-10-09 10:52:15,371 INFO] Building `Fields` object...
[2020-10-09 10:52:15,371 INFO] Building & saving training data...
[2020-10-09 10:52:15,372 INFO] Reading source and target files: data/ChEMBL/src-train data/ChEMBL/tgt-train.
[2020-10-09 10:52:15,810 INFO] Splitting shard 0.
[2020-10-09 10:52:16,380 INFO] Building shard 0.
[2020-10-09 10:53:27,915 INFO] * saving 0th train data shard to data/ChEMBL/.train.0.pt.
[2020-10-09 10:53:59,229 INFO] Building & saving validation data...
[2020-10-09 10:53:59,231 INFO] Reading source and target files: data/ChEMBL/src-val data/ChEMBL/tgt-val.
[2020-10-09 10:53:59,267 INFO] Splitting shard 0.
[2020-10-09 10:53:59,331 INFO] Building shard 0.
[2020-10-09 10:54:08,047 INFO] * saving 0th valid data shard to data/ChEMBL/.valid.0.pt.
[2020-10-09 10:54:11,926 INFO] Building & saving vocabulary...
[2020-10-09 10:54:15,444 INFO] * reloading data/ChEMBL/.train.0.pt.
[2020-10-09 10:54:20,820 INFO] * tgt vocab size: 34.
[2020-10-09 10:54:20,820 INFO] * src vocab size: 50.
[2020-10-09 10:54:20,820 INFO] * merging src and tgt vocab...
But the subsequent training.sh then failed to run and gave me this:
Traceback (most recent call last):
File "train.py", line 118, in <module>
main(opt)
File "train.py", line 51, in main
single_main(opt, 0)
File "/home/UK/ama/Development/SyntaLinker/onmt/train_single.py", line 100, in main
first_dataset = next(lazily_load_dataset("train", opt))
File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 551, in lazily_load_dataset
yield _lazy_dataset_loader(pt, corpus_type)
File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 538, in _lazy_dataset_loader
dataset = torch.load(pt_file)
File "/home/UK/ama/.conda/envs/SyntaLinker/lib/python3.6/site-packages/torch/serialization.py", line 419, in load
f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/ChEMBL/ChEMBL.train.pt'
These are the only files in the data/ChEMBL directory:
total 304132
drwxr-xr-x 2 ama domain users 4096 Oct 9 10:54 .
drwxr-xr-x 3 ama domain users 4096 Oct 8 11:01 ..
-rw-r--r-- 1 ama domain users 6511683 Oct 8 11:01 src-test.txt
-rw-r--r-- 1 ama domain users 52028301 Oct 8 11:01 src-train
-rw-r--r-- 1 ama domain users 6502030 Oct 8 11:01 src-val
-rw-r--r-- 1 ama domain users 8071432 Oct 8 11:01 tgt-test.txt
-rw-r--r-- 1 ama domain users 64500626 Oct 8 11:01 tgt-train
-rw-r--r-- 1 ama domain users 8060092 Oct 8 11:01 tgt-val
-rw-r--r-- 1 ama domain users 146295349 Oct 9 10:54 .train.0.pt
-rw-r--r-- 1 ama domain users 18167444 Oct 9 10:54 .valid.0.pt
-rw-r--r-- 1 ama domain users 1355 Oct 9 10:54 .vocab.pt
I get the same thing. In the train_single_model.py script, there is a call to an inputter function that tries to lazily load a .pt file that doesn't exist: it was never committed to the repository and is not generated by the preprocessing script.
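One thing that may help narrow this down: the preprocessing log shows the shard being saved as data/ChEMBL/.train.0.pt, while training tries to open data/ChEMBL/ChEMBL.train.pt, which suggests the two scripts were given different data prefixes. Below is a minimal sketch for checking which shard files exist under a given prefix. It assumes, based only on the log and traceback above, that shards are named <prefix>.<corpus_type>.<N>.pt. The helper names find_shards and diagnose are hypothetical and not part of the repository.

```python
import glob
import os


def find_shards(data_prefix, corpus_type):
    """Return sorted shard files named <data_prefix>.<corpus_type>.<N>.pt.

    The naming scheme is an assumption inferred from the log output
    (e.g. data/ChEMBL/.train.0.pt), not taken from the library docs.
    """
    # Escape the prefix so characters such as '[' in a path are taken literally.
    pattern = glob.escape(data_prefix) + "." + corpus_type + ".[0-9]*.pt"
    return sorted(glob.glob(pattern))


def diagnose(data_prefix, corpus_type="train"):
    """Look for shards under data_prefix, then under the bare directory.

    The second lookup covers the case seen in this issue, where the files
    were written with an empty name after the directory (".train.0.pt").
    """
    shards = find_shards(data_prefix, corpus_type)
    if shards:
        return shards
    directory = os.path.dirname(data_prefix)
    # os.path.join(directory, "") yields the directory with a trailing separator.
    return find_shards(os.path.join(directory, ""), corpus_type)


if __name__ == "__main__":
    # Prefix taken from the traceback; adjust it to your own setup.
    print(diagnose("data/ChEMBL/ChEMBL", "train"))
```

If diagnose only finds the dot-prefixed files, then making the save prefix in preprocess.sh match the data prefix in training.sh (or renaming the generated files accordingly) would be the next thing to try. I have not verified this against the repository's scripts.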