Removed global config and replaced with argparse + Other fixes...

TimDettmers · Oct 1, 2019 · 5feb358
1 parent c8aa06c
commit 5feb358
Showing 5 changed files with 177 additions and 114 deletions.
diff --git a/README.md b/README.md
@@ -43,7 +43,7 @@ ConvE with 8 times less parameters is still more powerful than DistMult. Relatio
 
 This repo supports Linux and Python installation via Anaconda. 
 
-1. Install [PyTorch](https://github.com/pytorch/pytorch) using [Anaconda](https://www.continuum.io/downloads). If you compiled PyTorch from source, please checkout the [v0.5 branch](https://github.com/TimDettmers/ConvE/tree/pytorch_v0.5): `git checkout pytorch_0.5`
+1. Install [PyTorch](https://github.com/pytorch/pytorch) using [Anaconda](https://www.continuum.io/downloads).
 2. Install the requirements `pip install -r requirements.txt`
 3. Download the default English model used by [spaCy](https://github.com/explosion/spaCy), which is installed in the previous step `python -m spacy download en`
 4. Run the preprocessing script for WN18RR, FB15k-237, YAGO3-10, UMLS, Kinship, and Nations: `sh preprocess.sh`
@@ -53,22 +53,28 @@ This repo supports Linux and Python installation via Anaconda.
 
 Parameters need to be specified by white-space tuples for example:
 ```
-CUDA_VISIBLE_DEVICES=0 python main.py model ConvE dataset FB15k-237 \
-                                      input_drop 0.2 hidden_drop 0.3 feat_drop 0.2 \
-                                      lr 0.003 process True
+CUDA_VISIBLE_DEVICES=0 python main.py --model conve --data FB15k-237 \
+                                      --input-drop 0.2 --hidden-drop 0.3 --feat-drop 0.2 \
+                                      --lr 0.003 --preprocess
 ```
 will run a ConvE model on FB15k-237.
 
-To run a model, you first need to preprocess the data. This can be done by specifying the `process` parameter:
+To run a model, you first need to preprocess the data once. This can be done by specifying the `--preprocess` parameter:
 ```
-CUDA_VISIBLE_DEVICES=0 python main.py model ConvE dataset FB15k-237 process True
+CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME --preprocess
 ```
 After the dataset is preprocessed it will be saved to disk and this parameter can be omitted.
 ```
-CUDA_VISIBLE_DEVICES=0 python main.py model ConvE dataset FB15k-237
+CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME
+```
+The following values can be used for the `--model` parameter:
+```
+conve
+distmult
+complex
 ```
 
-Here a list of parameters for the available datasets:
+The following datasets can be used for the `--data` parameter:
 ```
 FB15k-237
 WN18RR
@@ -78,40 +84,75 @@ kinship
 nations
 ```
 
-The following models are available:
-```
-ConvE
-DistMult
-ComplEx
+And here is a complete list of parameters:
+```
+Link prediction for knowledge graphs
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --batch-size BATCH_SIZE
+                        input batch size for training (default: 128)
+  --test-batch-size TEST_BATCH_SIZE
+                        input batch size for testing/validation (default: 128)
+  --epochs EPOCHS       number of epochs to train (default: 1000)
+  --lr LR               learning rate (default: 0.003)
+  --seed S              random seed (default: 17)
+  --log-interval LOG_INTERVAL
+                        how many batches to wait before logging training
+                        status
+  --data DATA           Dataset to use: {FB15k-237, YAGO3-10, WN18RR, umls,
+                        nations, kinship}, default: FB15k-237
+  --l2 L2               Weight decay value to use in the optimizer. Default:
+                        0.0
+  --model MODEL         Choose from: {conve, distmult, complex}
+  --embedding-dim EMBEDDING_DIM
+                        The embedding dimension (1D). Default: 200
+  --embedding-shape1 EMBEDDING_SHAPE1
+                        The first dimension of the reshaped 2D embedding. The
+                        second dimension is inferred. Default: 20
+  --hidden-drop HIDDEN_DROP
+                        Dropout for the hidden layer. Default: 0.3.
+  --input-drop INPUT_DROP
+                        Dropout for the input embeddings. Default: 0.2.
+  --feat-drop FEAT_DROP
+                        Dropout for the convolutional features. Default: 0.2.
+  --lr-decay LR_DECAY   Decay the learning rate by this factor every epoch.
+                        Default: 0.995
+  --loader-threads LOADER_THREADS
+                        How many loader threads to use for the batch loaders.
+                        Default: 4
+  --preprocess          Preprocess the dataset. Needs to be executed only
+                        once.
+  --resume              Resume a model.
+  --use-bias            Use a bias in the convolutional layer. Default: False
+  --label-smoothing LABEL_SMOOTHING
+                        Label smoothing value to use. Default: 0.1
+  --hidden-size HIDDEN_SIZE
+                        The size of the hidden layer. The required size
+                        changes with the size of the embeddings. Default: 9728
+                        (embedding size 200).
+```
+To reproduce most of the results in the ConvE paper, you can use the default parameters and execute the command below:
+```
+CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME
 ```
+For the reverse model, you can run the provided file with the name of the dataset and a threshold probability:
 
-The following parameters can be used for the models:
 ```
-batch_size
-input_drop = input_dropout
-feat_drop = feature_map_dropout
-hidden_drop = hidden_dropout
-embedding_dim
-L2
-epochs
-lr_decay = learning_rate_decay
-lr = learning_rate
-label_smoothing = label_smoothing_epsilon 
+python inverse_model.py WN18RR 0.9
 ```
-The parameters with the equal sign are equivalent and short-forms of each other. 
 
-To reproduce most of the results in the ConvE paper, you can use command below:
+### Changing the embedding size for ConvE
 
-```
-CUDA_VISIBLE_DEVICES=0 python main.py model ConvE input_drop 0.2 hidden_drop 0.3 \
-                                      feat_drop 0.2 lr 0.003 lr_decay 0.995 \
-                                      dataset DATASET_NAME
-```
-For the reverse model, you can run the provided file with the name of the dataset name and a threshold probability:
+If you want to change the embedding size, you can do so via the `--embedding-dim` parameter. However, for ConvE, since the embedding is reshaped into a 2D embedding, one also needs to pass the first dimension of the reshaped embedding (`--embedding-shape1`), while the second dimension is inferred. When one changes the embedding size, the hidden layer size `--hidden-size` also needs to change, but it is difficult to determine before run time. The easiest way to determine the hidden size is to run the model, let it fail with a shape error, and then set `--hidden-size` according to the dimension in the error message.
 
+Example: Change the embedding size to 100. We want 10x10 2D embeddings. We run `python main.py --embedding-dim 100 --embedding-shape1 10` and hit an error due to the wrong hidden dimension:
+```python
+   ret = torch.addmm(bias, input, weight.t())
+RuntimeError: size mismatch, m1: [128 x 4608], m2: [9728 x 100] at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:273
 ```
-python inverse_model.py WN18RR 0.9
-```
+
+Now we change the hidden dimension to 4608 accordingly: `python main.py --embedding-dim 100 --embedding-shape1 10 --hidden-size 4608`. Now the model runs with an embedding size of 100 and 10x10 2D embeddings.
 
 ### Adding new datasets
 
@@ -124,8 +165,7 @@ You can easily write your own knowledge graph model by extending the barebone mo
 ### Quirks
 
 There are some quirks of this framework.
-1. If you use a different embedding size, the ConvE concatenation size cannot be determined automatically and you have to set it yourself in line [103/104](https://github.com/TimDettmers/ConvE/blob/master/model.py#L103). Also the first dimension of the projection layer will change. You will need to comment out the print function ([line 115](https://github.com/TimDettmers/ConvE/blob/master/model.py#L115)) to get the needed dimension, and adjust the size of the fully connected layer in [line 95](https://github.com/TimDettmers/ConvE/blob/master/model.py#L95).
-2. The model currently ignores data that does not fit into the specified batch size, for example if your batch size is 100 and your test data is 220, then 20 samples will be ignored. This is designed in that way to improve performance on small datasets. To test on the full test-data you can save the model checkpoint, load the model (with the `load=True` variable) and then evaluate with a batch size that fits the test data (for 220 you could use a batch size of 110). Another solution is to just use a fitting batch size from the start, that is, you could train with a batch size of 110.
+1. The model currently ignores data that does not fit into the specified batch size. For example, if your batch size is 100 and your test data has 220 samples, then 20 samples will be ignored. It is designed this way to improve performance on small datasets. To test on the full test data, you can save the model checkpoint, load the model (with the `--resume` flag), and then evaluate with a batch size that fits the test data (for 220 you could use a batch size of 110). Another solution is to use a fitting batch size from the start, that is, you could train with a batch size of 110.
 
 ### Issues
 

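The trial-and-error procedure described in the new README section can probably be replaced by a small calculation. The sketch below is a guess at the arithmetic, not code from the repository: it assumes that ConvE stacks the reshaped entity and relation embeddings into one `(2 * shape1) x shape2` input and applies a single unpadded, stride-1 3x3 convolution with 32 output channels. Those layer settings are assumptions, since `model.py` is not part of this diff, and the helper name `conve_hidden_size` is hypothetical. Under these assumptions the formula reproduces both sizes quoted in the README (9728 and 4608).

```python
def conve_hidden_size(embedding_dim, embedding_shape1, channels=32, kernel=3):
    """Flattened size of the conv feature map fed to the hidden layer.

    Assumed, not confirmed by this diff: the two reshaped embeddings are
    stacked along the first axis and passed through one unpadded, stride-1
    convolution with `channels` filters of size kernel x kernel.
    """
    if embedding_dim % embedding_shape1 != 0:
        raise ValueError("embedding_dim must be divisible by embedding_shape1")
    shape2 = embedding_dim // embedding_shape1   # inferred second dimension
    height = 2 * embedding_shape1 - kernel + 1   # stacked e1 + rel, no padding
    width = shape2 - kernel + 1
    if height <= 0 or width <= 0:
        raise ValueError("embedding is too small for the kernel")
    return channels * height * width


print(conve_hidden_size(200, 20))  # 9728, the README default
print(conve_hidden_size(100, 10))  # 4608, the value from the error message
```

If the result disagrees with the shape reported in the error message, the error message is the authority, because the assumed channel count or kernel size may not match the model.
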
diff --git a/evaluation.py b/evaluation.py
@@ -2,7 +2,6 @@
 import numpy as np
 import datetime
 
-from spodernet.utils.global_config import Config
 from spodernet.utils.logger import Logger
 from torch.autograd import Variable
 from sklearn import metrics
@@ -38,7 +37,7 @@ def ranking_and_hits(model, dev_rank_batcher, vocab, name):
         pred1, pred2 = pred1.data, pred2.data
         e1, e2 = e1.data, e2.data
         e2_multi1, e2_multi2 = e2_multi1.data, e2_multi2.data
-        for i in range(Config.batch_size):
+        for i in range(e1.shape[0]):
             # these filters contain ALL labels
             filter1 = e2_multi1[i].long()
             filter2 = e2_multi2[i].long()
@@ -62,7 +61,7 @@ def ranking_and_hits(model, dev_rank_batcher, vocab, name):
 
         argsort1 = argsort1.cpu().numpy()
         argsort2 = argsort2.cpu().numpy()
-        for i in range(Config.batch_size):
+        for i in range(e1.shape[0]):
             # find the rank of the target entities
             rank1 = np.where(argsort1[i]==e2[i, 0].item())[0][0]
             rank2 = np.where(argsort2[i]==e1[i, 0].item())[0][0]
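
Both hunks above replace `range(Config.batch_size)` with `range(e1.shape[0])`, so the loops follow the number of rows actually present in the batch instead of a configured value. This matters once evaluation can use its own `--test-batch-size`. A plain-Python sketch of the same idea follows; it is not the repository's code, and `target_ranks` and the toy scores are made up for illustration.

```python
def target_ranks(scores, targets):
    """Return the 0-based rank of each row's target under descending score."""
    ranks = []
    # Iterate over the rows that are really there, not a configured batch size.
    for i in range(len(scores)):
        # Entity indices ordered from highest to lowest score for this row.
        order = sorted(range(len(scores[i])), key=lambda j: -scores[i][j])
        ranks.append(order.index(targets[i]))
    return ranks


full = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
short = full[:1]                      # a batch with a single row
print(target_ranks(full, [2, 0]))     # [1, 0]
print(target_ranks(short, [1]))       # [0]
```

A loop hard-coded to two rows would raise an `IndexError` on `short`, which is the failure mode the change avoids.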

diff --git a/logs.tar.gz b/logs.tar.gz
diff --git a/main.py b/main.py
@@ -24,26 +24,12 @@
 from spodernet.hooks import LossHook, ETAHook
 from spodernet.utils.util import Timer
 from spodernet.preprocessing.processors import TargetIdx2MultiTarget
-np.set_printoptions(precision=3)
-
-cudnn.benchmark = True
-
-# parse console parameters and set global variables
-Config.backend = Backends.TORCH
-Config.parse_argv(sys.argv)
+import argparse
 
-Config.cuda = True
-Config.embedding_dim = 200
-#Logger.GLOBAL_LOG_LEVEL = LogLevel.DEBUG
 
+np.set_printoptions(precision=3)
 
-#model_name = 'DistMult_{0}_{1}'.format(Config.input_dropout, Config.dropout)
-model_name = '{2}_{0}_{1}'.format(Config.input_dropout, Config.dropout, Config.model_name)
-epochs = 1000
-load = False
-if Config.dataset is None:
-    Config.dataset = 'FB15k-237'
-model_path = 'saved_models/{0}_{1}.model'.format(Config.dataset, model_name)
+cudnn.benchmark = True
 
 
 ''' Preprocess knowledge graph using spodernet. '''
@@ -67,7 +53,7 @@ def preprocess(dataset_name, delete_data=False):
 
     # process full vocabulary and save it to disk
     d.set_path(full_path)
-    p = Pipeline(Config.dataset, delete_data, keys=input_keys, skip_transformation=True)
+    p = Pipeline(args.data, delete_data, keys=input_keys, skip_transformation=True)
     p.add_sent_processor(ToLower())
     p.add_sent_processor(CustomTokenizer(lambda x: x.split(' ')),keys=['e2_multi1', 'e2_multi2'])
     p.add_token_processor(AddToVocab())
@@ -87,43 +73,42 @@ def preprocess(dataset_name, delete_data=False):
         p.execute(d)
 
 
-def main():
-    if Config.process: preprocess(Config.dataset, delete_data=True)
+def main(args, model_path):
+    if args.preprocess: preprocess(args.data, delete_data=True)
     input_keys = ['e1', 'rel', 'rel_eval', 'e2', 'e2_multi1', 'e2_multi2']
-    p = Pipeline(Config.dataset, keys=input_keys)
+    p = Pipeline(args.data, keys=input_keys)
     p.load_vocabs()
     vocab = p.state['vocab']
 
     num_entities = vocab['e1'].num_token
 
-    train_batcher = StreamBatcher(Config.dataset, 'train', Config.batch_size, randomize=True, keys=input_keys)
-    dev_rank_batcher = StreamBatcher(Config.dataset, 'dev_ranking', Config.batch_size, randomize=False, loader_threads=4, keys=input_keys)
-    test_rank_batcher = StreamBatcher(Config.dataset, 'test_ranking', Config.batch_size, randomize=False, loader_threads=4, keys=input_keys)
+    train_batcher = StreamBatcher(args.data, 'train', args.batch_size, randomize=True, keys=input_keys, loader_threads=args.loader_threads)
+    dev_rank_batcher = StreamBatcher(args.data, 'dev_ranking', args.test_batch_size, randomize=False, loader_threads=args.loader_threads, keys=input_keys)
+    test_rank_batcher = StreamBatcher(args.data, 'test_ranking', args.test_batch_size, randomize=False, loader_threads=args.loader_threads, keys=input_keys)
 
 
-    if Config.model_name is None:
-        model = ConvE(vocab['e1'].num_token, vocab['rel'].num_token)
-    elif Config.model_name == 'ConvE':
-        model = ConvE(vocab['e1'].num_token, vocab['rel'].num_token)
-    elif Config.model_name == 'DistMult':
-        model = DistMult(vocab['e1'].num_token, vocab['rel'].num_token)
-    elif Config.model_name == 'ComplEx':
-        model = Complex(vocab['e1'].num_token, vocab['rel'].num_token)
+    if args.model is None:
+        model = ConvE(args, vocab['e1'].num_token, vocab['rel'].num_token)
+    elif args.model == 'conve':
+        model = ConvE(args, vocab['e1'].num_token, vocab['rel'].num_token)
+    elif args.model == 'distmult':
+        model = DistMult(args, vocab['e1'].num_token, vocab['rel'].num_token)
+    elif args.model == 'complex':
+        model = Complex(args, vocab['e1'].num_token, vocab['rel'].num_token)
     else:
-        log.info('Unknown model: {0}', Config.model_name)
+        log.info('Unknown model: {0}', args.model)
         raise Exception("Unknown model!")
 
     train_batcher.at_batch_prepared_observers.insert(1,TargetIdx2MultiTarget(num_entities, 'e2_multi1', 'e2_multi1_binary'))
 
 
-    eta = ETAHook('train', print_every_x_batches=100)
+    eta = ETAHook('train', print_every_x_batches=args.log_interval)
     train_batcher.subscribe_to_events(eta)
     train_batcher.subscribe_to_start_of_epoch_event(eta)
-    train_batcher.subscribe_to_events(LossHook('train', print_every_x_batches=100))
+    train_batcher.subscribe_to_events(LossHook('train', print_every_x_batches=args.log_interval))
 
-    if Config.cuda:
-        model.cuda()
-    if load:
+    model.cuda()
+    if args.resume:
         model_params = torch.load(model_path)
         print(model)
         total_param_size = []
@@ -144,16 +129,16 @@ def main():
     print(params)
     print(np.sum(params))
 
-    opt = torch.optim.Adam(model.parameters(), lr=Config.learning_rate, weight_decay=Config.L2)
-    for epoch in range(epochs):
+    opt = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.l2)
+    for epoch in range(args.epochs):
         model.train()
         for i, str2var in enumerate(train_batcher):
             opt.zero_grad()
             e1 = str2var['e1']
             rel = str2var['rel']
             e2_multi = str2var['e2_multi1_binary'].float()
             # label smoothing
-            e2_multi = ((1.0-Config.label_smoothing_epsilon)*e2_multi) + (1.0/e2_multi.size(1))
+            e2_multi = ((1.0-args.label_smoothing)*e2_multi) + (1.0/e2_multi.size(1))
 
             pred = model.forward(e1, rel)
             loss = model.loss(pred, e2_multi)
@@ -168,11 +153,50 @@ def main():
 
         model.eval()
         with torch.no_grad():
-            ranking_and_hits(model, dev_rank_batcher, vocab, 'dev_evaluation')
-            if epoch % 3 == 0:
+            if epoch % 5 == 0 and epoch > 0:
+                ranking_and_hits(model, dev_rank_batcher, vocab, 'dev_evaluation')
+            if epoch % 5 == 0:
                 if epoch > 0:
                     ranking_and_hits(model, test_rank_batcher, vocab, 'test_evaluation')
 
 
 if __name__ == '__main__':
-    main()
+    parser = argparse.ArgumentParser(description='Link prediction for knowledge graphs')
+    parser.add_argument('--batch-size', type=int, default=128, help='input batch size for training (default: 128)')
+    parser.add_argument('--test-batch-size', type=int, default=128, help='input batch size for testing/validation (default: 128)')
+    parser.add_argument('--epochs', type=int, default=1000, help='number of epochs to train (default: 1000)')
+    parser.add_argument('--lr', type=float, default=0.003, help='learning rate (default: 0.003)')
+    parser.add_argument('--seed', type=int, default=17, metavar='S', help='random seed (default: 17)')
+    parser.add_argument('--log-interval', type=int, default=100, help='how many batches to wait before logging training status')
+    parser.add_argument('--data', type=str, default='FB15k-237', help='Dataset to use: {FB15k-237, YAGO3-10, WN18RR, umls, nations, kinship}, default: FB15k-237')
+    parser.add_argument('--l2', type=float, default=0.0, help='Weight decay value to use in the optimizer. Default: 0.0')
+    parser.add_argument('--model', type=str, default='conve', help='Choose from: {conve, distmult, complex}')
+    parser.add_argument('--embedding-dim', type=int, default=200, help='The embedding dimension (1D). Default: 200')
+    parser.add_argument('--embedding-shape1', type=int, default=20, help='The first dimension of the reshaped 2D embedding. The second dimension is inferred. Default: 20')
+    parser.add_argument('--hidden-drop', type=float, default=0.3, help='Dropout for the hidden layer. Default: 0.3.')
+    parser.add_argument('--input-drop', type=float, default=0.2, help='Dropout for the input embeddings. Default: 0.2.')
+    parser.add_argument('--feat-drop', type=float, default=0.2, help='Dropout for the convolutional features. Default: 0.2.')
+    parser.add_argument('--lr-decay', type=float, default=0.995, help='Decay the learning rate by this factor every epoch. Default: 0.995')
+    parser.add_argument('--loader-threads', type=int, default=4, help='How many loader threads to use for the batch loaders. Default: 4')
+    parser.add_argument('--preprocess', action='store_true', help='Preprocess the dataset. Needs to be executed only once.')
+    parser.add_argument('--resume', action='store_true', help='Resume a model.')
+    parser.add_argument('--use-bias', action='store_true', help='Use a bias in the convolutional layer. Default: False')
+    parser.add_argument('--label-smoothing', type=float, default=0.1, help='Label smoothing value to use. Default: 0.1')
+    parser.add_argument('--hidden-size', type=int, default=9728, help='The size of the hidden layer. The required size changes with the size of the embeddings. Default: 9728 (embedding size 200).')
+
+    args = parser.parse_args()
+
+
+
+    # parse console parameters and set global variables
+    Config.backend = 'pytorch'
+    Config.cuda = True
+    Config.embedding_dim = args.embedding_dim
+    #Logger.GLOBAL_LOG_LEVEL = LogLevel.DEBUG
+
+
+    model_name = '{2}_{0}_{1}'.format(args.input_drop, args.hidden_drop, args.model)
+    model_path = 'saved_models/{0}_{1}.model'.format(args.data, model_name)
+
+    torch.manual_seed(args.seed)
+    main(args, model_path)
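
Two standard `argparse` behaviours are worth keeping in mind when reading the parser above: dashes in option names become underscores on the parsed namespace (`--input-drop` is read as `args.input_drop`), and `store_true` flags take no value and default to `False`, so `--use-bias` and `--resume` are off unless passed. The standalone sketch below uses only a subset of the flags from this commit and is meant as an illustration, not as a replacement for `main.py`.

```python
import argparse

# Toy parser with a subset of the flags defined in main.py.
parser = argparse.ArgumentParser(description='Link prediction for knowledge graphs')
parser.add_argument('--input-drop', type=float, default=0.2)
parser.add_argument('--hidden-drop', type=float, default=0.3)
parser.add_argument('--model', type=str, default='conve')
parser.add_argument('--data', type=str, default='FB15k-237')
parser.add_argument('--use-bias', action='store_true')
parser.add_argument('--resume', action='store_true')

defaults = parser.parse_args([])                            # no flags given
args = parser.parse_args(['--data', 'WN18RR', '--resume'])  # flag without a value

# Same naming scheme as in the __main__ block of the diff.
model_name = '{2}_{0}_{1}'.format(args.input_drop, args.hidden_drop, args.model)
model_path = 'saved_models/{0}_{1}.model'.format(args.data, model_name)

print(defaults.use_bias)  # False
print(args.resume)        # True
print(model_path)         # saved_models/WN18RR_conve_0.2_0.3.model
```

Passing a value after a `store_true` flag (for example `--resume True`) makes `argparse` exit with an "unrecognized arguments" error, so the flag has to be given on its own.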