This repository was archived by the owner on Apr 1, 2023. It is now read-only.

How to handle the problem of empty gradients list? #2

@yuqing-liu-dut

Hello everyone. When I run this code with tensorflow-gpu 1.4.0, it fails with an IndexError while building the graph, and I don't know why. Could anyone help me? Thank you. The TensorFlow output is below.

/data8T/liuyuqing/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/data8T/liuyuqing/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Launching new train: 2018-04-15-20-05-40
paired file sketchy num: 2
paired file flickr num: 2
2018-04-16 04:05:42.564590: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2018-04-16 04:05:42.915202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:17:00.0
totalMemory: 11.90GiB freeMemory: 2.35GiB
2018-04-16 04:05:43.220134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:65:00.0
totalMemory: 11.90GiB freeMemory: 7.94MiB
2018-04-16 04:05:43.220478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-04-16 04:05:43.220511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2018-04-16 04:05:43.220525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y
2018-04-16 04:05:43.220535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y
2018-04-16 04:05:43.220548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1)
2018-04-16 04:05:43.220557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1
2018-04-16 04:05:43.228850: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1

Iteration starts from: 0
2018-04-16 04:05:44.727512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1)
2018-04-16 04:05:44.727680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1
2018-04-16 04:05:44.728818: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN Xp, pci bus id: 0000:17:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN Xp, pci bus id: 0000:65:00.0, compute capability: 6.1

Traceback (most recent call last):
  File "main_single.py", line 185, in <module>
    status, appendix = launch_training(**d_params)
  File "main_single.py", line 100, in launch_training
    status = train_module.train(**kwargs)
  File "./src_single/train_single.py", line 142, in train
    optimizer=optimizer)
  File "./src_single/graph_single.py", line 199, in build_multi_tower_graph
    global_norm_clipped=global_grad_norm_G_clipped, appendix='_G')
  File "./src_single/graph_single.py", line 641, in optimize
    clip_ops.global_norm(list(zip(*gradients))[0]))
IndexError: list index out of range
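
For reference, the failing expression in the last frame can be reproduced without TensorFlow. The sketch below assumes that `gradients` is a list of (gradient, variable) pairs and that it is empty when `optimize` runs; the traceback does not show why it would be empty. The helper `split_grads_and_vars` is a hypothetical name used only for illustration and is not part of this repository.

```python
def split_grads_and_vars(gradients):
    """Split (gradient, variable) pairs into two tuples.

    Hypothetical helper, not from the repository. `gradients` is assumed
    to be a list of (gradient, variable) pairs.
    """
    # Drop pairs whose gradient is None (variable not connected to the loss).
    pairs = [(g, v) for g, v in gradients if g is not None]
    if not pairs:
        # Fail with a descriptive message instead of a bare IndexError.
        raise ValueError("gradients list is empty: no variables to optimize")
    grads, variables = zip(*pairs)
    return grads, variables


# zip(*[]) yields nothing, so indexing the resulting empty list fails
# exactly as in the traceback above.
try:
    list(zip(*[]))[0]
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: list index out of range"
```

A guard like this would not fix the underlying cause, but it might make it easier to see which optimizer step (here the one with appendix `_G`) receives no gradients.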
