python MI_Proposed_CNNs_Architecture.py execution error #10
However, when I reduced batch_size to 16, I instead ran into an out-of-GPU-memory error:
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Caused by op 'Convolutional_1/h_conv1/Conv2D', defined at:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
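For reference, a minimal sketch (TF 1.x) of the hint above: passing a RunOptions with report_tensor_allocations_upon_oom=True makes an OOM error also list the currently allocated tensors. The tiny graph here is only a stand-in; in MI_Proposed_CNNs_Architecture.py the same `options` argument would be added to the existing `sess.run(train_step, ...)` call.

```python
import numpy as np
import tensorflow as tf

# Stand-in graph; the point is the `options` argument passed to sess.run().
x = tf.placeholder(tf.float32, [None, 4], name="x")
w = tf.Variable(tf.zeros([4, 2]), name="w")
logits = tf.matmul(x, w)

# Ask TensorFlow to report live tensor allocations if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(logits,
             feed_dict={x: np.zeros((8, 4), dtype=np.float32)},
             options=run_options)
```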
There is also another possible source of error: you ran the code on Windows, while I am using Ubuntu 16.04. Could this difference also be a cause of my error?
The GPU out-of-memory problem is probably because, in the source code, the entire training set and the entire test set are each fed in at once to compute the accuracy and the loss; a batched-evaluation sketch follows below.
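A minimal sketch of evaluating in mini-batches rather than feeding the whole set in one run. The `x`, `y`, `keep_prob`, and `loss` names follow the traceback below; the `accuracy` tensor and the evaluation-time `keep_prob: 1.0` are assumptions about the script.

```python
def evaluate_in_batches(sess, accuracy, loss, x, y, keep_prob,
                        data_x, data_y, batch_size=128):
    """Average accuracy and loss over mini-batches so evaluation never
    allocates one tensor covering the entire training or test set."""
    total_acc, total_loss, n = 0.0, 0.0, 0
    for start in range(0, len(data_x), batch_size):
        bx = data_x[start:start + batch_size]
        by = data_y[start:start + batch_size]
        acc_val, loss_val = sess.run(
            [accuracy, loss],
            feed_dict={x: bx, y: by, keep_prob: 1.0})  # dropout off at eval time
        # Weight each batch by its size so the last (smaller) batch is handled correctly.
        total_acc += acc_val * len(bx)
        total_loss += loss_val * len(bx)
        n += len(bx)
    return total_acc / n, total_loss / n
```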
Traceback (most recent call last):
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[{{node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "MI_Proposed_CNNs_Architecture.py", line 582, in
sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.50})
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
Caused by op 'Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 301, in
train_step = tf.train.AdamOptimizer(1e-5).minimize(loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
grad_loss=grad_loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
return grad_fn() # Exit early
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 1130, in _MatMulGrad
grad_a = gen_math_ops.mat_mul(grad, b, transpose_b=True)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'Output_Layer/prediction/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 290, in
prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
Hello author, I am a third-year undergraduate student and have recently been reproducing your paper to look for inspiration. When I ran (under a Python 3.6 environment) `$ python MI_Proposed_CNNs_Architecture.py`, I hit the errors above. I have searched extensively without finding a solution; my best guess so far is a mismatch between the TensorFlow version and the CUDA version, but I am not sure whether that is correct.
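A commonly suggested workaround for "Blas GEMM launch failed" in TF 1.x is to enable GPU memory growth when the session is created; a minimal sketch is below. It will not help if the real cause is the TensorFlow/CUDA mismatch suspected above.

```python
import tensorflow as tf

# Minimal sketch (TF 1.x): let TensorFlow grow its GPU memory allocation on
# demand instead of reserving almost all of it up front. "Blas GEMM launch
# failed" is sometimes just cuBLAS failing to obtain a workspace; this config
# will not help if the real cause is a TensorFlow/CUDA mismatch (the TF 1.12
# prebuilt wheels predate the RTX 40-series GPUs).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap TensorFlow's share of GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

sess = tf.Session(config=config)
```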
My machine configuration is as follows:
NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2
NVIDIA GeForce RTX 4090, 24 GB VRAM
The conda environment is as follows (running under Python 3.6.13):
absl-py 0.15.0
astor 0.8.1
certifi 2021.5.30
coverage 5.5
Cython 0.29.24
dataclasses 0.8
et-xmlfile 1.1.0
gast 0.5.3
grpcio 1.36.1
h5py 2.10.0
importlib-metadata 4.8.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
openpyxl 3.1.2
pandas 1.1.5
pip 20.0.2
protobuf 3.17.2
python-dateutil 2.9.0.post0
pytz 2024.1
scipy 1.5.2
setuptools 36.4.0
six 1.16.0
tensorboard 1.12.2
tensorflow 1.12.0
termcolor 1.1.0
typing-extensions 4.1.1
Werkzeug 2.0.3
wheel 0.37.1
xlrd 1.2.0
zipp 3.6.0