python MI_Proposed_CNNs_Architecture.py execution error #10
However, when I reduced batch_size to 16, I instead ran into an out-of-GPU-memory error:
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Caused by op 'Convolutional_1/h_conv1/Conv2D', defined at:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
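For reference, a minimal sketch (TF 1.x) of the hint above: passing a RunOptions with report_tensor_allocations_upon_oom=True makes an OOM error also list the currently allocated tensors. The tiny graph here is only a stand-in; in MI_Proposed_CNNs_Architecture.py the same `options` argument would be added to the existing `sess.run(train_step, ...)` call.

```python
import numpy as np
import tensorflow as tf

# Stand-in graph; the point is the `options` argument passed to sess.run().
x = tf.placeholder(tf.float32, [None, 4], name="x")
w = tf.Variable(tf.zeros([4, 2]), name="w")
logits = tf.matmul(x, w)

# Ask TensorFlow to report live tensor allocations if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(logits,
             feed_dict={x: np.zeros((8, 4), dtype=np.float32)},
             options=run_options)
```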
There is also another possible source of error: you ran the code on Windows, while I am using Ubuntu 16.04. Could this difference also be a cause of my error?
The GPU out-of-memory problem is probably because, in the source code, the entire training set and the entire test set are each fed in at once to compute the accuracy and the loss; a batched-evaluation sketch follows below.
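A minimal sketch of evaluating in mini-batches rather than feeding the whole set in one run. The `x`, `y`, `keep_prob`, and `loss` names follow the traceback below; the `accuracy` tensor and the evaluation-time `keep_prob: 1.0` are assumptions about the script.

```python
def evaluate_in_batches(sess, accuracy, loss, x, y, keep_prob,
                        data_x, data_y, batch_size=128):
    """Average accuracy and loss over mini-batches so evaluation never
    allocates one tensor covering the entire training or test set."""
    total_acc, total_loss, n = 0.0, 0.0, 0
    for start in range(0, len(data_x), batch_size):
        bx = data_x[start:start + batch_size]
        by = data_y[start:start + batch_size]
        acc_val, loss_val = sess.run(
            [accuracy, loss],
            feed_dict={x: bx, y: by, keep_prob: 1.0})  # dropout off at eval time
        # Weight each batch by its size so the last (smaller) batch is handled correctly.
        total_acc += acc_val * len(bx)
        total_loss += loss_val * len(bx)
        n += len(bx)
    return total_acc / n, total_loss / n
```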
Traceback (most recent call last):
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[{{node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "MI_Proposed_CNNs_Architecture.py", line 582, in
sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.50})
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
Caused by op 'Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 301, in
train_step = tf.train.AdamOptimizer(1e-5).minimize(loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
grad_loss=grad_loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
return grad_fn() # Exit early
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 1130, in _MatMulGrad
grad_a = gen_math_ops.mat_mul(grad, b, transpose_b=True)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'Output_Layer/prediction/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 290, in
prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
Hello author, I am a third-year undergraduate student and have recently been reproducing your paper to look for inspiration. When I ran (under a Python 3.6 environment) `$ python MI_Proposed_CNNs_Architecture.py`, I hit the errors above. I have searched extensively without finding a solution; my best guess so far is a mismatch between the TensorFlow version and the CUDA version, but I am not sure whether that is correct.
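A commonly suggested workaround for "Blas GEMM launch failed" in TF 1.x is to enable GPU memory growth when the session is created; a minimal sketch is below. It will not help if the real cause is the TensorFlow/CUDA mismatch suspected above.

```python
import tensorflow as tf

# Minimal sketch (TF 1.x): let TensorFlow grow its GPU memory allocation on
# demand instead of reserving almost all of it up front. "Blas GEMM launch
# failed" is sometimes just cuBLAS failing to obtain a workspace; this config
# will not help if the real cause is a TensorFlow/CUDA mismatch (the TF 1.12
# prebuilt wheels predate the RTX 40-series GPUs).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap TensorFlow's share of GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

sess = tf.Session(config=config)
```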
My machine configuration is as follows:
NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2
NVIDIA GeForce RTX 4090, 24 GB VRAM
The conda environment is as follows (running under Python 3.6.13):
absl-py 0.15.0
astor 0.8.1
certifi 2021.5.30
coverage 5.5
Cython 0.29.24
dataclasses 0.8
et-xmlfile 1.1.0
gast 0.5.3
grpcio 1.36.1
h5py 2.10.0
importlib-metadata 4.8.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
openpyxl 3.1.2
pandas 1.1.5
pip 20.0.2
protobuf 3.17.2
python-dateutil 2.9.0.post0
pytz 2024.1
scipy 1.5.2
setuptools 36.4.0
six 1.16.0
tensorboard 1.12.2
tensorflow 1.12.0
termcolor 1.1.0
typing-extensions 4.1.1
Werkzeug 2.0.3
wheel 0.37.1
xlrd 1.2.0
zipp 3.6.0