
Cannot open file output/model.pdmodel during the final model-training and inference-test step #1554

Open
champaignhgx opened this issue Jan 10, 2025 · 5 comments

@champaignhgx

After training finishes, the model weight file cannot be found, which causes the error below.
/usr/local/lib/python3.10/dist-packages/paddle/jit/dy2static/program_translator.py:770: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

warnings.warn(
Traceback (most recent call last):
File "/mnt/nvme/yiqidata/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 261, in
main(args)
File "/mnt/nvme/yiqidata/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 209, in main
infer("output")
File "/mnt/nvme/yiqidata/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 76, in infer
config = paddle_infer.Config(model_file, params_file)
RuntimeError: (NotFound) Cannot open file output/model.pdmodel, please confirm whether the file is normal.
[Hint: Expected paddle::inference::IsFileExists(prog_file_) == true, but received paddle::inference::IsFileExists(prog_file_):0 != true:1.] (at /paddle/paddle/fluid/inference/api/analysis_config.cc:117)

I checked, and indeed no output/model.pdmodel file was produced.

@yongqiangma
Collaborator

Could you describe your current hardware and software environment?

@champaignhgx
Author

Hardware: Atlas 800T A2; CANN version: 8.0.RC1; driver version: 24.1.rc3; aarch64

@yongqiangma
Collaborator

This unit test should run fine on the 910B. The training phase before that step completed normally, right? Please post the complete log so we can take a look.

@champaignhgx
Author

I0114 13:45:47.044159 88978 init.cc:237] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0114 13:45:47.052294 88978 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0114 13:45:47.781834 88978 custom_device_load.cc:52] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I0114 13:45:47.781901 88978 custom_device_load.cc:59] Skipped lib [/usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so]: no custom engine Plugin symbol in this lib.
I0114 13:45:47.785665 88978 custom_kernel.cc:63] Succeed in loading 358 custom kernel(s) from loaded lib(s), will be used like native ones.
I0114 13:45:47.785871 88978 init.cc:158] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0114 13:45:47.785915 88978 init.cc:243] CustomDevice: npu, visible devices count: 8
Epoch [1/2], Iter [01/14], reader_cost: 3.49352 s, batch_cost: 63.50156 s, ips: 64.50236 samples/s, eta: 0:29:38
Epoch [1/2], Iter [02/14], reader_cost: 1.74693 s, batch_cost: 31.75776 s, ips: 128.97636 samples/s, eta: 0:14:17
Epoch [1/2], Iter [03/14], reader_cost: 1.16466 s, batch_cost: 21.17573 s, ips: 193.42902 samples/s, eta: 0:09:10
Epoch [1/2], Iter [04/14], reader_cost: 0.87353 s, batch_cost: 15.88464 s, ips: 257.85919 samples/s, eta: 0:06:37
Epoch [1/2], Iter [05/14], reader_cost: 0.69885 s, batch_cost: 12.70998 s, ips: 322.26655 samples/s, eta: 0:05:05
Epoch [1/2], Iter [06/14], reader_cost: 0.58239 s, batch_cost: 10.59354 s, ips: 386.65068 samples/s, eta: 0:04:03
Epoch [1/2], Iter [07/14], reader_cost: 0.49921 s, batch_cost: 9.08176 s, ips: 451.01403 samples/s, eta: 0:03:19
Epoch [1/2], Iter [08/14], reader_cost: 0.43682 s, batch_cost: 7.94790 s, ips: 515.35625 samples/s, eta: 0:02:46
Epoch [1/2], Iter [09/14], reader_cost: 0.38830 s, batch_cost: 7.06600 s, ips: 579.67744 samples/s, eta: 0:02:21
Epoch [1/2], Iter [10/14], reader_cost: 0.34948 s, batch_cost: 6.36048 s, ips: 643.97625 samples/s, eta: 0:02:00
Epoch [1/2], Iter [11/14], reader_cost: 0.31772 s, batch_cost: 5.78325 s, ips: 708.25248 samples/s, eta: 0:01:44
Epoch [1/2], Iter [12/14], reader_cost: 0.29125 s, batch_cost: 5.30221 s, ips: 772.50749 samples/s, eta: 0:01:30
Epoch [1/2], Iter [13/14], reader_cost: 0.26886 s, batch_cost: 4.89520 s, ips: 836.73722 samples/s, eta: 0:01:18
Epoch [1/2], Iter [14/14], reader_cost: 0.24966 s, batch_cost: 4.54634 s, ips: 900.94480 samples/s, eta: 0:01:08
Epoch ID: 1, Epoch time: 63.79231 s, reader_cost: 3.49528 s, batch_cost: 63.64874 s, avg ips: 898.91708 samples/s

Eval - Epoch ID: 1, Top1 accurary:: 0.66211, Top5 accurary:: 0.96436
Epoch [2/2], Iter [01/14], reader_cost: 4.02568 s, batch_cost: 4.04410 s, ips: 1012.83451 samples/s, eta: 0:00:56
Epoch [2/2], Iter [02/14], reader_cost: 2.01295 s, batch_cost: 2.02827 s, ips: 2019.45761 samples/s, eta: 0:00:26
Epoch [2/2], Iter [03/14], reader_cost: 1.34202 s, batch_cost: 1.35618 s, ips: 3020.25523 samples/s, eta: 0:00:16
Epoch [2/2], Iter [04/14], reader_cost: 1.00655 s, batch_cost: 1.02020 s, ips: 4014.91634 samples/s, eta: 0:00:11
Epoch [2/2], Iter [05/14], reader_cost: 0.80527 s, batch_cost: 0.81852 s, ips: 5004.17703 samples/s, eta: 0:00:08
Epoch [2/2], Iter [06/14], reader_cost: 0.67108 s, batch_cost: 0.68405 s, ips: 5987.87435 samples/s, eta: 0:00:06
Epoch [2/2], Iter [07/14], reader_cost: 0.57523 s, batch_cost: 0.58804 s, ips: 6965.56527 samples/s, eta: 0:00:04
Epoch [2/2], Iter [08/14], reader_cost: 0.50334 s, batch_cost: 0.51593 s, ips: 7939.12025 samples/s, eta: 0:00:03
Epoch [2/2], Iter [09/14], reader_cost: 0.44743 s, batch_cost: 0.45983 s, ips: 8907.61785 samples/s, eta: 0:00:02
Epoch [2/2], Iter [10/14], reader_cost: 0.40270 s, batch_cost: 0.41496 s, ips: 9870.90572 samples/s, eta: 0:00:02
Epoch [2/2], Iter [11/14], reader_cost: 0.36610 s, batch_cost: 0.37823 s, ips: 10829.39487 samples/s, eta: 0:00:01
Epoch [2/2], Iter [12/14], reader_cost: 0.33560 s, batch_cost: 0.34763 s, ips: 11782.63694 samples/s, eta: 0:00:01
Epoch [2/2], Iter [13/14], reader_cost: 0.30979 s, batch_cost: 0.32176 s, ips: 12730.07746 samples/s, eta: 0:00:00
Epoch [2/2], Iter [14/14], reader_cost: 0.28767 s, batch_cost: 0.29958 s, ips: 13672.36621 samples/s, eta: 0:00:00
Epoch ID: 2, Epoch time: 4.71484 s, reader_cost: 4.02744 s, batch_cost: 4.19415 s, avg ips: 12162.46063 samples/s
/usr/local/lib/python3.10/dist-packages/paddle/jit/dy2static/program_translator.py:770: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

warnings.warn(
Eval - Epoch ID: 2, Top1 accurary:: 0.84839, Top5 accurary:: 0.99023
Traceback (most recent call last):
File "/mnt/nvme/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 261, in
main(args)
File "/mnt/nvme/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 209, in main
infer("output")
File "/mnt/nvme/paddle-npu/PaddleCustomDevice/backends/npu/build/tests/test_LeNet_MNIST.py", line 76, in infer
config = paddle_infer.Config(model_file, params_file)
RuntimeError: (NotFound) Cannot open file output/model.pdmodel, please confirm whether the file is normal.
[Hint: Expected paddle::inference::IsFileExists(prog_file_) == true, but received paddle::inference::IsFileExists(prog_file_):0 != true:1.] (at /paddle/paddle/fluid/inference/api/analysis_config.cc:117)

@yongqiangma
Collaborator

yongqiangma commented Jan 15, 2025

PIR has been upgraded accordingly. Try setting the following environment variable; I will update the relevant documentation shortly.

export FLAGS_enable_pir_api=False
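Equivalently, the flag can be set from inside the script, as long as it happens before `paddle` is imported (this assumes the flag is read from the environment at import/initialization time, which is how Paddle FLAGS generally behave):

```python
import os

# Disable the PIR API so that paddle.jit.save emits the legacy
# model.pdmodel / model.pdiparams files the test script expects.
# This must run before the first `import paddle`.
os.environ["FLAGS_enable_pir_api"] = "False"

# import paddle  # only import paddle after the flag is set
```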
