[Bug] RuntimeError: nms_impl: implementation for device npu:0 not found. #3216

hujuntao123 · 2024-12-17T09:58:58Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmcv).

Environment

/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:82: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/8.0.RC2/aarch64-linux/ascend_toolkit_install.info owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True. (function operator())
[W compiler_depend.ts:631] Warning: expandable_segments feature is not supportted and the possible cause is that driver and firmware packages do not match. (function operator())
OrderedDict([('sys.platform', 'linux'), ('Python', '3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:53:27) [GCC 9.4.0]'), ('CUDA available', False), ('MUSA available', False), ('numpy_random_seed', 2147483648), ('GCC', 'gcc (GCC) 7.3.0'), ('PyTorch', '2.1.0'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 10.2\n - C++ Version: 201703\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: NO AVX\n - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.16.0'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.4'), ('MMCV', '2.2.0'), 
('MMCV Compiler', 'GCC 7.3'), ('MMCV CUDA Compiler', 'not available')])
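Most of the dump above is compiler flags. As a reading aid, here is a minimal sketch that pulls out the entries that describe the accelerator build. It assumes the dump is available as a Python mapping; the helper name `accelerator_summary` and the choice of keys are hypothetical and not part of MMCV or MMEngine, and the values are copied from the dump above.

```python
from collections import OrderedDict

# Keys copied from the dump above; which ones matter is a judgment call.
ACCELERATOR_KEYS = (
    "CUDA available",
    "MUSA available",
    "PyTorch",
    "MMCV",
    "MMCV Compiler",
    "MMCV CUDA Compiler",
)


def accelerator_summary(env):
    """Return only the accelerator-related entries of an environment mapping."""
    return {key: env.get(key, "<missing>") for key in ACCELERATOR_KEYS}


# Values taken from the environment dump in this report.
env = OrderedDict([
    ("sys.platform", "linux"),
    ("CUDA available", False),
    ("MUSA available", False),
    ("PyTorch", "2.1.0"),
    ("TorchVision", "0.16.0"),
    ("MMEngine", "0.10.4"),
    ("MMCV", "2.2.0"),
    ("MMCV Compiler", "GCC 7.3"),
    ("MMCV CUDA Compiler", "not available"),
])

for key, value in accelerator_summary(env).items():
    print(f"{key}: {value}")
```

The dump has no NPU-specific entry, so this summary can only confirm that the CUDA and MUSA backends are absent. It cannot show whether the MMCV ops were compiled for the NPU.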

Reproduces the problem - code sample

python tools/train.py configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py

Reproduces the problem - command or script

python tools/train.py configs/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py

Reproduces the problem - error message

/home/ma-user/work/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:470: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
valid_x[:valid_w] = 1
Traceback (most recent call last):
  File "/home/ma-user/work/mmdetection/tools/train.py", line 121, in <module>
    main()
  File "/home/ma-user/work/mmdetection/tools/train.py", line 117, in main
    runner.train()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/runner/loops.py", line 113, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/runner/loops.py", line 129, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss')  # type: ignore
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/work/mmdetection/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "/home/ma-user/work/mmdetection/mmdet/models/detectors/two_stage.py", line 174, in loss
    rpn_losses, rpn_results_list = self.rpn_head.loss_and_predict(
  File "/home/ma-user/work/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 167, in loss_and_predict
    predictions = self.predict_by_feat(
  File "/home/ma-user/work/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 279, in predict_by_feat
    results = self._predict_by_feat_single(
  File "/home/ma-user/work/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 233, in _predict_by_feat_single
    return self._bbox_post_process(
  File "/home/ma-user/work/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 284, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/home/ma-user/work/mmcv/mmcv/ops/nms.py", line 303, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/mmengine/utils/misc.py", line 395, in new_func
    output = old_func(*args, **kwargs)
  File "/home/ma-user/work/mmcv/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ma-user/work/mmcv/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device npu:0 not found.

+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc3                 Version: 23.0.rc3                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 86.6        35                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          4376 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 90.8        37                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          4158 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 90.8        36                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          35295/ 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 83.9        36                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          37193/ 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| 4       0                 | 1330715       | python                   | 102                     |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| 6       0                 | 263627        | python3                  | 31026                   |
+===========================+===============+====================================================+
| 7       0                 | 263627        | python3                  | 32924                   |
+===========================+===============+====================================================+

[ERROR] 2024-12-17-17:35:34 (PID:1377445, Device:0, RankID:-1) ERR99999 UNKNOWN application exception
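
The call that fails in the traceback above is `ext_module.nms`, reached through `batched_nms` in the RPN head. For reference, here is a minimal pure-Python sketch of greedy non-maximum suppression, the operation that op is expected to perform. It is not MMCV's implementation: the function name `greedy_nms` is hypothetical, and the strict `>` comparison and the handling of `offset` are assumptions about the intended semantics.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float, offset: int = 0) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already kept, higher-scoring
    box is strictly greater than ``iou_threshold`` (assumed semantics).
    """
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    areas = [(x2 - x1 + offset) * (y2 - y1 + offset)
             for x1, y1, x2, y2 in boxes]
    suppressed = [False] * len(boxes)
    keep: List[int] = []

    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        for j in order[pos + 1:]:
            if suppressed[j]:
                continue
            # Width and height of the intersection rectangle (0 if disjoint).
            iw = max(0.0, min(boxes[i][2], boxes[j][2])
                     - max(boxes[i][0], boxes[j][0]) + offset)
            ih = max(0.0, min(boxes[i][3], boxes[j][3])
                     - max(boxes[i][1], boxes[j][1]) + offset)
            inter = iw * ih
            union = areas[i] + areas[j] - inter
            if union > 0 and inter / union > iou_threshold:
                suppressed[j] = True
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_threshold=0.5))
```

This sketch only illustrates what the op computes on plain Python lists. It does not supply the missing NPU kernel and is far too slow for training.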

Additional information

No response
