If I comment out the launch-kernel wrapper API and rebuild the .so, the vLLM server runs normally. The issue is hidden in cuLaunchKernel_WRAPPER.
[rank0]:[W205 17:32:56.335214611 Module.cpp:193] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
[rank0]:[W205 17:32:56.500983390 Module.cpp:193] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00<?, ?it/s]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/cuda_graph.py", line 275, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] output = self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/piecewise_backend.py", line 285, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return range_entry.runnable(*args)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/compiler_interface.py", line 301, in compiled_graph_wrapper
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] graph_output = inductor_compiled_graph(*args)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/standalone_compile.py", line 63, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self._compiled_fn(*args)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/standalone_compile.py", line 184, in <lambda>
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return CompiledArtifact(lambda *args: compiled_fn(list(args)), None)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] all_outs = call_func_at_runtime_with_args(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] out = normalize_as_list(f(args))
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return compiled_fn(runtime_args)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 613, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self.current_callable(inputs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3017, in run
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] out = model(new_inputs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/tmp/torchinductor_root/4y/c4ypfr3pjzwrsrlztf36ycj6joqfifh3i3wi4cf5omjplowq4b2v.py", line 1150, in call
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] triton_poi_fused_mul_silu_slice_1.run(buf3, buf4, triton_poi_fused_mul_silu_slice_1_xnumel, stream=stream0)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1310, in run
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return launcher(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "<string>", line 5, in launcher
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py", line 244, in run
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] _StaticCudaLauncher._launch_kernel(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] RuntimeError: CUDA driver error: operation failed due to a previous error during capture
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Exception raised from launchKernel at /pytorch/torch/csrc/inductor/static_cuda_launcher.cpp:155 (most recent call first):
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] C++ CapturedTraceback:
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #7 (anonymous namespace)::launchKernel(CUfunc_st*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, void**, CUstream_st*) from static_cuda_launcher.cpp:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #8 (anonymous namespace)::launch_kernel(_object*, _object*) from static_cuda_launcher.cpp:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #9 PyObject_CallFunctionObjArgs from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #10 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #11 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #12 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #13 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #14 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #15 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #16 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #17 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #18 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #19 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #20 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #21 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #22 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #23 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #24 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #25 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #26 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #27 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #28 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #29 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #30 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #31 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #32 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #33 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #34 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #35 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #36 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #37 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #38 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #39 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #40 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #41 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #42 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #43 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #44 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #45 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #46 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #47 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #48 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #49 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #50 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #51 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #52 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #53 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #54 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #55 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #56 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #57 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #58 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #59 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #60 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #61 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #62 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #63 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #64 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #65 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #66 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #67 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #68 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #69 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #70 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #71 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #72 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #73 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #74 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #75 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #76 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #77 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #78 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #79 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #80 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #81 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #82 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #83 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #84 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #85 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #86 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #87 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #88 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #89 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #90 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #91 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #92 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #93 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #94 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #95 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #96 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #97 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #98 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #99 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #100 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #101 _PyObject_Call_Prepend from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #102 PyInit__datetime from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #103 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #104 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #105 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #106 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #107 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #108 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #109 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #110 PyInit_gc from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #111 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #112 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #113 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #114 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #115 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #116 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #117 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #118 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #119 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #120 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #121 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #122 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #123 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #124 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #125 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #126 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #127 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #128 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #129 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #130 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #131 _PyStack_AsDict from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #132 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #133 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #134 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #135 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #136 PyObject_Call from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #137 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #138 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #139 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #140 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #141 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #142 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #143 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #144 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #145 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #146 PyEval_EvalCode from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #147 PyEval_EvalCode from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #148 PyUnicode_Tailmatch from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #149 PyInit__collections from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #150 PyRun_StringFlags from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #151 PyRun_SimpleStringFlags from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #152 Py_RunMain from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #153 Py_BytesMain from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #154 __libc_init_first from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #155 __libc_start_main from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #156 _start from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] During handling of the above exception, another exception occurred:
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] super().__init__(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 269, in _initialize_kv_caches
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 461, in run_method
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return func(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 452, in compile_or_warm_up_model
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] cuda_graph_memory_bytes = self.model_runner.capture_model()
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5028, in capture_model
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] self._capture_cudagraphs(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5128, in _capture_cudagraphs
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] dummy_run(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return func(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4685, in _dummy_run
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] outputs = self.model(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/cuda_graph.py", line 222, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/model_executor/models/qwen3.py", line 314, in forward
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] hidden_states = self.model(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 472, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs) # type: ignore[arg-type]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/wrapper.py", line 233, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self._call_with_optional_nvtx_range(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/wrapper.py", line 119, in _call_with_optional_nvtx_range
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return callable_fn(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 418, in forward
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] def forward(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return fn(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/caching.py", line 185, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self.optimized_call(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/fx/graph_module.py", line 413, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] raise e
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/fx/graph_module.py", line 400, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "<eval_with_key>.74", line 386, in forward
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] submod_22 = self.submod_22(getitem_63, s72, l_self_modules_layers_modules_10_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_10_modules_post_attention_layernorm_parameters_weight_, getitem_65, l_self_modules_layers_modules_10_modules_mlp_modules_gate_up_proj_parameters_weight_, l_self_modules_layers_modules_10_modules_mlp_modules_down_proj_parameters_weight_, l_self_modules_layers_modules_11_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_11_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_11_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_11_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_63 = l_self_modules_layers_modules_10_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_10_modules_post_attention_layernorm_parameters_weight_ = getitem_65 = l_self_modules_layers_modules_10_modules_mlp_modules_gate_up_proj_parameters_weight_ = l_self_modules_layers_modules_10_modules_mlp_modules_down_proj_parameters_weight_ = l_self_modules_layers_modules_11_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_11_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_11_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_11_modules_self_attn_modules_k_norm_parameters_weight_ = None
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/vllm/compilation/cuda_graph.py", line 269, in __call__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] with torch.cuda.graph(
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/cuda/graphs.py", line 265, in __exit__
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] self.cuda_graph.capture_end()
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] File "/testdir/gpu_mem/GVM/vllm/lib/python3.10/site-packages/torch/cuda/graphs.py", line 128, in capture_end
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] super().capture_end()
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] torch.AcceleratorError: CUDA error: operation failed due to a previous error during capture
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Search for `cudaErrorStreamCaptureInvalidated' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946]
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] C++ CapturedTraceback:
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #6 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) [clone .cold] from CUDAException.cpp:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #7 at::cuda::CUDAGraph::capture_end() from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #8 torch::detail::wrap_pybind_function_impl_<void (at::cuda::CUDAGraph::*)(), 0ul, true>(void (at::cuda::CUDAGraph::*&&)(), std::integer_sequence<unsigned long, 0ul>, std::integral_constant<bool, true>)::{lambda(at::cuda::CUDAGraph&)#1}::operator()(at::cuda::CUDAGraph&) const from :0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #9 pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<void (at::cuda::CUDAGraph::*)(), 0ul, true>(void (at::cuda::CUDAGraph::*&&)(), std::integer_sequence<unsigned long, 0ul>, std::integral_constant<bool, true>)::{lambda(at::cuda::CUDAGraph&)#1}, void, at::cuda::CUDAGraph&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (at::cuda::CUDAGraph::*&&)(), void (*)(at::cuda::CUDAGraph&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #10 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #11 PyObject_CallFunctionObjArgs from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #12 _PyObject_MakeTpCall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #13 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #14 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #15 _PyFunction_Vectorcall from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #16 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #17 PyMethod_New from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #18 _Py_VaBuildStack from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #19 _PyEval_EvalFrameDefault from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #20 _PyObject_FastCallDictTstate from ??:0
(EngineCore_DP0 pid=51836) ERROR 02-05 17:32:56 [core.py:946] #21 _PyObject_Call_Prepend from ??:0
.....
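The `cudaErrorStreamCaptureInvalidated` error in the log means that some earlier call invalidated the stream capture, so one thing worth checking is whether the wrapper does anything besides forwarding the launch while a capture is active. The following is a minimal sketch of a capture-aware guard, not the actual wrapper code. It assumes the wrapper performs extra per-launch work (for example synchronization or memory accounting) that may not be permitted during capture. `LaunchHooks`, `guarded_launch`, `bookkeeping`, and the `mock_*` helpers are hypothetical names, the launch signature is simplified, and the CUDA types are stand-ins so the sketch compiles without the toolkit. In a real hook, `stream_is_capturing` would point at the driver's `cuStreamIsCapturing`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the CUDA toolkit; a real hook
 * would include <cuda.h> instead of declaring these itself. */
typedef int CUresult;                     /* 0 corresponds to CUDA_SUCCESS */
typedef struct CUstream_st *CUstream;
typedef enum {
    CU_STREAM_CAPTURE_STATUS_NONE = 0,
    CU_STREAM_CAPTURE_STATUS_ACTIVE = 1,
    CU_STREAM_CAPTURE_STATUS_INVALIDATED = 2
} CUstreamCaptureStatus;

/* Hypothetical hook table; the names are illustrative, not from the wrapper. */
typedef struct {
    CUresult (*stream_is_capturing)(CUstream, CUstreamCaptureStatus *);
    CUresult (*real_launch)(CUstream, void *launch_args); /* simplified signature */
    void (*bookkeeping)(CUstream);  /* assumed extra work done by the wrapper */
} LaunchHooks;

/* Forward the launch unconditionally, but run the extra bookkeeping only
 * when the stream is confirmed not to be in capture mode. */
CUresult guarded_launch(const LaunchHooks *h, CUstream stream, void *launch_args)
{
    CUstreamCaptureStatus status = CU_STREAM_CAPTURE_STATUS_NONE;
    assert(h != NULL && h->stream_is_capturing != NULL && h->real_launch != NULL);

    CUresult rc = h->stream_is_capturing(stream, &status);
    if (rc == 0 && status == CU_STREAM_CAPTURE_STATUS_NONE && h->bookkeeping != NULL) {
        h->bookkeeping(stream);
    }
    return h->real_launch(stream, launch_args);
}

/* Mocks that stand in for the driver so the guard can be exercised on a
 * machine without a GPU. */
int g_capturing = 0;
int g_bookkeeping_calls = 0;
int g_launches = 0;

CUresult mock_is_capturing(CUstream s, CUstreamCaptureStatus *out)
{
    (void)s;
    *out = g_capturing ? CU_STREAM_CAPTURE_STATUS_ACTIVE
                       : CU_STREAM_CAPTURE_STATUS_NONE;
    return 0;
}

CUresult mock_launch(CUstream s, void *launch_args)
{
    (void)s;
    (void)launch_args;
    g_launches++;
    return 0;
}

void mock_bookkeeping(CUstream s)
{
    (void)s;
    g_bookkeeping_calls++;
}
```

With the mocks, a launch outside capture runs the bookkeeping once, while a launch with `g_capturing = 1` is still forwarded but skips it. If the real wrapper fails even with such a guard, the cause is likely elsewhere, for example in how the launch arguments are forwarded.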