Encounter illegal memory access in FlashInfer backend under contiguous layout #321

@lianghao208

Description

GPU type: H20 * 8
FlashInfer version: 0.5.3
vLLM version: 0.11.1

Env settings tested for whether an illegal memory access is encountered:
KVCACHED_CONTIGUOUS_LAYOUT=true KVCACHED_MIN_RESERVED_PAGES=32 KVCACHED_MAX_RESERVED_PAGES=64
KVCACHED_CONTIGUOUS_LAYOUT=true KVCACHED_MIN_RESERVED_PAGES=5 KVCACHED_MAX_RESERVED_PAGES=10
KVCACHED_CONTIGUOUS_LAYOUT=false KVCACHED_MIN_RESERVED_PAGES=32 KVCACHED_MAX_RESERVED_PAGES=64

It seems that the illegal memory access error is encountered only when the number of reserved pages is >= 17 and contiguous layout is enabled.
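A minimal sketch of how the threshold could be narrowed down by bisection. The three `KVCACHED_*` variable names come from the settings above; everything else is an assumption: `crashes` is a hypothetical callable that would launch vLLM with the given environment and report whether the error appears, max pages is set to twice min pages only to mirror the pairs tested above, and the search assumes the failure is monotonic in the page count.

```python
from typing import Callable, Dict


def env_for(pages: int, contiguous: bool = True) -> Dict[str, str]:
    """Build the kvcached env settings for one trial (max = 2 * min is an assumption)."""
    return {
        "KVCACHED_CONTIGUOUS_LAYOUT": "true" if contiguous else "false",
        "KVCACHED_MIN_RESERVED_PAGES": str(pages),
        "KVCACHED_MAX_RESERVED_PAGES": str(2 * pages),
    }


def first_crashing_pages(crashes: Callable[[int], bool], lo: int, hi: int) -> int:
    """Smallest n in [lo, hi] for which crashes(n) is True.

    Assumes crashes(hi) is True and that crashing is monotonic in n.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid        # threshold is at mid or below
        else:
            lo = mid + 1    # threshold is above mid
    return lo


if __name__ == "__main__":
    # Stand-in for a real launch: pretends the error appears at >= 17 pages,
    # which is the behaviour described in this report.
    fake_crashes = lambda n: n >= 17
    n = first_crashing_pages(fake_crashes, lo=5, hi=32)
    print(n, env_for(n))
```

In a real run, `fake_crashes` would be replaced by a function that starts the server with `env_for(n)` merged into `os.environ` and checks the worker log for the CUDA error.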

Related error log:

TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            3
gmem_address   0x7fef2db48a00
globalDim      (128,4,4,1,1)
globalStrides  (2,1024,256,0,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            4
gmem_address   0x1f0000000000
globalDim      (128,64,1,49408,1)
globalStrides  (2,256,256,2064384,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            4
gmem_address   0x1f0000004000
globalDim      (128,64,1,49408,1)
globalStrides  (2,256,256,2064384,0)
boxDim         (64,8,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            3
gmem_address   0x7fef2db47a00
globalDim      (128,4,4,1,1)
globalStrides  (2,1024,256,0,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] WorkerProc hit an exception.
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Traceback (most recent call last):
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 480, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = self.model_runner.execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2719, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return _execute_model()
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     ^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwds)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2718, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2824, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     model_output = self._model_forward(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                    ^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2700, in _model_forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 1164, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     model_output = self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                    ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 228, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.forward(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 844, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     hidden_states, residual, kv_states = layer(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                                          ^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 741, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     hidden_states, ori_kv_states = self.self_attn(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                                    ^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 313, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output, _ = self.o_proj(attn_output)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                 ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1426, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = tensor_model_parallel_all_reduce(output_parallel)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/communication_op.py", line 14, in tensor_model_parallel_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return get_tp_group().all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 378, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return torch.ops.vllm.all_reduce(input_, group_name=self.unique_name)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 119, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return group._all_reduce_out_place(tensor)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 385, in _all_reduce_out_place
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.device_communicator.all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 154, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     out = ca_comm.custom_all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 279, in custom_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.all_reduce(input, registered=False)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 258, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     ops.all_reduce(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/_custom_ops.py", line 2153, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     torch.ops._C_custom_ar.all_reduce(fa, inp, out, reg_buffer, reg_buffer_sz_bytes)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
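A rough sketch for inspecting the TMA descriptor dumps in the log above. It parses each `TMA Desc Addr` block, computes the byte range the descriptor can address, and runs a few static checks. The checks are only a subset of the documented `cuTensorMapEncodeTiled` argument constraints (16-byte aligned address, strides that are multiples of 16 and below 2^40, box dimensions between 1 and 256). Two things are assumptions rather than facts from the log: that the first `globalStrides` entry is the element size in bytes, and that only the first `dim` entries of each tuple are meaningful. The function names are made up for illustration.

```python
import re

# Field names as they appear in the dump.
KNOWN = {"format", "dim", "gmem_address", "globalDim", "globalStrides",
         "boxDim", "elementStrides", "interleave", "swizzle",
         "l2Promotion", "oobFill"}
FIELD = re.compile(r"^(\w+)\s+(\(.*\)|\S+)$")


def parse_tma_dump(text):
    """Return one dict per 'TMA Desc Addr' block found in the log text."""
    descs = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("TMA Desc Addr"):
            descs.append({})
            continue
        m = FIELD.match(line)
        if not m or not descs or m.group(1) not in KNOWN:
            continue
        key, val = m.groups()
        if val.startswith("("):
            descs[-1][key] = tuple(int(x) for x in val[1:-1].split(","))
        else:
            descs[-1][key] = int(val, 0)  # base 0 accepts "9" and "0x1f..."
    return descs


def byte_span(desc):
    """Bytes from gmem_address to one past the last addressable element."""
    rank = desc["dim"]
    dims = desc["globalDim"][:rank]
    strides = desc["globalStrides"][:rank]  # assumption: strides[0] = element size
    return sum((d - 1) * s for d, s in zip(dims, strides)) + strides[0]


def static_violations(desc):
    """Check a subset of the cuTensorMapEncodeTiled argument constraints."""
    rank = desc["dim"]
    problems = []
    if desc["gmem_address"] % 16:
        problems.append("gmem_address is not 16-byte aligned")
    for s in desc["globalStrides"][1:rank]:
        if s % 16 or s >= 2 ** 40:
            problems.append(f"stride {s} is not a multiple of 16 below 2^40")
    for b in desc["boxDim"][:rank]:
        if not 1 <= b <= 256:
            problems.append(f"boxDim entry {b} is outside 1..256")
    return problems


# One of the 4-D descriptors copied from the log.
SAMPLE = """
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            4
gmem_address   0x1f0000000000
globalDim      (128,64,1,49408,1)
globalStrides  (2,256,256,2064384,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
"""

for d in parse_tma_dump(SAMPLE):
    print(hex(d["gmem_address"]), byte_span(d), static_violations(d))
```

For this descriptor none of the static checks fire, and the addressable range starting at `0x1f0000000000` is roughly 95 GiB. The log does not show whether all of that range is mapped when the kernel runs. The trailing `700` matches the numeric value of `CUDA_ERROR_ILLEGAL_ADDRESS`, so the descriptor setup may only be reporting a fault that happened earlier; that reading is an assumption.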
