
[BUG]: Multi-node vLLM deployment fails with NIXL_ERR_BACKEND error #3724

@cr7258

Description

Describe the Bug

My K8s cluster: 3 GPU nodes, each with one A100 GPU, connected via standard TCP networking (no RDMA/InfiniBand).

I deployed a DynamoGraphDeployment with one prefill worker running on a single GPU on one node, and one decode worker distributed across two nodes with tensor parallelism 2 (one GPU per node):

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.frontend --http-port 8000"
    decode:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --tensor-parallel-size 2"
      envs:
        - name: NIXL_LOG_LEVEL
          value: TRACE
        - name: UCX_LOG_LEVEL
          value: DEBUG
    prefill:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --is-prefill-worker"
Pod status after deploying:

kubectl get pod -n demo
NAME                                      READY   STATUS             RESTARTS      AGE
vllm-disagg-0-decode-0-decode-ldr-5w4wf   0/1     CrashLoopBackOff   4 (93s ago)   16m
vllm-disagg-0-decode-0-decode-wkr-wm258   0/1     CrashLoopBackOff   4 (31s ago)   16m
vllm-disagg-0-frontend-tzdzc              1/1     Running            0             16m
vllm-disagg-0-prefill-vgd4b               1/1     Running            0             16m
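
The traceback below can be retrieved from the crash-looping decode pods, for example (pod name taken from the listing above):

# Previous-container logs and pod events for one of the crashing decode pods
kubectl logs -n demo vllm-disagg-0-decode-0-decode-ldr-5w4wf --previous
kubectl describe pod -n demo vllm-disagg-0-decode-0-decode-ldr-5w4wf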

The decode pods consistently crash with the NIXL_ERR_BACKEND error:

(EngineCore_0 pid=572) Process EngineCore_0:
(EngineCore_0 pid=572) Traceback (most recent call last):
(EngineCore_0 pid=572)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=572)     self.run()
(EngineCore_0 pid=572)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=572)     self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=572)     raise e
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=572)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=572)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=572)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/engine/core.py", line 89, in __init__
(EngineCore_0 pid=572)     self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/engine/core.py", line 211, in _initialize_kv_caches
(EngineCore_0 pid=572)     self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/executor/abstract.py", line 64, in initialize_from_config
(EngineCore_0 pid=572)     self.collective_rpc("initialize_from_config",
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/executor/executor_base.py", line 309, in collective_rpc
(EngineCore_0 pid=572)     return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/executor/ray_distributed_executor.py", line 503, in _run_workers
(EngineCore_0 pid=572)     ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_0 pid=572)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_0 pid=572)     return fn(*args, **kwargs)
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_0 pid=572)     return func(*args, **kwargs)
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
(EngineCore_0 pid=572)     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_0 pid=572)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/worker.py", line 968, in get_objects
(EngineCore_0 pid=572)     raise value.as_instanceof_cause()
(EngineCore_0 pid=572) ray.exceptions.RayTaskError(nixlBackendError): ray::RayWorkerWrapper.execute_method() (pid=228, ip=10.244.2.5, actor_id=bce58778792348ee69fe999901000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f8e47711160>)
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/worker/worker_base.py", line 620, in execute_method
(EngineCore_0 pid=572)     raise e
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/worker/worker_base.py", line 611, in execute_method
(EngineCore_0 pid=572)     return run_method(self, method, args, kwargs)
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=572)     return func(*args, **kwargs)
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/worker/worker_base.py", line 598, in initialize_from_config
(EngineCore_0 pid=572)     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(EngineCore_0 pid=572)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/worker/gpu_worker.py", line 297, in initialize_from_config
(EngineCore_0 pid=572)     self.model_runner.initialize_kv_cache(kv_cache_config)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/v1/worker/gpu_model_runner.py", line 3199, in initialize_kv_cache
(EngineCore_0 pid=572)     get_kv_transfer_group().register_kv_caches(kv_caches)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 195, in register_kv_caches
(EngineCore_0 pid=572)     self.connector_worker.register_kv_caches(kv_caches)
(EngineCore_0 pid=572)   File "/opt/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 812, in register_kv_caches
(EngineCore_0 pid=572)     self.nixl_wrapper.register_memory(descs)
(EngineCore_0 pid=572)   File "/opt/dynamo/venv/lib/python3.12/site-packages/nixl/_api.py", line 266, in register_memory
(EngineCore_0 pid=572)     self.agent.registerMem(reg_descs, handle_list)
(EngineCore_0 pid=572) nixl._bindings.nixlBackendError: NIXL_ERR_BACKEND
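
Since the nodes are connected only via TCP (no RDMA/InfiniBand), one thing worth checking is whether the NIXL/UCX backend can find a usable transport at all when it registers the KV-cache memory. A rough diagnostic sketch (assumes the UCX ucx_info tool is available in the runtime image and that a decode container stays up long enough to exec into; pod name from the listing above):

# List the UCX transports and devices visible inside a decode pod
kubectl exec -n demo vllm-disagg-0-decode-0-decode-ldr-5w4wf -- ucx_info -d | grep -iE 'transport|device'
# Check whether any RDMA devices are exposed to the pod (expected: none on TCP-only nodes)
kubectl exec -n demo vllm-disagg-0-decode-0-decode-ldr-5w4wf -- ls /dev/infiniband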

Steps to Reproduce

Install Dynamo with Grove and KAI Scheduler enabled:

# 1. Set environment
export NAMESPACE=dynamo-system
# any version of Dynamo 0.3.2+ listed at https://github.com/ai-dynamo/dynamo/releases
export RELEASE_VERSION=0.5.1

# 2. Install CRDs
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default

# 3. Install Platform
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
--namespace ${NAMESPACE} --create-namespace \
--set "grove.enabled=true" --set "kai-scheduler.enabled=true"

Deploy the DynamoGraphDeployment using the YAML manifest provided above.
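
A minimal sketch of this step, assuming the manifest above is saved as vllm-disagg.yaml (hypothetical file name) and the target namespace is demo, matching the pod listing above:

kubectl apply -n demo -f vllm-disagg.yaml
kubectl get pods -n demo -w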

Expected Behavior

The inference pods run successfully.

Actual Behavior

The decode pods crash-loop with the NIXL_ERR_BACKEND error and never become Ready.

Environment

Dynamo: 0.5.1

Additional Context

Full decode pod debug log attached: decode-pod-debug.log

Screenshots

No response
