Describe the Bug
My K8s cluster: 3 GPU nodes, each with 1x A100 GPU, connected via standard TCP networking (no RDMA/InfiniBand).
I deployed a DynamoGraphDeployment with one prefill worker running on a single GPU on one node, and one decode worker distributed across two nodes using two GPUs:
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.frontend --http-port 8000"
    decode:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --tensor-parallel-size 2"
      envs:
        - name: NIXL_LOG_LEVEL
          value: TRACE
        - name: UCX_LOG_LEVEL
          value: DEBUG
    prefill:
      dynamoNamespace: vllm-disagg
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - /bin/sh
            - -c
          args:
            - "python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --is-prefill-worker"

kubectl get pod -n demo
NAME                                      READY   STATUS             RESTARTS      AGE
vllm-disagg-0-decode-0-decode-ldr-5w4wf   0/1     CrashLoopBackOff   4 (93s ago)   16m
vllm-disagg-0-decode-0-decode-wkr-wm258   0/1     CrashLoopBackOff   4 (31s ago)   16m
vllm-disagg-0-frontend-tzdzc              1/1     Running            0             16m
vllm-disagg-0-prefill-vgd4b               1/1     Running            0             16m

The decode pods consistently crash with the NIXL_ERR_BACKEND error.
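The full traceback below was pulled from the decode leader pod's logs, e.g. with standard kubectl (pod name taken from the listing above):

kubectl logs vllm-disagg-0-decode-0-decode-ldr-5w4wf -n demo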
(EngineCore_0 pid=572) Process EngineCore_0:
(EngineCore_0 pid=572) Traceback (most recent call last):
(EngineCore_0 pid=572) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=572) self.run()
(EngineCore_0 pid=572) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=572) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=572) raise e
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=572) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=572) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/engine/core.py", line 89, in __init__
(EngineCore_0 pid=572) self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/engine/core.py", line 211, in _initialize_kv_caches
(EngineCore_0 pid=572) self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/executor/abstract.py", line 64, in initialize_from_config
(EngineCore_0 pid=572) self.collective_rpc("initialize_from_config",
(EngineCore_0 pid=572) File "/opt/vllm/vllm/executor/executor_base.py", line 309, in collective_rpc
(EngineCore_0 pid=572) return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/executor/ray_distributed_executor.py", line 503, in _run_workers
(EngineCore_0 pid=572) ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_0 pid=572) return fn(*args, **kwargs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_0 pid=572) return func(*args, **kwargs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
(EngineCore_0 pid=572) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/dynamo/venv/lib/python3.12/site-packages/ray/_private/worker.py", line 968, in get_objects
(EngineCore_0 pid=572) raise value.as_instanceof_cause()
(EngineCore_0 pid=572) ray.exceptions.RayTaskError(nixlBackendError): ray::RayWorkerWrapper.execute_method() (pid=228, ip=10.244.2.5, actor_id=bce58778792348ee69fe999901000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f8e47711160>)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/worker/worker_base.py", line 620, in execute_method
(EngineCore_0 pid=572) raise e
(EngineCore_0 pid=572) File "/opt/vllm/vllm/worker/worker_base.py", line 611, in execute_method
(EngineCore_0 pid=572) return run_method(self, method, args, kwargs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=572) return func(*args, **kwargs)
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/worker/worker_base.py", line 598, in initialize_from_config
(EngineCore_0 pid=572) self.worker.initialize_from_config(kv_cache_config) # type: ignore
(EngineCore_0 pid=572) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/worker/gpu_worker.py", line 297, in initialize_from_config
(EngineCore_0 pid=572) self.model_runner.initialize_kv_cache(kv_cache_config)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/v1/worker/gpu_model_runner.py", line 3199, in initialize_kv_cache
(EngineCore_0 pid=572) get_kv_transfer_group().register_kv_caches(kv_caches)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 195, in register_kv_caches
(EngineCore_0 pid=572) self.connector_worker.register_kv_caches(kv_caches)
(EngineCore_0 pid=572) File "/opt/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 812, in register_kv_caches
(EngineCore_0 pid=572) self.nixl_wrapper.register_memory(descs)
(EngineCore_0 pid=572) File "/opt/dynamo/venv/lib/python3.12/site-packages/nixl/_api.py", line 266, in register_memory
(EngineCore_0 pid=572) self.agent.registerMem(reg_descs, handle_list)
(EngineCore_0 pid=572) nixl._bindings.nixlBackendError: NIXL_ERR_BACKEND
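Because NIXL_LOG_LEVEL=TRACE and UCX_LOG_LEVEL=DEBUG are already set in the manifest, the backend-level messages can be filtered out of the same pod logs; a minimal sketch (the --previous flag and grep pattern are only illustrative):

# Inspect the previously crashed container's output for UCX/NIXL backend messages
kubectl logs vllm-disagg-0-decode-0-decode-ldr-5w4wf -n demo --previous | grep -iE 'ucx|nixl'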

Steps to Reproduce
Install Dynamo with Grove and the KAI Scheduler enabled:
# 1. Set environment
export NAMESPACE=dynamo-system
# any version of Dynamo 0.3.2+ listed at https://github.com/ai-dynamo/dynamo/releases
export RELEASE_VERSION=0.5.1
# 2. Install CRDs
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
# 3. Install Platform
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace ${NAMESPACE} --create-namespace \
  --set "grove.enabled=true" --set "kai-scheduler.enabled=true"

Deploy the DynamoGraphDeployment using the YAML provided above, for example:
Expected Behavior
The inference pods run successfully.
Actual Behavior
The decode pods repeatedly crash with the NIXL_ERR_BACKEND error shown above and never become Ready.
Environment
Dynamo: 0.5.1
Additional Context
Screenshots
No response