Replies: 2 comments
-
I'm trying the following bash script as the container command:

```yaml
command:
  - "/bin/bash"
args:
  - "-c"
  - |
    ray start --head --port=6379
    while true; do
      node_count=$(ray status | grep node_ | wc -l)
      echo "Current node count: $node_count"
      if [ "$node_count" -eq 2 ]; then
        echo "Node count is 2. Exiting loop."
        break
      fi
      sleep 1
    done
    vllm serve llama3 --tensor-parallel-size 2 --pipeline-parallel-size 2 --distributed-executor-backend ray --dtype=half
```

It produces this output:

```
Current node count: 0
Current node count: 1
Current node count: 1
Current node count: 2
Node count is 2. Exiting loop.
INFO 12-24 00:54:06 api_server.py:585] vLLM API server version 0.6.4
INFO 12-24 00:54:06 api_server.py:586] args: Namespace(subparser='serve', mode
```
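The wait loop above can be factored into a small reusable helper with a timeout, so a stuck cluster fails fast instead of looping forever. A minimal sketch; `wait_for_count` and the `fake_count` stand-in (which simulates `ray status | grep node_ | wc -l` reporting 1 node until the third poll) are hypothetical names, not part of vLLM or Ray:

```shell
#!/usr/bin/env bash
set -u

# wait_for_count: poll a command once per second until its output equals the
# expected number; fail after roughly <timeout_seconds> seconds.
# Usage: wait_for_count <expected> <timeout_seconds> <command...>
wait_for_count() {
  local expected=$1 timeout=$2
  shift 2
  local elapsed=0 count
  while true; do
    count=$("$@")
    echo "Current node count: $count"
    if [ "$count" -eq "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out waiting for $expected nodes" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Demo stand-in for `ray status | grep node_ | wc -l`: a counter kept in a
# temp file (a plain variable would be lost inside the $() subshell) that
# reports 1 node on the first two polls, then 2.
state=$(mktemp)
echo 0 > "$state"
fake_count() {
  local c
  c=$(cat "$state")
  c=$((c + 1))
  echo "$c" > "$state"
  if [ "$c" -ge 3 ]; then echo 2; else echo 1; fi
}

wait_for_count 2 10 fake_count  # reaches 2 on the third poll, so this succeeds
rm -f "$state"
```

In the pod spec you would call it as `wait_for_count 2 60 bash -c 'ray status | grep node_ | wc -l'` before `vllm serve`, so a worker that never joins surfaces as a clear error instead of a hung container.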
-
Found a solution for it and have shared it here: But I still have some other issues 😊
-
I am currently developing a distributed inference environment using vLLM and Kubernetes.
In the example below, I am using the --block argument (vllm/examples/run_cluster.sh, line 33 at commit a491d6f).
However, when I use it in Kubernetes, only the node connections are established, and the vllm serve command does not execute afterward.
On the other hand, if I don't use --block, the connections between nodes are not established reliably. Could anyone advise on which arguments are best to use in this deployment environment? Any guidance would be greatly appreciated!
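For context, `ray start --block` keeps the process in the foreground, so anything written after it in the same script never runs; without it, `ray start` daemonizes and the container's main command may exit. One common split, sketched below as a non-authoritative assumption (the model name and `HEAD_IP` variable are placeholders), is to make `vllm serve` the foreground process on the head node and reserve `--block` for the workers:

```shell
# Head node container: start Ray in the background (no --block), then keep the
# container alive by running vllm serve in the foreground.
ray start --head --port=6379
vllm serve llama3 --tensor-parallel-size 2 --pipeline-parallel-size 2 \
  --distributed-executor-backend ray

# Worker node containers: join the cluster and block, so the container's main
# process stays up for the pod's lifetime.
ray start --address="$HEAD_IP:6379" --block
```

With this split, a wait loop (or a readiness probe) on the head node before `vllm serve` ensures all workers have joined before the engine starts.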