Replies: 2 comments
-
I'm trying the following bash script as the container command:

```yaml
command:
  - "/bin/bash"
args:
  - "-c"
  - |
    ray start --head --port=6379
    while true; do
      node_count=$(ray status | grep node_ | wc -l)
      echo "Current node count: $node_count"
      if [ "$node_count" -eq 2 ]; then
        echo "Node count is 2. Exiting loop."
        break
      fi
      sleep 1
    done
    vllm serve llama3 --tensor-parallel-size 2 --pipeline-parallel-size 2 --distributed-executor-backend ray --dtype=half
```

It produces this output:

```
Current node count: 0
Current node count: 1
Current node count: 1
Current node count: 2
Node count is 2. Exiting loop.
INFO 12-24 00:54:06 api_server.py:585] vLLM API server version 0.6.4
INFO 12-24 00:54:06 api_server.py:586] args: Namespace(subparser='serve', mode
```
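The wait loop above can be factored into a small reusable helper with a timeout, so a stuck cluster fails fast instead of looping forever. A minimal sketch; `wait_for_count` and the `fake_count` stand-in (which simulates `ray status | grep node_ | wc -l` reporting 1 node until the third poll) are hypothetical names, not part of vLLM or Ray:

```shell
#!/usr/bin/env bash
set -u

# wait_for_count: poll a command once per second until its output equals the
# expected number; fail after roughly <timeout_seconds> seconds.
# Usage: wait_for_count <expected> <timeout_seconds> <command...>
wait_for_count() {
  local expected=$1 timeout=$2
  shift 2
  local elapsed=0 count
  while true; do
    count=$("$@")
    echo "Current node count: $count"
    if [ "$count" -eq "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out waiting for $expected nodes" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Demo stand-in for `ray status | grep node_ | wc -l`: a counter kept in a
# temp file (a plain variable would be lost inside the $() subshell) that
# reports 1 node on the first two polls, then 2.
state=$(mktemp)
echo 0 > "$state"
fake_count() {
  local c
  c=$(cat "$state")
  c=$((c + 1))
  echo "$c" > "$state"
  if [ "$c" -ge 3 ]; then echo 2; else echo 1; fi
}

wait_for_count 2 10 fake_count  # reaches 2 on the third poll, so this succeeds
rm -f "$state"
```

In the pod spec you would call it as `wait_for_count 2 60 bash -c 'ray status | grep node_ | wc -l'` before `vllm serve`, so a worker that never joins surfaces as a clear error instead of a hung container.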
-
Found a solution for it and have shared it here: But I still have some other issues 😊
-
I am currently developing a distributed inference environment using vLLM and Kubernetes.
In the example below, I am using the --block argument (vllm/examples/run_cluster.sh, line 33 at commit a491d6f).
However, when I use it in Kubernetes, only the node connections are established, and the vllm serve command does not execute afterward.
On the other hand, if I don't use --block, the connections between nodes are not established reliably. Could anyone advise on which arguments are best to use in this deployment environment? Any guidance would be greatly appreciated!
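For context, `ray start --block` keeps the process in the foreground, so anything written after it in the same script never runs; without it, `ray start` daemonizes and the container's main command may exit. One common split, sketched below as a non-authoritative assumption (the model name and `HEAD_IP` variable are placeholders), is to make `vllm serve` the foreground process on the head node and reserve `--block` for the workers:

```shell
# Head node container: start Ray in the background (no --block), then keep the
# container alive by running vllm serve in the foreground.
ray start --head --port=6379
vllm serve llama3 --tensor-parallel-size 2 --pipeline-parallel-size 2 \
  --distributed-executor-backend ray

# Worker node containers: join the cluster and block, so the container's main
# process stays up for the pod's lifetime.
ray start --address="$HEAD_IP:6379" --block
```

With this split, a wait loop (or a readiness probe) on the head node before `vllm serve` ensures all workers have joined before the engine starts.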