Recommended setting for running vLLM for CPU #5672
Unanswered
jerin-scalers-ai
asked this question in
Q&A
Replies: 1 comment
-
48 cores per instance should work fine. In my setup it delivers almost 10 tokens/s of throughput for a single user.
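To make the arithmetic concrete: on the dual-socket server from the question (96 cores per socket), 48 cores per instance works out to two replicas per socket, four in total. Below is a minimal sketch of how the `--cpuset-cpus` ranges could be derived. The helper name `replica_cpusets` is hypothetical, and the sketch assumes core IDs are numbered contiguously per socket (socket 0 is 0-95, socket 1 is 96-191). That is not guaranteed, especially with hyper-threading enabled, so check the real layout with `lscpu` before using these ranges.

```python
def replica_cpusets(sockets: int, cores_per_socket: int, cores_per_replica: int) -> list[str]:
    """Return one --cpuset-cpus range per replica, never spanning two sockets.

    Assumption: core IDs are contiguous per socket (verify with `lscpu`).
    """
    if cores_per_replica <= 0 or cores_per_replica > cores_per_socket:
        raise ValueError("cores_per_replica must be between 1 and cores_per_socket")

    cpusets = []
    for socket in range(sockets):
        base = socket * cores_per_socket  # first core ID on this socket
        # Only whole replicas fit; leftover cores on a socket stay unassigned.
        for slot in range(cores_per_socket // cores_per_replica):
            start = base + slot * cores_per_replica
            end = start + cores_per_replica - 1
            cpusets.append(f"{start}-{end}")
    return cpusets


if __name__ == "__main__":
    # Dual socket, 96 cores per socket, 48 cores per vLLM instance.
    for i, cpus in enumerate(replica_cpusets(2, 96, 48)):
        print(f"replica {i}: --cpuset-cpus={cpus}")
```

Keeping each replica inside a single socket avoids cross-socket memory traffic. If the replicas run in Docker, pairing each range with `--cpuset-mems` set to the matching NUMA node is a common companion setting, though whether it helps here would need to be measured.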
-
What are the recommended settings for running vLLM on a CPU to achieve high performance? For instance, if I have a dual-socket server with 96 cores per socket, how many cores (--cpuset-cpus) should be allocated to run multiple replicas of vLLM?