What’s the recommended way to use the vLLM OpenAI server for batch processing? #7639
Unanswered
ktrapeznikov asked this question in Q&A
I want to process a batch of requests. What is the recommended way? I typically use multiple workers with a ThreadPoolExecutor (rough sketch below), but I am wondering if there is a better way.
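For reference, here is a minimal sketch of what I do today, assuming a vLLM OpenAI-compatible server running locally on port 8000 and the `openai` Python client. The base URL, API key, model name, and prompts are all placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Assumes a vLLM server started with something like:
#   vllm serve <model-name> --port 8000
# The base_url and api_key below are placeholders for a local setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder batch of prompts to process.
prompts = [f"Summarize item {i}" for i in range(32)]

def complete(prompt: str) -> str:
    # One blocking request per worker thread; the server handles
    # the concurrent requests on its side.
    response = client.chat.completions.create(
        model="my-model",  # placeholder: the model the server was launched with
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fan the batch out across worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(complete, prompts))
```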