How to support deploying two small models on a single GPU card for inference? #4426
Unanswered
RanchiZhao asked this question in Q&A

How can I deploy two small models on a single GPU card for inference, for example using the Ray framework? I always hit an OOM error, even when I set gpu_memory_utilization to 0.3.
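A minimal sketch of one common pattern for this, assuming vLLM and Ray are installed: run each model inside its own Ray actor with a fractional GPU request, so both actors are scheduled onto the same card in separate processes. The `ModelWorker` class, the model names, and the 0.3 memory fractions are illustrative placeholders, not a confirmed fix for the OOM.

```python
import ray
from vllm import LLM, SamplingParams

ray.init()

@ray.remote(num_gpus=0.5)  # two actors at 0.5 GPU each land on the same card
class ModelWorker:
    def __init__(self, model_name: str, mem_fraction: float):
        # gpu_memory_utilization caps the share of the card this vLLM
        # instance claims (weights + KV cache). The two fractions together
        # must leave headroom for each process's CUDA context and activations.
        self.llm = LLM(
            model=model_name,
            gpu_memory_utilization=mem_fraction,
            enforce_eager=True,  # skip CUDA graph capture to save some memory
        )

    def generate(self, prompts):
        params = SamplingParams(temperature=0.8, max_tokens=64)
        return [o.outputs[0].text for o in self.llm.generate(prompts, params)]

# Two small models sharing one card; both model names are placeholders.
worker_a = ModelWorker.remote("facebook/opt-125m", 0.3)
worker_b = ModelWorker.remote("facebook/opt-350m", 0.3)

print(ray.get(worker_a.generate.remote(["Hello from model A:"])))
print(ray.get(worker_b.generate.remote(["Hello from model B:"])))
```

Note that depending on the vLLM version, memory profiling at startup may check free memory against the whole card, so a second instance can still fail even with a low gpu_memory_utilization; starting the two workers sequentially and lowering the fractions until both fit is the usual workaround.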
Replies: 1 comment
-
Any solution for this?