I am using an NVIDIA A100 80GB MIG 3g.40gb slice to deploy microsoft/Phi-3-medium-128k-instruct (~26 GB) with vLLM, but I keep running into OOM issues. Here is how I am initializing the model:
```python
import torch
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="microsoft/Phi-3-medium-128k-instruct",
    gpu_memory_utilization=0.8,
    dtype=torch.float16,
    enforce_eager=True,
    trust_remote_code=True,
)
loaded_llm = AsyncLLMEngine.from_engine_args(engine_args)
```
And this is the error:

```
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":844, please report a bug to PyTorch.
```
Any suggestions on which parameters to tweak to make this model fit in my 40 GB MIG slice?
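(Note for anyone hitting the same wall: with a 128k-context model, vLLM pre-allocates a KV cache sized for the full context length, which can blow past a 40 GB slice even when the weights themselves fit. One commonly suggested tweak is capping the context window via `max_model_len`, a real `AsyncEngineArgs` parameter; the value 8192 below is an illustrative guess, not a verified fit for this slice:)

```python
import torch
from vllm import AsyncEngineArgs, AsyncLLMEngine

# Same initialization as above, but with a capped context window so the
# KV cache reservation shrinks. 8192 is a starting point to experiment
# with, not a value verified to fit a 3g.40gb MIG slice.
engine_args = AsyncEngineArgs(
    model="microsoft/Phi-3-medium-128k-instruct",
    gpu_memory_utilization=0.8,
    max_model_len=8192,       # cap context length; default is the model's 128k
    dtype=torch.float16,
    enforce_eager=True,
    trust_remote_code=True,
)
loaded_llm = AsyncLLMEngine.from_engine_args(engine_args)
```

Lowering `gpu_memory_utilization` further would only make things worse here, since it shrinks the pool vLLM is allowed to use; the lever that actually reduces the startup reservation is the context length.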