Wrong node capacity and allocatable when using MIG #637
Comments
@xhejtman this is controlled by the
So in your case, you seem to have some GPUs with MIG disabled and others with MIG enabled. Is that correct? Otherwise this would be a bug.
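The distinction drawn in this comment can be sketched as a small check. This is an illustration only: the function name and its input (one boolean per physical GPU, for example collected by hand) are hypothetical, and it assumes that MIG-backed devices and full GPUs are advertised under separate resource names, as they are on the node in this issue.

```python
def both_resources_expected(mig_enabled_per_gpu):
    """Return True if seeing both nvidia.com/gpu and nvidia.com/mig-* is plausible.

    mig_enabled_per_gpu: hypothetical input, one boolean per physical GPU
    on the node (True means MIG mode is enabled on that GPU).
    """
    has_mig_gpu = any(mig_enabled_per_gpu)       # at least one GPU is in MIG mode
    has_full_gpu = not all(mig_enabled_per_gpu)  # at least one GPU is not in MIG mode
    return has_mig_gpu and has_full_gpu
```

With two GPUs that are both in MIG mode, `both_resources_expected([True, True])` returns `False`, which matches the "otherwise this would be a bug" case.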
I have both GPUs set to a MIG configuration:
@xhejtman could you provide the logs from the device plugin?
In the meantime, I checked that Kubernetes 1.27.8 is not the problem: I have a different cluster with the 23.6.1 operator, and it works fine.
Looking at the logs, we're only starting 2 GRPC servers:
meaning that the running instance of the plugin should only be exposing these as allocatable resources. Could you confirm that
1. Quick Debug Information
2. Issue or feature description
When MIG is enabled, both the MIG resource and the nvidia.com/gpu resource are reported as allocatable, which means that requests for both nvidia.com/gpu and nvidia.com/mig-1g.10gb can land on the node. However, the nvidia.com/gpu request fails to inject a GPU.
3. Steps to reproduce the issue
Enable MIG on an A100 GPU.
This may just be a bug in Kubernetes, not in the GPU Operator itself.
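The symptom described above can be detected from a node object. The following is a minimal sketch, assuming the node has been dumped with `kubectl get node <name> -o json`; the function name is hypothetical, and the sample values in the usage example are made up for illustration, not taken from this cluster.

```python
import json

FULL_GPU = "nvidia.com/gpu"
MIG_PREFIX = "nvidia.com/mig-"


def find_gpu_resource_conflict(node):
    """Check whether a node advertises full GPUs and MIG devices at once.

    node: parsed JSON of a Kubernetes Node object.
    Returns (conflict, full_gpu_count, mig_counts).
    """
    allocatable = node.get("status", {}).get("allocatable", {})
    # Extended resources are reported as integer strings, e.g. "2".
    full = int(allocatable.get(FULL_GPU, "0"))
    mig = {
        name: int(count)
        for name, count in allocatable.items()
        if name.startswith(MIG_PREFIX) and int(count) > 0
    }
    return (full > 0 and bool(mig)), full, mig


# Illustrative node fragment; the counts are invented for the example.
sample_node = json.loads("""
{
  "status": {
    "allocatable": {
      "cpu": "64",
      "nvidia.com/gpu": "2",
      "nvidia.com/mig-1g.10gb": "14"
    }
  }
}
""")

conflict, full_count, mig_counts = find_gpu_resource_conflict(sample_node)
```

A `True` conflict flag on a node where every GPU is in MIG mode would correspond to the behavior reported here; on a node with a mix of MIG and non-MIG GPUs it may be legitimate.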