Signed-off-by: Tariq Ibrahim <[email protected]>
Showing 1 changed file with 53 additions and 1 deletion.
bcced06
@tariq1890 It doesn't seem like the H100NVL (device id 0x2321) supports 1g.11gb and 2g.22gb. It supports 1g.12gb and 2g.24gb. I tested this with mig-parted 0.7.0 and NVIDIA driver version 535.183.01.
As you can see, the profiles do not get applied.
Whereas applying 1g.12gb works.
Also, the mig-manager example is using the correct one:
mig-parted/deployments/container/nvidia-mig-manager-example-hopper.yaml
Lines 248 to 254 in 03e07c4
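For illustration, a minimal mig-parted config using the profile names that applied successfully above might look like the sketch below. This is not the content of the referenced example file: the config names (all-1g.12gb, all-2g.24gb) are hypothetical, and the instance counts are assumptions that should be checked against what nvidia-smi mig -lgip reports on the actual device.

```yaml
# Hypothetical sketch of a mig-parted config for an H100 NVL (device id 0x2321).
# Profile names come from the comment above; everything else is an assumption.
version: v1
mig-configs:
  # Illustrative config name, not taken from the repository.
  all-1g.12gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.12gb": 7   # assumed count; verify with nvidia-smi mig -lgip
  # Illustrative config name, not taken from the repository.
  all-2g.24gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.24gb": 3   # assumed count; verify with nvidia-smi mig -lgip
```

A config like this would typically be applied with nvidia-mig-parted apply -f <file> -c all-1g.12gb, where <file> is wherever the sketch is saved.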
Hey @vishnukarthikl, thanks for bringing this to our attention. To provide context, I added that MIG config based on the configuration referenced in this doc.
You'll see a table under "The table below shows the supported profiles on the H100 94GB product (PCIe and SXM5).", and the MIG configs specified there are 1g.11gb, 1g.22gb, 2g.22gb, ...
It looks like the configs that ended up working for you are the ones listed for the H100 96GB variants in the MIG reference documentation. We will check with the MIG team and get back to you.
Hey @vishnukarthikl, can you share the output of nvidia-smi mig -lgip here?
Thanks @vishnukarthikl! I have created a new PR to fix this: #101