Add check for INVALID_ARGUMENT in NvLink checks
Unfortunately, the check to see if a link is active throws an error if
an invalid linkID is passed in (instead of simply saying that the link
is inactive). This causes problems since the newest nvml.h is for CUDA
11 (which has an NVML_NVLINK_MAX_LINKS of 12) and older versions had an
NVML_NVLINK_MAX_LINKS of 6.

This patch adds a check to see if the various calls that take a linkID
fail with INVALID_ARGUMENT, and if so, silently ignores the error. This
should be OK since we are fairly confident all other arguments are valid.
It would have been nice to avoid this (somewhat hacky) solution though.

Signed-off-by: Kevin Klues <[email protected]>
klueska committed Jul 17, 2020
1 parent 0e03b6a commit cb5a78f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions bindings/go/nvml/bindings.go
@@ -319,7 +319,7 @@ func (h handle) deviceGetNvLinkState(link uint) (*uint, error) {
var isActive C.nvmlEnableState_t

r := C.nvmlDeviceGetNvLinkState(h.dev, C.uint(link), &isActive)
-if r == C.NVML_ERROR_NOT_SUPPORTED {
+if r == C.NVML_ERROR_NOT_SUPPORTED || r == C.NVML_ERROR_INVALID_ARGUMENT {
return nil, nil
}

@@ -330,7 +330,7 @@ func (h handle) deviceGetNvLinkRemotePciInfo(link uint) (*string, error) {
var pci C.nvmlPciInfo_t

r := C.nvmlDeviceGetNvLinkRemotePciInfo(h.dev, C.uint(link), &pci)
-if r == C.NVML_ERROR_NOT_SUPPORTED {
+if r == C.NVML_ERROR_NOT_SUPPORTED || r == C.NVML_ERROR_INVALID_ARGUMENT {
return nil, nil
}
