Unknown CUDA device compute capability: 8.7 #77

Open
mstksg opened this issue Mar 1, 2024 · 1 comment
mstksg commented Mar 1, 2024

Thanks for the great project :)

I encountered this while running on a Jetson AGX Orin device (sm_87 architecture). From a quick look at the source code, it seems each compute capability needs to be accounted for explicitly?

$ nvidia-device-query
CUDA device query (Driver API, statically linked)
CUDA driver version 11.4
CUDA API version 11.4
Detected 1 CUDA capable device

Device 0: Orin
*** Warning: Unknown CUDA device compute capability: 8.7
*** Please submit a bug report at https://github.com/tmcdonell/cuda/issues

  CUDA capability:                          8.7
  CUDA cores:                               1024 cores in 16 multiprocessors (64 cores/MP)
  Global memory:                            61 GB
  Constant memory:                          64 kB
  Shared memory per block:                  48 kB
  Registers per block:                      65536
  Warp size:                                32
  Maximum threads per multiprocessor:       1536
  Maximum threads per block:                1024
  Maximum grid dimensions:                  2147483647 x 65535 x 65535
  Maximum block dimensions:                 1024 x 1024 x 64
  GPU clock rate:                           1.3 GHz
  Memory clock rate:                        1.3 GHz
  Memory bus width:                         128-bit
  L2 cache size:                            4 MB
  Maximum texture dimensions
    1D:                                     131072
    2D:                                     131072 x 65536
    3D:                                     16384 x 16384 x 16384
  Texture alignment:                        512 B
  Maximum memory pitch:                     2 GB
  Concurrent kernel execution:              Yes
  Concurrent copy and execution:            Yes, with 2 copy engines
  Runtime limit on kernel execution:        No
  Integrated GPU sharing host memory:       Yes
  Host page-locked memory mapping:          Yes
  ECC memory support:                       No
  Unified addressing (UVA):                 Yes
  Single to double precision performance:   32 : 1
  Supports compute pre-emption:             Yes
  Supports cooperative launch:              Yes
  Supports multi-device cooperative launch: Yes
  PCI bus/location:                         0/0
  Compute mode:                             Default
    Multiple contexts are allowed on the device simultaneously
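The reporter's reading is that each compute capability has to be listed explicitly. This describes a common pattern: a hard-coded table keyed by (major, minor), with a warning for anything missing. Below is a minimal Python sketch of that pattern. The table, its values, and the names `CORES_PER_MP`, `cores_per_mp` and `total_cores` are hypothetical illustrations, not the `cuda` package's actual code, and the table is deliberately partial.

```python
import warnings
from typing import Optional

# Hypothetical, deliberately partial table: (major, minor) -> FP32 cores per
# multiprocessor. Not taken from the tmcdonell/cuda source.
CORES_PER_MP = {
    (7, 0): 64,
    (7, 5): 64,
    (8, 0): 64,
    (8, 6): 128,
}


def cores_per_mp(major: int, minor: int) -> Optional[int]:
    """Look up cores/MP for a compute capability; warn and return None if unlisted."""
    key = (major, minor)
    if key in CORES_PER_MP:
        return CORES_PER_MP[key]
    # Unlisted capabilities fall through to a warning, like the 8.7 case above.
    warnings.warn(f"Unknown CUDA device compute capability: {major}.{minor}")
    return None


def total_cores(major: int, minor: int, multiprocessors: int) -> Optional[int]:
    """Total cores = cores per MP * number of MPs, or None if the capability is unknown."""
    per_mp = cores_per_mp(major, minor)
    return None if per_mp is None else per_mp * multiprocessors
```

With this design, supporting a new architecture means adding a row to the table. Until that row exists, a lookup such as `cores_per_mp(8, 7)` takes the warning path instead of failing outright.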

mstksg commented Mar 1, 2024

Ah, I just noticed this is already addressed by #75!
