Functions to automatically select Device with most flops/memory #41

ProjectPhysX · 2022-02-10T16:11:10Z

Added utility functions that automatically select, from all available Devices, either the fastest Device or the Device with the largest memory capacity.
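Memory-based selection is a simple maximum search. Here is a minimal C++ sketch. It is not the code from this PR: `DeviceInfo` and `select_device_with_most_memory` are hypothetical names, and the memory sizes are assumed to have been queried beforehand via `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-device record; a real program would fill it via
// clGetDeviceInfo (CL_DEVICE_NAME, CL_DEVICE_GLOBAL_MEM_SIZE).
struct DeviceInfo {
    std::string name;
    std::uint64_t global_mem_bytes; // CL_DEVICE_GLOBAL_MEM_SIZE
};

// Returns the index of the device with the largest global memory.
// std::max_element returns the first maximum, so ties keep the earlier device.
std::size_t select_device_with_most_memory(const std::vector<DeviceInfo>& devices) {
    assert(!devices.empty()); // precondition: at least one device
    auto it = std::max_element(devices.begin(), devices.end(),
        [](const DeviceInfo& a, const DeviceInfo& b) {
            return a.global_mem_bytes < b.global_mem_bytes;
        });
    return static_cast<std::size_t>(it - devices.begin());
}
```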

To select the fastest Device, the performance of each Device is estimated in TFLOPS. For Nvidia and AMD GPUs, this estimate is challenging because the number of cores per CU varies with the microarchitecture and even with the GPU model:

- AMD GCN, CDNA: 64 cores/CU
- AMD RDNA, RDNA2: 128 cores per dual CU (OpenCL reports each dual CU as one CU)
- Nvidia Kepler: 192 cores/CU
- Nvidia Maxwell, Pascal, Ampere: 128 cores/CU
- Nvidia P100, Volta, Turing, A100, A30: 64 cores/CU
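The cores/CU list can feed a GPU estimate. A minimal sketch, assuming peak throughput = CUs × cores/CU × 2 FLOPs per cycle (one fused multiply-add) × clock frequency. The `GpuArch` enum, `cores_per_cu` and `estimate_gpu_tflops` are hypothetical; detecting the architecture from the device is left out, and `clock_mhz` is assumed to come from `CL_DEVICE_MAX_CLOCK_FREQUENCY`, which OpenCL reports in MHz.

```cpp
#include <cassert>

// Hypothetical architecture tags; detection (e.g. from the device name) is not shown.
enum class GpuArch {
    AmdGcnCdna,                // 64 cores/CU
    AmdRdna,                   // 128 cores per reported (dual) CU
    NvidiaKepler,              // 192 cores/CU
    NvidiaMaxwellPascalAmpere, // 128 cores/CU
    NvidiaP100VoltaTuringA100  // 64 cores/CU (also A30)
};

unsigned cores_per_cu(GpuArch arch) {
    switch (arch) {
        case GpuArch::AmdGcnCdna:                return 64;
        case GpuArch::AmdRdna:                   return 128;
        case GpuArch::NvidiaKepler:              return 192;
        case GpuArch::NvidiaMaxwellPascalAmpere: return 128;
        case GpuArch::NvidiaP100VoltaTuringA100: return 64;
    }
    return 0; // not reached for valid enum values
}

// Assumed model: 2 FLOPs per core per cycle (one FMA).
// cores * 2 * (clock_mhz * 1e6 Hz) / 1e12 = cores * 2 * clock_mhz * 1e-6 TFLOPS.
double estimate_gpu_tflops(GpuArch arch, unsigned compute_units, unsigned clock_mhz) {
    assert(compute_units > 0 && clock_mhz > 0);
    const double cores = double(compute_units) * cores_per_cu(arch);
    const double flops_per_cycle = 2.0;
    return cores * flops_per_cycle * clock_mhz * 1e-6;
}
```

For example, 82 CUs at 1695 MHz with 128 cores/CU gives about 35.6 TFLOPS. Classifying the same device as 64 cores/CU would halve the result, which is exactly the factor-of-2 error a misidentified GPU produces.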

The estimate is correct for the vast majority of GPUs, but for some rare or old GPUs it can be off by a factor of 2.
For CPUs, the estimate is wrong for models without SMT/HT, for very old models with an IPC (FLOPs per clock cycle per core) below 32, and for very new models with an IPC of 64 (AVX-512).
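The CPU caveats follow from the assumptions built into such an estimate. A hedged C++ sketch, assuming (as the caveats suggest, but not confirmed by the PR text) that OpenCL reports hardware threads as compute units, that the estimate divides them by 2 for SMT/HT, and that a core does 32 FLOPs per cycle (for example two 256-bit FMA units: 2 × 8 lanes × 2 FLOPs). `estimate_cpu_tflops` and its default parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical CPU estimate under assumed defaults:
// - compute_units counts hardware threads, so physical cores = compute_units / threads_per_core;
// - each core performs flops_per_cycle FP operations per clock cycle.
double estimate_cpu_tflops(unsigned compute_units, unsigned clock_mhz,
                           double threads_per_core = 2.0,
                           double flops_per_cycle = 32.0) {
    assert(compute_units > 0 && clock_mhz > 0 && threads_per_core > 0.0);
    const double cores = compute_units / threads_per_core;
    // cores * flops_per_cycle * (clock_mhz * 1e6 Hz) / 1e12 = TFLOPS
    return cores * flops_per_cycle * clock_mhz * 1e-6;
}
```

With the defaults, an 8-core CPU without SMT (8 compute units) is estimated at half of what `threads_per_core = 1.0` gives, and an AVX-512 CPU with 64 FLOPs per cycle is underestimated by the same factor of 2.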

Overall, however, the estimates are good enough to identify the fastest Device in systems with one CPU and one or more GPUs.
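Given per-Device estimates, selecting the fastest Device amounts to taking the maximum. This self-contained C++ sketch uses a hypothetical `DeviceEstimate` record and `select_fastest_device` function with an assumed model: GPUs do 2 FLOPs per core per cycle, and CPUs are assumed to have 2-way SMT and 32 FLOPs per core per cycle. It is an illustration, not the PR's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device record; cores_per_cu comes from an architecture lookup
// for GPUs and is ignored for CPUs.
struct DeviceEstimate {
    std::string name;
    bool is_gpu;
    unsigned compute_units; // CL_DEVICE_MAX_COMPUTE_UNITS
    unsigned clock_mhz;     // CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz)
    unsigned cores_per_cu;  // GPU only
};

// Assumed model: GPU cores = CUs * cores/CU at 2 FLOPs/cycle;
// CPU cores = compute_units / 2 (SMT) at 32 FLOPs/cycle.
double estimated_tflops(const DeviceEstimate& d) {
    const double cores = d.is_gpu ? double(d.compute_units) * d.cores_per_cu
                                  : d.compute_units / 2.0;
    const double flops_per_cycle = d.is_gpu ? 2.0 : 32.0;
    return cores * flops_per_cycle * d.clock_mhz * 1e-6;
}

// Returns the index of the device with the highest estimate (first one on ties).
std::size_t select_fastest_device(const std::vector<DeviceEstimate>& devices) {
    assert(!devices.empty());
    auto it = std::max_element(devices.begin(), devices.end(),
        [](const DeviceEstimate& a, const DeviceEstimate& b) {
            return estimated_tflops(a) < estimated_tflops(b);
        });
    return static_cast<std::size_t>(it - devices.begin());
}
```

With, say, a 16-thread CPU at 3600 MHz (about 0.92 TFLOPS under these assumptions) and an 82-CU GPU at 1695 MHz with 128 cores/CU (about 35.6 TFLOPS), halving the GPU estimate would still leave it far ahead, which is why a factor-of-2 error is tolerable for this purpose.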

CLAassistant · 2022-02-10T16:12:44Z

All committers have signed the CLA.

MathiasMagnus force-pushed the main branch twice, most recently from 66f643c to fc64822, on April 1, 2022 at 14:30.
