Add support for AMD Radeon GPU through ROCm_SMI_lib #99

Open
bertysentry opened this issue Jul 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@bertysentry
Member

Use Case

Monitor the status and performance of AMD GPUs such as the AMD Instinct MI100.

Specification

Write a connector that leverages the rocm-smi CLI provided with ROCm_SMI_lib.

Implement:

  • GPU properties discovery
  • Hardware health (temperature, fans, voltages)
  • Performance (power draw, GPU utilization, clock speeds)
@bertysentry added the "enhancement" label (New feature or request) on Jul 24, 2024
@bertysentry
Member Author

A few sample outputs from the rocm-smi CLI:

rocm-smi


====================================== ROCm System Management Interface ======================================
================================================ Concise Info ================================================
Device  [Model : Revision]    Temp    Power  Partitions      SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%  
        Name (20 chars)       (Edge)  (Avg)  (Mem, Compute)                                                   
==============================================================================================================
0       [0x0c34 : 0x01]       23.0°C  30.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
1       [0x0c34 : 0x01]       23.0°C  30.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
2       [0x0c34 : 0x01]       23.0°C  34.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
3       [0x0c34 : 0x01]       21.0°C  30.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
4       [0x0c34 : 0x01]       21.0°C  33.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
5       [0x0c34 : 0x01]       21.0°C  30.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
6       [0x0c34 : 0x01]       21.0°C  30.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
7       [0x0c34 : 0x01]       21.0°C  34.0W  N/A, N/A        300Mhz  1200Mhz  0%   auto  290.0W   98%   0%    
        Arcturus GL-XL [Inst                                                                                  
==============================================================================================================
============================================ End of ROCm SMI Log =============================================
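The concise table is laid out for people rather than programs. If a connector had to read it anyway, a regex over the first line of each device row could look like the sketch below. It assumes the column order shown in this sample (Temp, Power, Partitions, SCLK, MCLK, Fan, Perf, PwrCap, VRAM%, GPU%) and a numeric value in every numeric column; a row with `N/A` in one of them is simply skipped. The `--csv` variants in the following samples are likely a sturdier source. The name `parse_concise_info` is hypothetical.

```python
import re

# First line of a device row, e.g.
# "0  [0x0c34 : 0x01]  23.0°C  30.0W  N/A, N/A  300Mhz  1200Mhz  0%  auto  290.0W  98%  0%"
# Assumption: column order as in the rocm-smi sample; the Partitions column is skipped.
DEVICE_ROW = re.compile(
    r"^(?P<device>\d+)\s+\[(?P<model>0x[0-9a-fA-F]+)\s*:\s*(?P<rev>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<temp>[\d.]+)°C\s+(?P<power>[\d.]+)W\s+.*?"
    r"(?P<sclk>\d+)Mhz\s+(?P<mclk>\d+)Mhz\s+(?P<fan>[\d.]+)%\s+(?P<perf>\S+)\s+"
    r"(?P<pwrcap>[\d.]+)W\s+(?P<vram>\d+)%\s+(?P<gpu>\d+)%"
)


def parse_concise_info(text: str) -> list:
    """Return one dict per device row; continuation lines (truncated names) are ignored."""
    devices = []
    for line in text.splitlines():
        m = DEVICE_ROW.match(line)
        if m is None:
            continue
        devices.append({
            "device": int(m["device"]),
            "model": m["model"],
            "temp_edge_c": float(m["temp"]),
            "power_avg_w": float(m["power"]),
            "sclk_mhz": int(m["sclk"]),
            "mclk_mhz": int(m["mclk"]),
            "fan_pct": float(m["fan"]),
            "perf_level": m["perf"],
            "power_cap_w": float(m["pwrcap"]),
            "vram_pct": int(m["vram"]),
            "gpu_pct": int(m["gpu"]),
        })
    return devices
```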
rocm-smi -a --csv
device,Driver version,PID429045,PID429043,PID429048,PID429046,PID429044,PID429050,PID429049,PID429047
system,6.7.0,unknown 1 33338785792 0 0,unknown 1 33338843136 0 0,unknown 1 33338785792 0 0,unknown 1 33338814464 0 0,unknown 1 33338814464 0 0,unknown 1 33338785792 0 0,unknown 1 33338843136 0 0,unknown 1 33338843136 0 0

device,Device ID,Device Rev,Unique ID,VBIOS version,Temperature (Sensor edge) (C),Temperature (Sensor junction) (C),Temperature (Sensor memory) (C),fclk clock speed:,fclk clock level:,mclk clock speed:,mclk clock level:,sclk clock speed:,sclk clock level:,socclk clock speed:,socclk clock level:,pcie clock level,Performance Level,Max Graphics Package Power (W),Average Graphics Package Power (W),GPU use (%),GPU memory use (%),Memory Activity,Avg. Memory Bandwidth,GPU memory vendor,PCIe Replay Count,Serial Number,Voltage (mV),PCI Bus,ASD firmware version,MEC firmware version,MEC2 firmware version,RLC firmware version,SDMA firmware version,SDMA2 firmware version,SMC firmware version,SOS firmware version,TA RAS firmware version,TA XGMI firmware version,VCN firmware version,Card series,Card model,Card vendor,Card SKU,Energy counter,Accumulated Energy (uJ)
card0,0x738c,0x1,0xe00e11e94b2aced1,113-D3431401-100,23.0,26.0,23.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:00:11.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card1,0x738c,0x1,0xc0bc5a02e56f35d8,113-D3431401-100,23.0,25.0,24.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:00:1B.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card2,0x738c,0x1,0x91c890bc77e9d0e5,113-D3431401-100,24.0,26.0,24.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,34.0,0,0,N/A,0,samsung,0,N/A,662,0000:00:1C.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card3,0x738c,0x1,0x648ed51f97a4247d,113-D3431401-100,21.0,25.0,24.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,31.0,0,0,N/A,0,samsung,0,N/A,662,0000:03:0D.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card4,0x738c,0x1,0x3097be354c63bc79,113-D3431401-100,21.0,23.0,22.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,33.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:0E.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card5,0x738c,0x1,0x48a7dbb72a7240dd,113-D3431401-100,21.0,24.0,24.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,29.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:0F.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card6,0x738c,0x1,0x1220f19ff8ae0623,113-D3431401-100,21.0,23.0,23.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:03:10.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
card7,0x738c,0x1,0xd451b41bd0041432,113-D3431401-100,21.0,24.0,21.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,34.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:11.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1583778,24231803.7
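With `-a --csv`, the sample prints two CSV tables separated by a blank line: a `system` table (driver version plus one column per process) and a per-card table. A minimal sketch, assuming that layout holds, splits on blank lines and keys each table by its `device` column. Column names are kept verbatim, including trailing colons such as `fclk clock speed:`, since that is what rocm-smi emits here. The function name `parse_all_csv` is hypothetical.

```python
import csv
import io
import re


def parse_all_csv(text: str):
    """Split `rocm-smi -a --csv` output into (system_row, {card_name: row})."""
    system, cards = {}, {}
    # Tables are separated by blank (or whitespace-only) lines.
    for table in re.split(r"\n\s*\n", text.strip()):
        for row in csv.DictReader(io.StringIO(table)):
            name = row.get("device") or ""
            if name == "system":
                system = row
            elif name.startswith("card"):
                cards[name] = row
    return system, cards
```

For property discovery, `system["Driver version"]` and per-card keys such as `cards["card0"]["Unique ID"]` or `cards["card0"]["PCI Bus"]` are then plain dictionary lookups.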
rocm-smi -t --csv
device,Temperature (Sensor edge) (C),Temperature (Sensor junction) (C),Temperature (Sensor memory) (C)
card0,23.0,26.0,22.0
card1,22.0,25.0,24.0
card2,23.0,25.0,23.0
card3,21.0,25.0,24.0
card4,22.0,24.0,22.0
card5,21.0,24.0,24.0
card6,21.0,24.0,23.0
card7,20.0,24.0,21.0
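For the hardware-health part of the spec, `-t --csv` gives one row per card with one column per temperature sensor. The sketch below, a hypothetical `parse_temperatures` helper, derives the sensor name from the header text, so cards that report a different set of sensors still parse. Treating `N/A` as a missing reading is an assumption, since the sample contains none.

```python
import csv
import io
import re

# Header cells look like "Temperature (Sensor edge) (C)".
SENSOR_COLUMN = re.compile(r"Temperature \(Sensor (\w+)\) \(C\)")


def parse_temperatures(text: str) -> dict:
    """Map each card to {sensor_name: degrees Celsius, or None if unavailable}."""
    result = {}
    for row in csv.DictReader(io.StringIO(text.strip())):
        sensors = {}
        for column, value in row.items():
            m = SENSOR_COLUMN.fullmatch(column or "")
            if m:
                # "N/A" or an empty cell is treated as a missing reading (assumption).
                sensors[m.group(1)] = None if value in (None, "", "N/A") else float(value)
        result[row["device"]] = sensors
    return result
```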
rocm-smi -P --csv
device,Average Graphics Package Power (W)
card0,30.0
card1,30.0
card2,34.0
card3,31.0
card4,33.0
card5,30.0
card6,30.0
card7,35.0
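`-P --csv` reports a single power column per card. A minimal sketch (hypothetical `parse_avg_power`) converts it to floats and skips cards without a numeric reading; summing the values gives a host-level figure, which is one plausible metric for a connector to expose but not something the issue specifies.

```python
import csv
import io


def parse_avg_power(text: str) -> dict:
    """Map each card to its average graphics package power in watts."""
    column = "Average Graphics Package Power (W)"
    return {
        row["device"]: float(row[column])
        for row in csv.DictReader(io.StringIO(text.strip()))
        if row.get(column) not in (None, "", "N/A")
    }
```

On the sample above, `sum(parse_avg_power(out).values())` is 253.0 W across the eight cards.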
rocm-smi -u --csv
device,GPU use (%)
card0,0
card1,0
card2,0
card3,0
card4,0
card5,0
card6,0
card7,0
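`-u --csv` gives utilization as an integer percentage. The sketch below (hypothetical `parse_gpu_use`) converts it to a 0.0 to 1.0 ratio; whether the connector should store ratios or percentages is an assumption about its metric conventions, not something rocm-smi dictates.

```python
import csv
import io


def parse_gpu_use(text: str) -> dict:
    """Map each card to GPU utilization as a ratio between 0.0 and 1.0."""
    column = "GPU use (%)"
    usage = {}
    for row in csv.DictReader(io.StringIO(text.strip())):
        value = row.get(column)
        if value in (None, "", "N/A"):
            continue  # no reading for this card
        usage[row["device"]] = float(value) / 100.0
    return usage
```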
rocm-smi --showvoltage --csv
device,Voltage (mV)
card0,662
card1,656
card2,662
card3,662
card4,656
card5,656
card6,662
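`--showvoltage --csv` reports millivolts. The sample above lists only `card0` through `card6`, so the sketch (hypothetical `parse_voltage_volts`) builds its result from whatever rows are present instead of assuming eight cards, and converts each value to volts.

```python
import csv
import io


def parse_voltage_volts(text: str) -> dict:
    """Map each card to its voltage in volts (rocm-smi reports millivolts)."""
    column = "Voltage (mV)"
    return {
        row["device"]: int(row[column]) / 1000.0
        for row in csv.DictReader(io.StringIO(text.strip()))
        # Skip missing or non-integer readings such as "N/A".
        if (row.get(column) or "").isdigit()
    }
```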
rocm-smi --alldevices --showhw 


============================ ROCm System Management Interface ============================
================================= Concise Hardware Info ==================================
GPU  DID   DREV  GFX RAS  SDMA RAS  UMC RAS  VBIOS             BUS           
0    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:00:11.0  
1    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:00:1B.0  
2    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:00:1C.0  
3    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:03:0D.0  
4    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:03:0E.0  
5    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:03:0F.0  
6    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:03:10.0  
7    738c  1     ENABLED  ENABLED   ENABLED  113-D3431401-100  0000:03:11.0  
==========================================================================================
================================== End of ROCm SMI Log ===================================
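`--showhw` adds RAS status per block (GFX, SDMA, UMC), which fits the hardware-health part of the spec. Because header cells such as `GFX RAS` contain spaces, the sketch below (hypothetical `parse_showhw` and `ras_disabled`) hard-codes the column names from this sample and splits each data row on whitespace. Any value other than `ENABLED` is reported; the sample only ever shows `ENABLED`, so what the other values look like is an assumption.

```python
# Column names copied from the header in the sample above. "GFX RAS" etc.
# contain spaces, so the header line itself cannot be split on whitespace.
COLUMNS = ["GPU", "DID", "DREV", "GFX RAS", "SDMA RAS", "UMC RAS", "VBIOS", "BUS"]
RAS_BLOCKS = ("GFX RAS", "SDMA RAS", "UMC RAS")


def parse_showhw(text: str) -> list:
    """Return one dict per data row of the "Concise Hardware Info" table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with the GPU index and have one token per column.
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def ras_disabled(rows: list) -> list:
    """Return (gpu, block) pairs whose RAS status is not ENABLED."""
    return [
        (row["GPU"], block)
        for row in rows
        for block in RAS_BLOCKS
        if row[block] != "ENABLED"
    ]
```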
rocm-smi -i -v --showdriverversion --showfwinfo --showmclkrange --showmemvendor --showsclkrange --showproductname --showserial --showuniqueid --showvoltagerange --showbus


============================ ROCm System Management Interface ============================
============================== Version of System Component ===============================
Driver version: 6.7.0
==========================================================================================
=========================================== ID ===========================================
GPU[0]		: Device ID: 0x738c
GPU[0]		: Device Rev: 0x1
GPU[1]		: Device ID: 0x738c
GPU[1]		: Device Rev: 0x1
GPU[2]		: Device ID: 0x738c
GPU[2]		: Device Rev: 0x1
GPU[3]		: Device ID: 0x738c
GPU[3]		: Device Rev: 0x1
GPU[4]		: Device ID: 0x738c
GPU[4]		: Device Rev: 0x1
GPU[5]		: Device ID: 0x738c
GPU[5]		: Device Rev: 0x1
GPU[6]		: Device ID: 0x738c
GPU[6]		: Device Rev: 0x1
GPU[7]		: Device ID: 0x738c
GPU[7]		: Device Rev: 0x1
==========================================================================================
======================================= Unique ID ========================================
GPU[0]		: Unique ID: 0xe00e11e94b2aced1
GPU[1]		: Unique ID: 0xc0bc5a02e56f35d8
GPU[2]		: Unique ID: 0x91c890bc77e9d0e5
GPU[3]		: Unique ID: 0x648ed51f97a4247d
GPU[4]		: Unique ID: 0x3097be354c63bc79
GPU[5]		: Unique ID: 0x48a7dbb72a7240dd
GPU[6]		: Unique ID: 0x1220f19ff8ae0623
GPU[7]		: Unique ID: 0xd451b41bd0041432
==========================================================================================
========================================= VBIOS ==========================================
GPU[0]		: VBIOS version: 113-D3431401-100
GPU[1]		: VBIOS version: 113-D3431401-100
GPU[2]		: VBIOS version: 113-D3431401-100
GPU[3]		: VBIOS version: 113-D3431401-100
GPU[4]		: VBIOS version: 113-D3431401-100
GPU[5]		: VBIOS version: 113-D3431401-100
GPU[6]		: VBIOS version: 113-D3431401-100
GPU[7]		: VBIOS version: 113-D3431401-100
==========================================================================================
===================================== Memory Vendor ======================================
GPU[0]		: GPU memory vendor: samsung
GPU[1]		: GPU memory vendor: samsung
GPU[2]		: GPU memory vendor: samsung
GPU[3]		: GPU memory vendor: samsung
GPU[4]		: GPU memory vendor: samsung
GPU[5]		: GPU memory vendor: samsung
GPU[6]		: GPU memory vendor: samsung
GPU[7]		: GPU memory vendor: samsung
==========================================================================================
===================================== Serial Number ======================================
GPU[0]		: get_serial_number, Not supported on the given system
GPU[0]		: Serial Number: N/A
GPU[1]		: get_serial_number, Not supported on the given system
GPU[1]		: Serial Number: N/A
GPU[2]		: get_serial_number, Not supported on the given system
GPU[2]		: Serial Number: N/A
GPU[3]		: get_serial_number, Not supported on the given system
GPU[3]		: Serial Number: N/A
GPU[4]		: get_serial_number, Not supported on the given system
GPU[4]		: Serial Number: N/A
GPU[5]		: get_serial_number, Not supported on the given system
GPU[5]		: Serial Number: N/A
GPU[6]		: get_serial_number, Not supported on the given system
GPU[6]		: Serial Number: N/A
GPU[7]		: get_serial_number, Not supported on the given system
GPU[7]		: Serial Number: N/A
==========================================================================================
======================================= PCI Bus ID =======================================
GPU[0]		: PCI Bus: 0000:00:11.0
GPU[1]		: PCI Bus: 0000:00:1B.0
GPU[2]		: PCI Bus: 0000:00:1C.0
GPU[3]		: PCI Bus: 0000:03:0D.0
GPU[4]		: PCI Bus: 0000:03:0E.0
GPU[5]		: PCI Bus: 0000:03:0F.0
GPU[6]		: PCI Bus: 0000:03:10.0
GPU[7]		: PCI Bus: 0000:03:11.0
==========================================================================================
================================== Firmware Information ==================================
GPU[0]		: ASD firmware version: 	0x21000059
GPU[0]		: get_firmware_version_CE, Not supported on the given system
GPU[0]		: get_firmware_version_DMCU, Not supported on the given system
GPU[0]		: get_firmware_version_MC, Not supported on the given system
GPU[0]		: get_firmware_version_ME, Not supported on the given system
GPU[0]		: MEC firmware version: 	65
GPU[0]		: MEC2 firmware version: 	65
GPU[0]		: get_firmware_version_MES, Not supported on the given system
GPU[0]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[0]		: get_firmware_version_PFP, Not supported on the given system
GPU[0]		: RLC firmware version: 	24
GPU[0]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[0]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[0]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[0]		: SDMA firmware version: 	18
GPU[0]		: SDMA2 firmware version: 	18
GPU[0]		: SMC firmware version: 	00.54.29.00
GPU[0]		: SOS firmware version: 	0x0017004f
GPU[0]		: TA RAS firmware version: 	27.00.01.62
GPU[0]		: TA XGMI firmware version: 	32.00.00.17
GPU[0]		: get_firmware_version_UVD, Not supported on the given system
GPU[0]		: get_firmware_version_VCE, Not supported on the given system
GPU[0]		: VCN firmware version: 	0x01101015
GPU[1]		: ASD firmware version: 	0x21000059
GPU[1]		: get_firmware_version_CE, Not supported on the given system
GPU[1]		: get_firmware_version_DMCU, Not supported on the given system
GPU[1]		: get_firmware_version_MC, Not supported on the given system
GPU[1]		: get_firmware_version_ME, Not supported on the given system
GPU[1]		: MEC firmware version: 	65
GPU[1]		: MEC2 firmware version: 	65
GPU[1]		: get_firmware_version_MES, Not supported on the given system
GPU[1]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[1]		: get_firmware_version_PFP, Not supported on the given system
GPU[1]		: RLC firmware version: 	24
GPU[1]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[1]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[1]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[1]		: SDMA firmware version: 	18
GPU[1]		: SDMA2 firmware version: 	18
GPU[1]		: SMC firmware version: 	00.54.29.00
GPU[1]		: SOS firmware version: 	0x0017004f
GPU[1]		: TA RAS firmware version: 	27.00.01.62
GPU[1]		: TA XGMI firmware version: 	32.00.00.17
GPU[1]		: get_firmware_version_UVD, Not supported on the given system
GPU[1]		: get_firmware_version_VCE, Not supported on the given system
GPU[1]		: VCN firmware version: 	0x01101015
GPU[2]		: ASD firmware version: 	0x21000059
GPU[2]		: get_firmware_version_CE, Not supported on the given system
GPU[2]		: get_firmware_version_DMCU, Not supported on the given system
GPU[2]		: get_firmware_version_MC, Not supported on the given system
GPU[2]		: get_firmware_version_ME, Not supported on the given system
GPU[2]		: MEC firmware version: 	65
GPU[2]		: MEC2 firmware version: 	65
GPU[2]		: get_firmware_version_MES, Not supported on the given system
GPU[2]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[2]		: get_firmware_version_PFP, Not supported on the given system
GPU[2]		: RLC firmware version: 	24
GPU[2]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[2]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[2]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[2]		: SDMA firmware version: 	18
GPU[2]		: SDMA2 firmware version: 	18
GPU[2]		: SMC firmware version: 	00.54.29.00
GPU[2]		: SOS firmware version: 	0x0017004f
GPU[2]		: TA RAS firmware version: 	27.00.01.62
GPU[2]		: TA XGMI firmware version: 	32.00.00.17
GPU[2]		: get_firmware_version_UVD, Not supported on the given system
GPU[2]		: get_firmware_version_VCE, Not supported on the given system
GPU[2]		: VCN firmware version: 	0x01101015
GPU[3]		: ASD firmware version: 	0x21000059
GPU[3]		: get_firmware_version_CE, Not supported on the given system
GPU[3]		: get_firmware_version_DMCU, Not supported on the given system
GPU[3]		: get_firmware_version_MC, Not supported on the given system
GPU[3]		: get_firmware_version_ME, Not supported on the given system
GPU[3]		: MEC firmware version: 	65
GPU[3]		: MEC2 firmware version: 	65
GPU[3]		: get_firmware_version_MES, Not supported on the given system
GPU[3]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[3]		: get_firmware_version_PFP, Not supported on the given system
GPU[3]		: RLC firmware version: 	24
GPU[3]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[3]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[3]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[3]		: SDMA firmware version: 	18
GPU[3]		: SDMA2 firmware version: 	18
GPU[3]		: SMC firmware version: 	00.54.29.00
GPU[3]		: SOS firmware version: 	0x0017004f
GPU[3]		: TA RAS firmware version: 	27.00.01.62
GPU[3]		: TA XGMI firmware version: 	32.00.00.17
GPU[3]		: get_firmware_version_UVD, Not supported on the given system
GPU[3]		: get_firmware_version_VCE, Not supported on the given system
GPU[3]		: VCN firmware version: 	0x01101015
GPU[4]		: ASD firmware version: 	0x21000059
GPU[4]		: get_firmware_version_CE, Not supported on the given system
GPU[4]		: get_firmware_version_DMCU, Not supported on the given system
GPU[4]		: get_firmware_version_MC, Not supported on the given system
GPU[4]		: get_firmware_version_ME, Not supported on the given system
GPU[4]		: MEC firmware version: 	65
GPU[4]		: MEC2 firmware version: 	65
GPU[4]		: get_firmware_version_MES, Not supported on the given system
GPU[4]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[4]		: get_firmware_version_PFP, Not supported on the given system
GPU[4]		: RLC firmware version: 	24
GPU[4]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[4]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[4]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[4]		: SDMA firmware version: 	18
GPU[4]		: SDMA2 firmware version: 	18
GPU[4]		: SMC firmware version: 	00.54.29.00
GPU[4]		: SOS firmware version: 	0x0017004f
GPU[4]		: TA RAS firmware version: 	27.00.01.62
GPU[4]		: TA XGMI firmware version: 	32.00.00.17
GPU[4]		: get_firmware_version_UVD, Not supported on the given system
GPU[4]		: get_firmware_version_VCE, Not supported on the given system
GPU[4]		: VCN firmware version: 	0x01101015
GPU[5]		: ASD firmware version: 	0x21000059
GPU[5]		: get_firmware_version_CE, Not supported on the given system
GPU[5]		: get_firmware_version_DMCU, Not supported on the given system
GPU[5]		: get_firmware_version_MC, Not supported on the given system
GPU[5]		: get_firmware_version_ME, Not supported on the given system
GPU[5]		: MEC firmware version: 	65
GPU[5]		: MEC2 firmware version: 	65
GPU[5]		: get_firmware_version_MES, Not supported on the given system
GPU[5]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[5]		: get_firmware_version_PFP, Not supported on the given system
GPU[5]		: RLC firmware version: 	24
GPU[5]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[5]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[5]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[5]		: SDMA firmware version: 	18
GPU[5]		: SDMA2 firmware version: 	18
GPU[5]		: SMC firmware version: 	00.54.29.00
GPU[5]		: SOS firmware version: 	0x0017004f
GPU[5]		: TA RAS firmware version: 	27.00.01.62
GPU[5]		: TA XGMI firmware version: 	32.00.00.17
GPU[5]		: get_firmware_version_UVD, Not supported on the given system
GPU[5]		: get_firmware_version_VCE, Not supported on the given system
GPU[5]		: VCN firmware version: 	0x01101015
GPU[6]		: ASD firmware version: 	0x21000059
GPU[6]		: get_firmware_version_CE, Not supported on the given system
GPU[6]		: get_firmware_version_DMCU, Not supported on the given system
GPU[6]		: get_firmware_version_MC, Not supported on the given system
GPU[6]		: get_firmware_version_ME, Not supported on the given system
GPU[6]		: MEC firmware version: 	65
GPU[6]		: MEC2 firmware version: 	65
GPU[6]		: get_firmware_version_MES, Not supported on the given system
GPU[6]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[6]		: get_firmware_version_PFP, Not supported on the given system
GPU[6]		: RLC firmware version: 	24
GPU[6]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[6]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[6]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[6]		: SDMA firmware version: 	18
GPU[6]		: SDMA2 firmware version: 	18
GPU[6]		: SMC firmware version: 	00.54.29.00
GPU[6]		: SOS firmware version: 	0x0017004f
GPU[6]		: TA RAS firmware version: 	27.00.01.62
GPU[6]		: TA XGMI firmware version: 	32.00.00.17
GPU[6]		: get_firmware_version_UVD, Not supported on the given system
GPU[6]		: get_firmware_version_VCE, Not supported on the given system
GPU[6]		: VCN firmware version: 	0x01101015
GPU[7]		: ASD firmware version: 	0x21000059
GPU[7]		: get_firmware_version_CE, Not supported on the given system
GPU[7]		: get_firmware_version_DMCU, Not supported on the given system
GPU[7]		: get_firmware_version_MC, Not supported on the given system
GPU[7]		: get_firmware_version_ME, Not supported on the given system
GPU[7]		: MEC firmware version: 	65
GPU[7]		: MEC2 firmware version: 	65
GPU[7]		: get_firmware_version_MES, Not supported on the given system
GPU[7]		: get_firmware_version_MES KIQ, Not supported on the given system
GPU[7]		: get_firmware_version_PFP, Not supported on the given system
GPU[7]		: RLC firmware version: 	24
GPU[7]		: get_firmware_version_RLC SRLC, Not supported on the given system
GPU[7]		: get_firmware_version_RLC SRLG, Not supported on the given system
GPU[7]		: get_firmware_version_RLC SRLS, Not supported on the given system
GPU[7]		: SDMA firmware version: 	18
GPU[7]		: SDMA2 firmware version: 	18
GPU[7]		: SMC firmware version: 	00.54.29.00
GPU[7]		: SOS firmware version: 	0x0017004f
GPU[7]		: TA RAS firmware version: 	27.00.01.62
GPU[7]		: TA XGMI firmware version: 	32.00.00.17
GPU[7]		: get_firmware_version_UVD, Not supported on the given system
GPU[7]		: get_firmware_version_VCE, Not supported on the given system
GPU[7]		: VCN firmware version: 	0x01101015
==========================================================================================
====================================== Product Info ======================================
GPU[0]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[0]		: Card model: 		0x0c34
GPU[0]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]		: Card SKU: 		D3431401
GPU[1]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[1]		: Card model: 		0x0c34
GPU[1]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1]		: Card SKU: 		D3431401
GPU[2]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[2]		: Card model: 		0x0c34
GPU[2]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2]		: Card SKU: 		D3431401
GPU[3]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[3]		: Card model: 		0x0c34
GPU[3]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3]		: Card SKU: 		D3431401
GPU[4]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[4]		: Card model: 		0x0c34
GPU[4]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[4]		: Card SKU: 		D3431401
GPU[5]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[5]		: Card model: 		0x0c34
GPU[5]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[5]		: Card SKU: 		D3431401
GPU[6]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[6]		: Card model: 		0x0c34
GPU[6]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[6]		: Card SKU: 		D3431401
GPU[7]		: Card series: 		Arcturus GL-XL [Instinct MI100]
GPU[7]		: Card model: 		0x0c34
GPU[7]		: Card vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[7]		: Card SKU: 		D3431401
==========================================================================================
================================= Show Valid sclk Range ==================================
GPU[0]		: get_od_volt, Not supported on the given system
GPU[1]		: get_od_volt, Not supported on the given system
GPU[2]		: get_od_volt, Not supported on the given system
GPU[3]		: get_od_volt, Not supported on the given system
GPU[4]		: get_od_volt, Not supported on the given system
GPU[5]		: get_od_volt, Not supported on the given system
GPU[6]		: get_od_volt, Not supported on the given system
GPU[7]		: get_od_volt, Not supported on the given system
==========================================================================================
================================= Show Valid mclk Range ==================================
GPU[0]		: get_od_volt, Not supported on the given system
GPU[1]		: get_od_volt, Not supported on the given system
GPU[2]		: get_od_volt, Not supported on the given system
GPU[3]		: get_od_volt, Not supported on the given system
GPU[4]		: get_od_volt, Not supported on the given system
GPU[5]		: get_od_volt, Not supported on the given system
GPU[6]		: get_od_volt, Not supported on the given system
GPU[7]		: get_od_volt, Not supported on the given system
==========================================================================================
================================ Show Valid voltage Range ================================
GPU[0]		: get_od_volt, Not supported on the given system
GPU[1]		: get_od_volt, Not supported on the given system
GPU[2]		: get_od_volt, Not supported on the given system
GPU[3]		: get_od_volt, Not supported on the given system
GPU[4]		: get_od_volt, Not supported on the given system
GPU[5]		: get_od_volt, Not supported on the given system
GPU[6]		: get_od_volt, Not supported on the given system
GPU[7]		: get_od_volt, Not supported on the given system
==========================================================================================
================================== End of ROCm SMI Log ===================================
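The plain-text report prints one `GPU[n] : key: value` line per property, and a `get_..., Not supported on the given system` line for properties the card does not expose. For property discovery, a minimal sketch (hypothetical `parse_properties`) can key on the second colon: the unsupported lines have none and drop out, while values that contain colons, such as the PCI bus address, stay intact because only the first `key:` separator is split off. Mapping `N/A` to `None` is an assumption.

```python
import re
from collections import defaultdict

# "GPU[<index>]<whitespace>: <key>: <value>". Lines like
# "GPU[0] : get_serial_number, Not supported on the given system" have no
# second colon and therefore do not match.
PROPERTY_LINE = re.compile(r"^GPU\[(\d+)\]\s*:\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_properties(text: str) -> dict:
    """Map GPU index -> {property name: value string, or None for N/A}."""
    gpus = defaultdict(dict)
    for line in text.splitlines():
        m = PROPERTY_LINE.match(line)
        if m is None:
            continue
        index, key, value = int(m.group(1)), m.group(2), m.group(3)
        gpus[index][key] = None if value == "N/A" else value
    return dict(gpus)
```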
rocm-smi -c --csv
device,fclk clock speed:,fclk clock level:,mclk clock speed:,mclk clock level:,sclk clock speed:,sclk clock level:,socclk clock speed:,socclk clock level:,pcie clock level
card0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card1,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card2,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card3,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card4,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card5,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card6,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
card7,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16)
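In `-c --csv`, clock speeds are wrapped as `(1402Mhz)` and the PCIe column packs a level, a transfer rate and a link width into `0 (16.0GT/s x16)`. The sketch below (hypothetical `parse_clocks`) pulls those apart with regular expressions written against this sample; other ROCm versions may format them differently.

```python
import csv
import io
import re

# "(1402Mhz)" -> 1402; "0 (16.0GT/s x16)" -> level 0, 16.0 GT/s, x16 link.
SPEED = re.compile(r"\((\d+(?:\.\d+)?)Mhz\)")
PCIE = re.compile(r"(\d+) \((\d+(?:\.\d+)?)GT/s x(\d+)\)")


def parse_clocks(text: str) -> dict:
    """Map each card to {"<clk>_mhz": float, "pcie_*": ...} from `rocm-smi -c --csv`."""
    result = {}
    for row in csv.DictReader(io.StringIO(text.strip())):
        clocks = {}
        for column, value in row.items():
            # Columns such as "fclk clock speed:" carry the current frequency.
            if column and column.endswith("clock speed:"):
                m = SPEED.search(value or "")
                clocks[column.split()[0] + "_mhz"] = float(m.group(1)) if m else None
        m = PCIE.search(row.get("pcie clock level") or "")
        if m:
            clocks["pcie_level"] = int(m.group(1))
            clocks["pcie_gt_s"] = float(m.group(2))
            clocks["pcie_width"] = int(m.group(3))
        result[row["device"]] = clocks
    return result
```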
rocm-smi -g --csv
device,sclk clock level
card0,0 (300Mhz)
card1,0 (300Mhz)
card2,0 (300Mhz)
card3,0 (300Mhz)
card4,0 (300Mhz)
card5,0 (300Mhz)
card6,0 (300Mhz)
card7,0 (300Mhz)
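`-g --csv` reports the current sclk as a level index plus frequency, e.g. `0 (300Mhz)`. A short sketch (hypothetical `parse_sclk_level`) returns both parts as a tuple; the column only calls the integer a "clock level", so the sketch keeps it as an opaque index rather than interpreting it.

```python
import csv
import io
import re

# Values look like "0 (300Mhz)": a level index followed by the clock in MHz.
LEVEL = re.compile(r"^\s*(\d+)\s*\((\d+(?:\.\d+)?)Mhz\)\s*$")


def parse_sclk_level(text: str) -> dict:
    """Map each card to (level_index, sclk_mhz)."""
    result = {}
    for row in csv.DictReader(io.StringIO(text.strip())):
        m = LEVEL.match(row.get("sclk clock level") or "")
        if m:
            result[row["device"]] = (int(m.group(1)), float(m.group(2)))
    return result
```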
rocm-smi -M --csv
device,Max Graphics Package Power (W)
card0,290.0
card1,290.0
card2,290.0
card3,290.0
card4,290.0
card5,290.0
card6,290.0
card7,290.0
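Reading `-M --csv` alongside `-P --csv` lets a connector express current draw relative to the power cap; both tables share the `device` key. In the samples, card0 draws 30.0 W against a 290.0 W cap, about 10.3%. The helper names `read_column` and `power_vs_cap_pct` are hypothetical.

```python
import csv
import io


def read_column(text: str, column: str) -> dict:
    """Read one numeric column from single-table `rocm-smi ... --csv` output."""
    return {
        row["device"]: float(row[column])
        for row in csv.DictReader(io.StringIO(text.strip()))
        if row.get(column) not in (None, "", "N/A")
    }


def power_vs_cap_pct(avg_csv: str, max_csv: str) -> dict:
    """Average package power as a percentage of the max package power, per card."""
    avg = read_column(avg_csv, "Average Graphics Package Power (W)")
    cap = read_column(max_csv, "Max Graphics Package Power (W)")
    # Cards missing from either table, or with a zero cap, are left out.
    return {card: 100.0 * avg[card] / cap[card] for card in avg if cap.get(card)}
```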
rocm-smi -a --showtemp --csv 
device,Driver version,PID429045,PID429043,PID429048,PID429046,PID429044,PID429050,PID429049,PID429047
system,6.7.0,unknown 1 33338785792 0 0,unknown 1 33338843136 0 0,unknown 1 33338785792 0 0,unknown 1 33338814464 0 0,unknown 1 33338814464 0 0,unknown 1 33338785792 0 0,unknown 1 33338843136 0 0,unknown 1 33338843136 0 0

device,Device ID,Device Rev,Unique ID,VBIOS version,Temperature (Sensor edge) (C),Temperature (Sensor junction) (C),Temperature (Sensor memory) (C),fclk clock speed:,fclk clock level:,mclk clock speed:,mclk clock level:,sclk clock speed:,sclk clock level:,socclk clock speed:,socclk clock level:,pcie clock level,Performance Level,Max Graphics Package Power (W),Average Graphics Package Power (W),GPU use (%),GPU memory use (%),Memory Activity,Avg. Memory Bandwidth,GPU memory vendor,PCIe Replay Count,Serial Number,Voltage (mV),PCI Bus,ASD firmware version,MEC firmware version,MEC2 firmware version,RLC firmware version,SDMA firmware version,SDMA2 firmware version,SMC firmware version,SOS firmware version,TA RAS firmware version,TA XGMI firmware version,VCN firmware version,Card series,Card model,Card vendor,Card SKU,Energy counter,Accumulated Energy (uJ)
card0,0x738c,0x1,0xe00e11e94b2aced1,113-D3431401-100,24.0,26.0,22.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:00:11.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card1,0x738c,0x1,0xc0bc5a02e56f35d8,113-D3431401-100,23.0,25.0,23.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,656,0000:00:1B.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card2,0x738c,0x1,0x91c890bc77e9d0e5,113-D3431401-100,23.0,25.0,23.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,34.0,0,0,N/A,0,samsung,0,N/A,656,0000:00:1C.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card3,0x738c,0x1,0x648ed51f97a4247d,113-D3431401-100,21.0,24.0,25.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:03:0D.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card4,0x738c,0x1,0x3097be354c63bc79,113-D3431401-100,21.0,23.0,22.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,33.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:0E.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card5,0x738c,0x1,0x48a7dbb72a7240dd,113-D3431401-100,22.0,24.0,24.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,29.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:0F.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card6,0x738c,0x1,0x1220f19ff8ae0623,113-D3431401-100,21.0,23.0,23.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,30.0,0,0,N/A,0,samsung,0,N/A,662,0000:03:10.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
card7,0x738c,0x1,0xd451b41bd0041432,113-D3431401-100,21.0,24.0,21.0,(1402Mhz),0,(1200Mhz),0,(300Mhz),0,(1000Mhz),0,0 (16.0GT/s x16),auto,290.0,34.0,0,0,N/A,0,samsung,0,N/A,656,0000:03:11.0,0x21000059,65,65,24,18,18,00.54.29.00,0x0017004f,27.00.01.62,32.00.00.17,0x01101015,Arcturus GL-XL [Instinct MI100],0x0c34,Advanced Micro Devices Inc. [AMD/ATI],D3431401,1584129,24237174.0
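The `-a` output also includes `Energy counter` and `Accumulated Energy (uJ)`. Between the two `-a` samples above, the accumulated value grows from 24231803.7 to 24237174.0, and in both samples it is roughly 15.3 times the raw counter. Because the samples carry no timestamps, average power cannot be derived from them directly. The sketch below (hypothetical `average_power_w`) shows the arithmetic a connector could apply to two readings whose elapsed time it measured itself, for example with `time.monotonic()`. It takes the `(uJ)` unit in the header at face value, which is an assumption worth validating against the `-P` readings on real hardware.

```python
def average_power_w(energy_start_uj: float, energy_end_uj: float, elapsed_s: float):
    """Average power in watts between two accumulated-energy readings in microjoules."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:
        # The value went backwards (reset or wrap); this sketch skips the interval.
        return None
    # microjoules -> joules, then joules per second -> watts
    return delta_uj / 1e6 / elapsed_s
```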

@SafaeAJ changed the title from "Add support for AMD Radeo GPU through ROCm_SMI_lib" to "Add support for AMD Radeon GPU through ROCm_SMI_lib" on Aug 8, 2024

2 participants