[GH200] Unexpected Low Host-to-Device Bandwidth #23

Open
vitduck opened this issue Aug 23, 2024 · 3 comments

@vitduck

vitduck commented Aug 23, 2024

Hi,

We observed unexpectedly low host-to-device bandwidth on a GH200 Superchip.

  1. Specs

    • GH200 (480GB LPDDR5X + 96GB HBM3)
    • OS: Rocky Linux 9.3 (Blue Onyx)
    • nvbandwidth: v5.0
  2. nvidia-smi topology:

            GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
    GPU0     X      SYS     SYS     0-71            0               1
    NIC0    SYS      X      PIX
    NIC1    SYS     PIX      X
    
  3. output: nvbandwidth-gh200.log

    • SM copies are generally expected to give higher bandwidth than CE copies, since the latter are limited by the DMA engine.
    • Bandwidth anomaly:
      • Host-to-device: CE (346 GB/s) vs SM (342 GB/s)
      • According to NVIDIA's reference result, there should be a ~15% difference between them.
      • Are there hardware/kernel settings that negatively affect the SM-variant memcpy?
    • Bandwidth asymmetry:
      • On GH200, H2D is approximately 15% faster than D2H. Is this due to an intrinsic property of ATS?
  4. ref:

Regards.
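To put the reported anomaly in numbers, here is a minimal sketch of the arithmetic. The two bandwidth figures come from the post above; the helper name `relative_gap` is hypothetical, and reading "~15% difference" as "SM roughly 15% faster than CE" is an assumption based on the stated expectation that SM outperforms CE.

```python
def relative_gap(faster_gbps: float, slower_gbps: float) -> float:
    """Return how much larger faster_gbps is than slower_gbps, in percent."""
    return (faster_gbps - slower_gbps) / slower_gbps * 100.0


# Host-to-device figures reported in this issue (GB/s).
h2d_ce = 346.0
h2d_sm = 342.0

# Observed: CE is marginally ahead of SM instead of clearly behind it.
gap = relative_gap(h2d_ce, h2d_sm)
print(f"Observed CE over SM: {gap:.2f}%")

# Assumption: if SM were ~15% faster than CE, SM would land near this value.
hypothetical_sm = h2d_ce * 1.15
print(f"SM bandwidth at +15% over CE: {hypothetical_sm:.1f} GB/s")
```

With the numbers above, the two variants differ by about 1%, so they are effectively tied rather than separated by the expected margin.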

@deepakcu
Collaborator

Are the GPU clocks locked? To what value? Can you attach the output of nvidia-smi -q?

@vitduck
Author

vitduck commented Aug 25, 2024

@deepakcu

I have attached the output of lscpu and nvidia-smi -q, per your request.
I believe everything is running at stock.

Below is some additional information:

  • Linux kernel with 64k pages
    $ uname -a 
    Linux gpu51 5.14.0-362.8.1.el9_3.aarch64+64k #1 SMP PREEMPT_DYNAMIC Thu Nov 9 05:07:41 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
    
  • NVIDIA driver: 550.54.14
  • Additional kernel modules installed: nvidia_peermem and gdrdrv

I also checked dmesg immediately after running nvbandwidth and didn't observe any warnings or errors.

Thanks.

@deepakcu
Collaborator

Can you repeat your test with the GPU clocks locked at max (1980 MHz)?

sudo nvidia-smi --lock-gpu-clocks=1980,1980
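For a repeatable rerun, the lock/verify/benchmark/reset sequence could be scripted roughly as below. This is only a sketch: the helper `locked_clock_plan` is hypothetical, the `./nvbandwidth` path is an assumption, and the testcase names are assumed to match what `./nvbandwidth -l` lists on your build. The script only prints the commands; it does not execute them.

```python
import shlex


def locked_clock_plan(mhz, testcases, nvbandwidth="./nvbandwidth"):
    """Build the command sequence: lock clocks, verify, benchmark, reset."""
    lock = ["sudo", "nvidia-smi", f"--lock-gpu-clocks={mhz},{mhz}"]
    # Report current and maximum SM clocks to confirm the lock took effect.
    query = ["nvidia-smi", "--query-gpu=clocks.sm,clocks.max.sm", "--format=csv"]
    # One nvbandwidth invocation per testcase so each result is easy to find.
    runs = [[nvbandwidth, "-t", name] for name in testcases]
    # Return the GPU to its default clock behaviour afterwards.
    reset = ["sudo", "nvidia-smi", "--reset-gpu-clocks"]
    return [lock, query, *runs, reset]


# Assumed testcase names; check them against `./nvbandwidth -l`.
plan = locked_clock_plan(
    1980,
    ["host_to_device_memcpy_ce", "host_to_device_memcpy_sm"],
)

for cmd in plan:
    print(shlex.join(cmd))
```

Printing the plan first makes it easy to review the exact commands (and paste them into a shell) before anything touches the GPU clock settings.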
