
Add a script to gather runner info when uploading benchmark results #6425

Open · wants to merge 4 commits into main

Conversation

@huydhn (Contributor) commented Mar 17, 2025

Implement the logic to gather runner info for GPUs. I adopted this logic from https://github.com/pytorch/pytorch-integration-testing/blob/master/vllm-benchmarks/upload_benchmark_results.py#L102

This also cleans up the v2 logic, which is not used anymore.

cc @yangw-dev Please let me know if you have a better approach in mind from the utilization monitoring project. Essentially, I want to get the device name, i.e. CUDA or ROCm, and the device type, i.e. H100 or MI300X, so that they can be displayed on the dashboard. Before this change, these fields were set by the caller; now they can be set automatically by the GHA.
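
As a rough illustration of what that automatic detection could look like (the helper name and the ROCm check below are my own sketch, not the exact code in this PR):

```python
import logging

logger = logging.getLogger(__name__)


def get_runner_info() -> dict:
    # Sketch: detect the device name (cuda/rocm) and device type (e.g. H100,
    # MI300X) automatically instead of relying on the caller to set them
    runner_info = {"name": "cpu", "type": ""}
    try:
        import torch

        if torch.cuda.is_available():
            # torch.version.hip is populated on ROCm builds of PyTorch
            is_rocm = getattr(torch.version, "hip", None) is not None
            runner_info["name"] = "rocm" if is_rocm else "cuda"
            runner_info["type"] = torch.cuda.get_device_name()
    except ImportError as error:
        logger.warning("torch is not available, leaving runner info empty: %s", error)
    return runner_info
```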

@huydhn requested a review from @yangw-dev on March 17, 2025 at 19:41
@huydhn (Contributor Author) commented Mar 17, 2025

This depends on #6429

# (diff excerpt; the enclosing try/import is reconstructed here for context)
try:
    import torch

    device_type = torch.cuda.get_device_name()
except ImportError:
    pass
Contributor: Logging the error info would help with debugging.
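
One minimal way to follow that suggestion (a sketch, not the change as it actually landed):

```python
import logging

logger = logging.getLogger(__name__)

device_type = ""
try:
    import torch

    device_type = torch.cuda.get_device_name()
except ImportError as error:
    # Record why GPU info is missing instead of failing silently
    logger.warning("Skipping GPU runner info, torch could not be imported: %s", error)
```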

runner_info["type"] = device_type
runner_info["gpu_count"] = torch.cuda.device_count()
runner_info["avail_gpu_mem_in_gb"] = int(
torch.cuda.get_device_properties(0).total_memory
Contributor: Does each device have the same amount of memory?

Contributor Author: Yup, that's the regular setup.
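
If a runner ever did mix GPUs with different memory sizes, a small guard like this (purely illustrative, not part of the PR) would catch it before reporting device 0 as representative:

```python
import torch

# Collect the total memory of every visible GPU and confirm they all match
total_mems = {
    torch.cuda.get_device_properties(i).total_memory
    for i in range(torch.cuda.device_count())
}
assert len(total_mems) <= 1, f"Heterogeneous GPU memory detected: {total_mems}"
```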

huydhn added a commit that referenced this pull request Mar 17, 2025
While working on #6425, I discovered several bugs in the upload scripts:

* If there is an invalid JSON file in the directory, the script returns instead of continuing, skipping all records after it (see the sketch after this list). Covered by
https://github.com/pytorch/test-infra/blob/main/.github/scripts/benchmark-results-dir-for-testing/v3/mock.json
* The script didn't correctly handle the JSONEachRow format with only one record. Covered by a new test JSON from
https://github.com/pytorch/test-infra/pull/6425/files#diff-bff954994eb33173b7119ff8d280f3367117b2daa9b8c54888be5f48f183a280
* The script didn't correctly handle the JSONEachRow format mixed with a list of records. Covered by
https://github.com/pytorch/test-infra/blob/main/.github/scripts/benchmark-results-dir-for-testing/v3/json-each-row.json#L3
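
A rough sketch of the intended parsing behavior (function and variable names here are illustrative, not the actual upload script):

```python
import glob
import json
import logging
import os

logger = logging.getLogger(__name__)


def load_benchmark_records(results_dir: str) -> list:
    records = []
    for path in glob.glob(os.path.join(results_dir, "*.json")):
        with open(path) as f:
            content = f.read()
        try:
            # Plain JSON file: a single record or a list of records
            data = json.loads(content)
            records.extend(data if isinstance(data, list) else [data])
        except json.JSONDecodeError:
            # Fall back to JSONEachRow: one JSON document per line, where each
            # line may itself be a single record or a list of records
            for line in content.splitlines():
                line = line.strip()
                if not line:
                    continue
                try:
                    data = json.loads(line)
                    records.extend(data if isinstance(data, list) else [data])
                except json.JSONDecodeError:
                    # Skip only the invalid content instead of returning early
                    logger.warning("Skipping invalid JSON in %s", path)
    return records
```

This way a single malformed file or line only drops that piece of content instead of silently skipping every record after it.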

### Testing


https://github.com/pytorch/test-infra/actions/runs/13909203687/job/38919334944#step:5:125 looks correct now.