feat: add memcpy and memset to CUPTI timing method #2223

nv-yunzheq · 2025-12-16T02:09:22Z

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Chores
- Enhanced GPU benchmarking to also capture memory copy and set operations in addition to kernel activity, yielding more comprehensive performance metrics.
- Improved and stabilized kernel identification across iterations to make benchmark comparisons more reliable and consistent.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-16T02:09:32Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Extend CUPTI-based GPU benchmarking to collect and classify MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, add helpers to extract standardized kernel metadata, and produce stable kernel identifiers for consistent per-iteration comparisons.

Changes

Cohort / File(s)	Summary
GPU benchmarking utility `flashinfer/testing/utils.py`	Added CUPTI activity helpers (`set_kernel_name`, `get_bytes`, `get_copy_kind`, `get_value`, `collect_kernel_info`) and `generate_kernel_string`. Expanded kernel records from 4→8 fields, enabled MEMCPY/MEMSET measurement alongside CONCURRENT_KERNEL, updated per-iteration processing and lifecycle (enable/disable) handling, and switched comparisons to string-based kernel identifiers.
Manifest/config `requirements.txt`, `pyproject.toml`	Supporting configuration updates referenced by the PR.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Inspect helpers for correct handling of CONCURRENT_KERNEL, MEMCPY, MEMSET activity types.
Verify generate_kernel_string produces stable, unique identifiers (timing/correlation excluded).
Confirm CUPTI enable/disable calls and lifecycle pairing are correct.
Review changes to kernel tuple structure and all callsites that unpack or expect kernel fields.

Poem

I hop through traces, ears in tune, 🐇
Counting copies, sets, and kernels bright,
CUPTI whispers each memory rune,
I stitch their names by soft moonlight,
Benchmarks hop steady into the night. ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description consists entirely of the template with no actual details filled in. All descriptive sections (Description, Related Issues, Reviewer Notes) are empty or contain only placeholder comments, making it impossible to understand the motivation and context for the changes.	Fill in the Description section with details about what the changes accomplish and why they're needed, link any related issues, and optionally add reviewer notes to guide the review process.
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and specifically describes the main change: adding memcpy and memset support to the CUPTI timing method, which directly aligns with the changeset's addition of CUPTI kernel-activity helpers and extended MEMCPY/MEMSET activity tracking.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9f535c and 7b1b2e5.

📒 Files selected for processing (1)

flashinfer/testing/utils.py (4 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Deploy Docs

🔇 Additional comments (4)

flashinfer/testing/utils.py (4)

742-779: Well-structured helper functions for CUPTI activity metadata extraction.

The helper functions appropriately handle different activity kinds (CONCURRENT_KERNEL, MEMCPY, MEMSET) and provide sensible defaults. The separation of concerns makes the code clear and maintainable.

786-792: Correctly extended to capture MEMCPY and MEMSET activities.

The activity filtering now properly includes memory operations alongside kernel activities, and correctly uses the new collect_kernel_info helper to gather metadata.

870-871: Properly enable and disable MEMCPY and MEMSET activity kinds.

The activity lifecycle management correctly enables MEMCPY and MEMSET before measurement and disables them after flushing, maintaining symmetry with the existing activity kinds.

Also applies to: 889-890

908-908: Correctly uses the new kernel string generator for consistency checks.

The application of generate_kernel_string properly creates stable identifiers for cross-iteration kernel comparison.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2025-12-16T02:09:35Z

Summary of Changes

Hello @nv-yunzheq, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the existing CUPTI-based profiling utility by extending its capability to capture and report memcpy and memset operations alongside regular kernel executions. This improvement provides a more complete picture of GPU workload by including memory transfer and initialization activities in the performance analysis, which is crucial for identifying potential bottlenecks related to data movement.

Highlights

Expanded CUPTI Activity Tracking: The CUPTI timing method now tracks MEMCPY and MEMSET activities in addition to CONCURRENT_KERNEL activities, providing a more comprehensive view of GPU operations.
Detailed Activity Information: New helper functions (get_bytes, get_copy_kind, get_value) have been introduced to extract specific details for MEMCPY and MEMSET operations, such as bytes transferred, copy kind, and memset value.
Unified Kernel Information Collection: A collect_kernel_info function standardizes the collection of activity details, ensuring that all relevant attributes, including the new MEMCPY and MEMSET specific data, are captured.
Enhanced Kernel String Generation: A generate_kernel_string function is added to create unique identifiers for kernels, incorporating MEMCPY and MEMSET specific attributes for better differentiation and analysis.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request enhances the CUPTI-based timing utility by including memcpy and memset activities. The changes are logical and correctly enable, collect, and disable these new activity types. The changes look good and align with the goal. I have a couple of suggestions to improve code quality: one regarding an incorrect type hint and another to improve maintainability by replacing magic numbers with a structured data type like NamedTuple.

flashinfer/testing/utils.py

gemini-code-assist · 2025-12-16T02:10:47Z

flashinfer/testing/utils.py

+    def generate_kernel_string(kernel):
+        # No start, end, correlation_id is considered in the kernel string
+        return f"{kernel[0]}_{kernel[4]}_{kernel[5]}_{kernel[6]}_{kernel[7]}"


Using integer indices to access tuple elements (e.g., kernel[0], kernel[4]) makes the code hard to read and brittle to changes in the tuple structure. This pattern is also used elsewhere, such as k[3] on line 905.

To improve readability and maintainability, I recommend using a typing.NamedTuple or dataclasses.dataclass to represent the kernel information. This would allow accessing fields by name (e.g., kernel.name, kernel.copy_kind), making the code self-documenting and more robust.

For example, you could define a NamedTuple within bench_gpu_time_with_cupti:

from typing import NamedTuple, Any class KernelInfo(NamedTuple): name: str start: float end: float correlation_id: int copy_kind: int bytes: int value: int kind: Any

Then collect_kernel_info would return a KernelInfo instance, and this function could be rewritten as:

def generate_kernel_string(kernel: KernelInfo): # No start, end, correlation_id is considered in the kernel string return f"{kernel.name}_{kernel.copy_kind}_{kernel.bytes}_{kernel.value}_{kernel.kind}"

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

flashinfer/testing/utils.py (1)
768-778: Update type hints for the new kernel tuple structure.

The collect_kernel_info function returns an 8-element tuple, but the type hints on line 782 and line 865 still reference the old 4-element structure list[tuple[str, float, float, int]].

Consider defining a NamedTuple or updating the type hints for clarity:
+from typing import Tuple, Any, NamedTuple
+
+class KernelInfo(NamedTuple):
+    name: str
+    start: int
+    end: int
+    correlation_id: int
+    copy_kind: int
+    bytes: int
+    value: int
+    kind: Any  # cupti.ActivityKind
Or at minimum, update the inline type hints:
-    kernels: list[tuple[str, float, float, int]],
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]],
-    kernels: list[tuple[str, float, float, int]] = []
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]] = []

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0355f7 and c9f535c.

📒 Files selected for processing (1)

flashinfer/testing/utils.py (4 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Deploy Docs

🔇 Additional comments (6)

flashinfer/testing/utils.py (6)

742-766: LGTM!

The helper functions correctly extract metadata based on activity kind. The lack of an explicit else clause in set_kernel_name is acceptable since collect_kernel_info is only called within func_buffer_completed where the activity kind is already validated.

786-792: LGTM!

The expanded condition correctly handles MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, and consistently uses collect_kernel_info for all three types.

870-871: LGTM!

MEMCPY and MEMSET activity kinds are correctly enabled before measurement.

889-890: LGTM!

MEMCPY and MEMSET activity kinds are correctly disabled after measurement, matching the enable calls.

893-895: Verify that including bytes in the kernel identifier is intentional.

Including kernel[5] (bytes) in the identifier means that MEMCPY/MEMSET operations with different sizes will be treated as different kernels. This will cause the consistency check on line 913-916 to fail if the benchmarked function performs memory operations of varying sizes across iterations.

If this is intentional (to ensure deterministic behavior), the current implementation is correct. If varying memory operation sizes are expected, consider excluding bytes from the identifier.

908-916: LGTM!

The consistency check correctly uses generate_kernel_string to compare kernel activities across iterations, ensuring reproducible benchmarking.

yzh119

LGTM

add memcpy and memset

c9f535c

nv-yunzheq requested review from aleozlx, bkryu, cyx-6, jimmyzho and yzh119 as code owners December 16, 2025 02:09

gemini-code-assist bot reviewed Dec 16, 2025

View reviewed changes

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

update defination of list

7b1b2e5

yzh119 approved these changes Dec 16, 2025

View reviewed changes

bkryu approved these changes Dec 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add memcpy and memset to CUPTI timing method #2223

feat: add memcpy and memset to CUPTI timing method #2223

Uh oh!

nv-yunzheq commented Dec 16, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 16, 2025 •

edited

Loading

Other AI code review bot(s) detected

Uh oh!

gemini-code-assist bot commented Dec 16, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

gemini-code-assist bot Dec 16, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

yzh119 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: add memcpy and memset to CUPTI timing method #2223

Are you sure you want to change the base?

feat: add memcpy and memset to CUPTI timing method #2223

Uh oh!

Conversation

nv-yunzheq commented Dec 16, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

🧪 Tests

Reviewer Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

gemini-code-assist bot commented Dec 16, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

yzh119 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nv-yunzheq commented Dec 16, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 16, 2025 •

edited

Loading