Skip to content

Conversation

@nv-yunzheq
Copy link
Contributor

@nv-yunzheq nv-yunzheq commented Dec 16, 2025

πŸ“Œ Description

πŸ” Related Issues

πŸš€ Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

βœ… Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

πŸ§ͺ Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Chores
    • Enhanced GPU benchmarking to also capture memory copy and set operations in addition to kernel activity, yielding more comprehensive performance metrics.
    • Improved and stabilized kernel identification across iterations to make benchmark comparisons more reliable and consistent.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 16, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Extend CUPTI-based GPU benchmarking to collect and classify MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, add helpers to extract standardized kernel metadata, and produce stable kernel identifiers for consistent per-iteration comparisons.

Changes

Cohort / File(s) Summary
GPU benchmarking utility
flashinfer/testing/utils.py
Added CUPTI activity helpers (set_kernel_name, get_bytes, get_copy_kind, get_value, collect_kernel_info) and generate_kernel_string. Expanded kernel records from 4β†’8 fields, enabled MEMCPY/MEMSET measurement alongside CONCURRENT_KERNEL, updated per-iteration processing and lifecycle (enable/disable) handling, and switched comparisons to string-based kernel identifiers.
Manifest/config
requirements.txt, pyproject.toml
Supporting configuration updates referenced by the PR.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect helpers for correct handling of CONCURRENT_KERNEL, MEMCPY, MEMSET activity types.
  • Verify generate_kernel_string produces stable, unique identifiers (timing/correlation excluded).
  • Confirm CUPTI enable/disable calls and lifecycle pairing are correct.
  • Review changes to kernel tuple structure and all callsites that unpack or expect kernel fields.

Poem

I hop through traces, ears in tune, πŸ‡
Counting copies, sets, and kernels bright,
CUPTI whispers each memory rune,
I stitch their names by soft moonlight,
Benchmarks hop steady into the night. ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description consists entirely of the template with no actual details filled in. All descriptive sections (Description, Related Issues, Reviewer Notes) are empty or contain only placeholder comments, making it impossible to understand the motivation and context for the changes. Fill in the Description section with details about what the changes accomplish and why they're needed, link any related issues, and optionally add reviewer notes to guide the review process.
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
βœ… Passed checks (1 passed)
Check name Status Explanation
Title check βœ… Passed The PR title clearly and specifically describes the main change: adding memcpy and memset support to the CUPTI timing method, which directly aligns with the changeset's addition of CUPTI kernel-activity helpers and extended MEMCPY/MEMSET activity tracking.
✨ Finishing touches
  • πŸ“ Generate docstrings
πŸ§ͺ Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

πŸ“œ Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between c9f535c and 7b1b2e5.

πŸ“’ Files selected for processing (1)
  • flashinfer/testing/utils.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
πŸ”‡ Additional comments (4)
flashinfer/testing/utils.py (4)

742-779: Well-structured helper functions for CUPTI activity metadata extraction.

The helper functions appropriately handle different activity kinds (CONCURRENT_KERNEL, MEMCPY, MEMSET) and provide sensible defaults. The separation of concerns makes the code clear and maintainable.


786-792: Correctly extended to capture MEMCPY and MEMSET activities.

The activity filtering now properly includes memory operations alongside kernel activities, and correctly uses the new collect_kernel_info helper to gather metadata.


870-871: Properly enable and disable MEMCPY and MEMSET activity kinds.

The activity lifecycle management correctly enables MEMCPY and MEMSET before measurement and disables them after flushing, maintaining symmetry with the existing activity kinds.

Also applies to: 889-890


908-908: Correctly uses the new kernel string generator for consistency checks.

The application of generate_kernel_string properly creates stable identifiers for cross-iteration kernel comparison.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @nv-yunzheq, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the existing CUPTI-based profiling utility by extending its capability to capture and report memcpy and memset operations alongside regular kernel executions. This improvement provides a more complete picture of GPU workload by including memory transfer and initialization activities in the performance analysis, which is crucial for identifying potential bottlenecks related to data movement.

Highlights

  • Expanded CUPTI Activity Tracking: The CUPTI timing method now tracks MEMCPY and MEMSET activities in addition to CONCURRENT_KERNEL activities, providing a more comprehensive view of GPU operations.
  • Detailed Activity Information: New helper functions (get_bytes, get_copy_kind, get_value) have been introduced to extract specific details for MEMCPY and MEMSET operations, such as bytes transferred, copy kind, and memset value.
  • Unified Kernel Information Collection: A collect_kernel_info function standardizes the collection of activity details, ensuring that all relevant attributes, including the new MEMCPY and MEMSET specific data, are captured.
  • Enhanced Kernel String Generation: A generate_kernel_string function is added to create unique identifiers for kernels, incorporating MEMCPY and MEMSET specific attributes for better differentiation and analysis.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the CUPTI-based timing utility by including memcpy and memset activities. The changes are logical and correctly enable, collect, and disable these new activity types. The changes look good and align with the goal. I have a couple of suggestions to improve code quality: one regarding an incorrect type hint and another to improve maintainability by replacing magic numbers with a structured data type like NamedTuple.

Comment on lines +893 to +895
def generate_kernel_string(kernel):
# No start, end, correlation_id is considered in the kernel string
return f"{kernel[0]}_{kernel[4]}_{kernel[5]}_{kernel[6]}_{kernel[7]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using integer indices to access tuple elements (e.g., kernel[0], kernel[4]) makes the code hard to read and brittle to changes in the tuple structure. This pattern is also used elsewhere, such as k[3] on line 905.

To improve readability and maintainability, I recommend using a typing.NamedTuple or dataclasses.dataclass to represent the kernel information. This would allow accessing fields by name (e.g., kernel.name, kernel.copy_kind), making the code self-documenting and more robust.

For example, you could define a NamedTuple within bench_gpu_time_with_cupti:

from typing import NamedTuple, Any

class KernelInfo(NamedTuple):
    name: str
    start: float
    end: float
    correlation_id: int
    copy_kind: int
    bytes: int
    value: int
    kind: Any

Then collect_kernel_info would return a KernelInfo instance, and this function could be rewritten as:

def generate_kernel_string(kernel: KernelInfo):
    # No start, end, correlation_id is considered in the kernel string
    return f"{kernel.name}_{kernel.copy_kind}_{kernel.bytes}_{kernel.value}_{kernel.kind}"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
flashinfer/testing/utils.py (1)

768-778: Update type hints for the new kernel tuple structure.

The collect_kernel_info function returns an 8-element tuple, but the type hints on line 782 and line 865 still reference the old 4-element structure list[tuple[str, float, float, int]].

Consider defining a NamedTuple or updating the type hints for clarity:

+from typing import Tuple, Any, NamedTuple
+
+class KernelInfo(NamedTuple):
+    name: str
+    start: int
+    end: int
+    correlation_id: int
+    copy_kind: int
+    bytes: int
+    value: int
+    kind: Any  # cupti.ActivityKind

Or at minimum, update the inline type hints:

-    kernels: list[tuple[str, float, float, int]],
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]],
-    kernels: list[tuple[str, float, float, int]] = []
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]] = []
πŸ“œ Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between f0355f7 and c9f535c.

πŸ“’ Files selected for processing (1)
  • flashinfer/testing/utils.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
πŸ”‡ Additional comments (6)
flashinfer/testing/utils.py (6)

742-766: LGTM!

The helper functions correctly extract metadata based on activity kind. The lack of an explicit else clause in set_kernel_name is acceptable since collect_kernel_info is only called within func_buffer_completed where the activity kind is already validated.


786-792: LGTM!

The expanded condition correctly handles MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, and consistently uses collect_kernel_info for all three types.


870-871: LGTM!

MEMCPY and MEMSET activity kinds are correctly enabled before measurement.


889-890: LGTM!

MEMCPY and MEMSET activity kinds are correctly disabled after measurement, matching the enable calls.


893-895: Verify that including bytes in the kernel identifier is intentional.

Including kernel[5] (bytes) in the identifier means that MEMCPY/MEMSET operations with different sizes will be treated as different kernels. This will cause the consistency check on line 913-916 to fail if the benchmarked function performs memory operations of varying sizes across iterations.

If this is intentional (to ensure deterministic behavior), the current implementation is correct. If varying memory operation sizes are expected, consider excluding bytes from the identifier.


908-916: LGTM!

The consistency check correctly uses generate_kernel_string to compare kernel activities across iterations, ensuring reproducible benchmarking.

Copy link
Collaborator

@yzh119 yzh119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants