feat: add memcpy and memset to CUPTI timing method #2223
base: main
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Extend CUPTI-based GPU benchmarking to collect and classify MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, add helpers to extract standardized kernel metadata, and produce stable kernel identifiers for consistent per-iteration comparisons.
Estimated code review effort: 3 (Moderate) | ~20 minutes
Pre-merge checks and finishing touches
Failed checks (2 warnings)
Passed checks (1 passed)
Summary of Changes
Hello @nv-yunzheq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the existing CUPTI-based profiling utility by extending its capability to capture and report memcpy and memset activities alongside kernel launches.
Code Review
This pull request enhances the CUPTI-based timing utility by including memcpy and memset activities. The changes are logical and correctly enable, collect, and disable these new activity types. The changes look good and align with the goal. I have a couple of suggestions to improve code quality: one regarding an incorrect type hint and another to improve maintainability by replacing magic numbers with a structured data type like NamedTuple.
def generate_kernel_string(kernel):
    # No start, end, correlation_id is considered in the kernel string
    return f"{kernel[0]}_{kernel[4]}_{kernel[5]}_{kernel[6]}_{kernel[7]}"
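The sketch below applies the generate_kernel_string function above to two hand-made records to show which fields end up in the identifier. The field order (name, start, end, correlation_id, copy_kind, bytes, value, kind) is assumed from the NamedTuple suggested later in this review. The record values, including the "memcpy_HtoD" name and the string "MEMCPY" standing in for a CUPTI activity kind, are invented for illustration.

```python
def generate_kernel_string(kernel):
    # No start, end, correlation_id is considered in the kernel string
    return f"{kernel[0]}_{kernel[4]}_{kernel[5]}_{kernel[6]}_{kernel[7]}"


# Hypothetical records: (name, start, end, correlation_id, copy_kind, bytes, value, kind)
iter0 = ("memcpy_HtoD", 1000, 1500, 7, 1, 4096, 0, "MEMCPY")
iter1 = ("memcpy_HtoD", 9000, 9600, 42, 1, 4096, 0, "MEMCPY")

# Timestamps and correlation IDs differ between iterations, but the identifiers match.
print(generate_kernel_string(iter0))  # memcpy_HtoD_1_4096_0_MEMCPY
print(generate_kernel_string(iter0) == generate_kernel_string(iter1))  # True
```

Only timing and correlation data are left out. Every other field, including the transfer size, becomes part of the identifier.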
Using integer indices to access tuple elements (e.g., kernel[0], kernel[4]) makes the code hard to read and brittle to changes in the tuple structure. This pattern is also used elsewhere, such as k[3] on line 905.
To improve readability and maintainability, I recommend using a typing.NamedTuple or dataclasses.dataclass to represent the kernel information. This would allow accessing fields by name (e.g., kernel.name, kernel.copy_kind), making the code self-documenting and more robust.
For example, you could define a NamedTuple within bench_gpu_time_with_cupti:
from typing import NamedTuple, Any

class KernelInfo(NamedTuple):
    name: str
    start: float
    end: float
    correlation_id: int
    copy_kind: int
    bytes: int
    value: int
    kind: Any

Then collect_kernel_info would return a KernelInfo instance, and this function could be rewritten as:
def generate_kernel_string(kernel: KernelInfo):
    # No start, end, correlation_id is considered in the kernel string
    return f"{kernel.name}_{kernel.copy_kind}_{kernel.bytes}_{kernel.value}_{kernel.kind}"
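Below is a self-contained, runnable version of the NamedTuple suggestion. It types start and end as int, following the CodeRabbit variant of this suggestion. A plain string stands in for the CUPTI activity kind, and the sample record is made up. Because a NamedTuple is still a tuple, index-based code such as k[3] keeps working while callers migrate to field names.

```python
from typing import Any, NamedTuple


class KernelInfo(NamedTuple):
    # Field order mirrors the 8-element tuple discussed in this review
    name: str
    start: int
    end: int
    correlation_id: int
    copy_kind: int
    bytes: int
    value: int
    kind: Any  # a CUPTI activity kind in practice; a plain string in this sketch


def generate_kernel_string(kernel: KernelInfo) -> str:
    # start, end and correlation_id are deliberately left out of the identifier
    return f"{kernel.name}_{kernel.copy_kind}_{kernel.bytes}_{kernel.value}_{kernel.kind}"


k = KernelInfo("gemm_kernel", 100, 250, 3, 0, 0, 0, "CONCURRENT_KERNEL")
print(generate_kernel_string(k))  # gemm_kernel_0_0_0_CONCURRENT_KERNEL
print(k[0] == k.name)  # True: positional access still works
print(k.end - k.start)  # 150
```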
Actionable comments posted: 0
Nitpick comments (1)
flashinfer/testing/utils.py (1)
768-778: Update type hints for the new kernel tuple structure.
The collect_kernel_info function returns an 8-element tuple, but the type hints on line 782 and line 865 still reference the old 4-element structure list[tuple[str, float, float, int]].
Consider defining a NamedTuple or updating the type hints for clarity:
+from typing import Tuple, Any, NamedTuple
+
+class KernelInfo(NamedTuple):
+    name: str
+    start: int
+    end: int
+    correlation_id: int
+    copy_kind: int
+    bytes: int
+    value: int
+    kind: Any  # cupti.ActivityKind
Or at minimum, update the inline type hints:
-    kernels: list[tuple[str, float, float, int]],
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]],
-    kernels: list[tuple[str, float, float, int]] = []
+    kernels: list[tuple[str, int, int, int, int, int, int, Any]] = []
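If the inline-hint route is taken, a module-level alias keeps the two annotations from drifting apart. This is a minimal sketch. KernelTuple and total_time_ns are hypothetical names, and the records are invented. The alias uses int timestamps because CUPTI activity timestamps are integer nanoseconds.

```python
from typing import Any

# Alias for the 8-element record:
# (name, start, end, correlation_id, copy_kind, bytes, value, kind)
KernelTuple = tuple[str, int, int, int, int, int, int, Any]


def total_time_ns(kernels: list[KernelTuple]) -> int:
    # Sum of per-activity durations (end - start), in nanoseconds
    return sum(k[2] - k[1] for k in kernels)


kernels: list[KernelTuple] = [
    ("gemm_kernel", 100, 250, 3, 0, 0, 0, "CONCURRENT_KERNEL"),
    ("memset", 300, 320, 4, 0, 1024, 0, "MEMSET"),
]
print(total_time_ns(kernels))  # 170
```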
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Files selected for processing (1)
flashinfer/testing/utils.py (4 hunks)
Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
Additional comments (6)
flashinfer/testing/utils.py (6)
742-766: LGTM!
The helper functions correctly extract metadata based on activity kind. The lack of an explicit else clause in set_kernel_name is acceptable since collect_kernel_info is only called within func_buffer_completed, where the activity kind is already validated.
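The review accepts the missing else clause because the caller already filters activity kinds. If the helper were ever reused elsewhere, an explicit fallback would make an unexpected kind fail loudly instead of silently returning None. The sketch below is hypothetical: kernel_name_for is an illustrative stand-in for set_kernel_name, and ActivityKind is a local enum standing in for cupti.ActivityKind. The "memcpy"/"memset" labels are invented.

```python
from enum import Enum, auto


class ActivityKind(Enum):
    # Stand-in for cupti.ActivityKind
    CONCURRENT_KERNEL = auto()
    MEMCPY = auto()
    MEMSET = auto()
    DRIVER = auto()  # an untracked kind, used to exercise the fallback


def kernel_name_for(kind: ActivityKind, raw_name: str = "") -> str:
    # Hypothetical stand-in for set_kernel_name: pick a label per activity kind
    if kind is ActivityKind.CONCURRENT_KERNEL:
        return raw_name
    elif kind is ActivityKind.MEMCPY:
        return "memcpy"
    elif kind is ActivityKind.MEMSET:
        return "memset"
    else:
        # Fail loudly instead of implicitly returning None
        raise ValueError(f"unsupported activity kind: {kind}")
```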
786-792: LGTM!
The expanded condition correctly handles MEMCPY and MEMSET activities alongside CONCURRENT_KERNEL, and consistently uses collect_kernel_info for all three types.
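The expanded condition amounts to a membership test against the set of tracked kinds. This sketch expresses it with a frozenset so that adding another kind is a one-line change. ActivityKind is a local stand-in enum, SimpleNamespace objects stand in for CUPTI activity records, and TRACKED_KINDS and filter_tracked are hypothetical names.

```python
from enum import Enum, auto
from types import SimpleNamespace


class ActivityKind(Enum):
    # Stand-in for cupti.ActivityKind
    CONCURRENT_KERNEL = auto()
    MEMCPY = auto()
    MEMSET = auto()
    RUNTIME = auto()


# Kinds that count toward measured GPU time after this change
TRACKED_KINDS = frozenset(
    {ActivityKind.CONCURRENT_KERNEL, ActivityKind.MEMCPY, ActivityKind.MEMSET}
)


def filter_tracked(activities):
    # Keep only the tracked device-side activities; everything else is ignored
    return [a for a in activities if a.kind in TRACKED_KINDS]


records = [
    SimpleNamespace(kind=ActivityKind.CONCURRENT_KERNEL, start=0, end=10),
    SimpleNamespace(kind=ActivityKind.RUNTIME, start=0, end=50),
    SimpleNamespace(kind=ActivityKind.MEMCPY, start=12, end=20),
]
print([a.kind.name for a in filter_tracked(records)])  # ['CONCURRENT_KERNEL', 'MEMCPY']
```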
870-871: LGTM!
MEMCPY and MEMSET activity kinds are correctly enabled before measurement.
889-890: LGTM!
MEMCPY and MEMSET activity kinds are correctly disabled after measurement, matching the enable calls.
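The enable calls at 870-871 and the disable calls at 889-890 must stay paired. A context manager is one way to keep that pairing from drifting as more kinds are added, and it also disables the kinds if the measured body raises. The sketch below is hypothetical. activities_enabled is an illustrative name, and enable/disable are injected callables. With the cupti Python bindings they would presumably be cupti.activity_enable and cupti.activity_disable, but that wiring is an assumption and is not shown. Kinds are plain strings here.

```python
from contextlib import contextmanager


@contextmanager
def activities_enabled(kinds, enable, disable):
    # Enable each kind, then disable exactly the ones that were enabled,
    # in reverse order, even if the measured body raises.
    enabled = []
    try:
        for kind in kinds:
            enable(kind)
            enabled.append(kind)
        yield
    finally:
        for kind in reversed(enabled):
            disable(kind)


log = []
with activities_enabled(
    ["CONCURRENT_KERNEL", "MEMCPY", "MEMSET"],
    enable=lambda k: log.append(("enable", k)),
    disable=lambda k: log.append(("disable", k)),
):
    log.append(("run", None))
```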
893-895: Verify that including bytes in the kernel identifier is intentional.
Including kernel[5] (bytes) in the identifier means that MEMCPY/MEMSET operations with different sizes will be treated as different kernels. This will cause the consistency check on lines 913-916 to fail if the benchmarked function performs memory operations of varying sizes across iterations.
If this is intentional (to ensure deterministic behavior), the current implementation is correct. If varying memory operation sizes are expected, consider excluding bytes from the identifier.
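The trade-off can be made concrete with a hypothetical kernel_key helper. It builds the same identifier format as generate_kernel_string but can optionally drop the size field. The record layout is the assumed 8-element tuple, and the two memcpy records are invented.

```python
def kernel_key(rec, include_bytes=True):
    # rec layout (assumed): (name, start, end, correlation_id, copy_kind, bytes, value, kind)
    name, _start, _end, _corr, copy_kind, nbytes, value, kind = rec
    parts = [name, copy_kind]
    if include_bytes:
        parts.append(nbytes)
    parts += [value, kind]
    return "_".join(str(p) for p in parts)


a = ("memcpy", 0, 5, 1, 1, 4096, 0, "MEMCPY")
b = ("memcpy", 9, 20, 2, 1, 8192, 0, "MEMCPY")
print(kernel_key(a) == kernel_key(b))  # False: transfer sizes differ
print(kernel_key(a, include_bytes=False) == kernel_key(b, include_bytes=False))  # True
```

Keeping bytes catches unintended size changes between iterations. Dropping it tolerates workloads whose copy sizes legitimately vary.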
908-916: LGTM!
The consistency check correctly uses generate_kernel_string to compare kernel activities across iterations, ensuring reproducible benchmarking.
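A per-iteration consistency check of this kind reduces to comparing each iteration's ordered list of identifiers against the first iteration's list. The sketch below is illustrative only. check_consistent is a hypothetical name, and the real check in flashinfer/testing/utils.py may differ (for example, by asserting rather than raising). The identifier strings are made up.

```python
def check_consistent(per_iteration: list[list[str]]) -> list[str]:
    # Compare each iteration's ordered identifier list with the first one
    if not per_iteration:
        return []
    reference = per_iteration[0]
    for i, ids in enumerate(per_iteration[1:], start=1):
        if ids != reference:
            raise ValueError(f"iteration {i} recorded {ids}, expected {reference}")
    return reference


ids = ["gemm_kernel_0_0_0_CONCURRENT_KERNEL", "memcpy_1_4096_0_MEMCPY"]
print(check_consistent([ids, list(ids), list(ids)]) == ids)  # True
```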
yzh119 left a comment
LGTM
Description
Related Issues
Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
Pre-commit Checks
- pre-commit by running pip install pre-commit (or used your preferred method).
- pre-commit install.
- pre-commit run --all-files and fixed any reported issues.
Tests
- unittest, etc.).
Reviewer Notes
Summary by CodeRabbit