benchmark: Make use_cupti the default in microbenchmarks. #2180
Conversation
Walkthrough: The benchmark script is updated to deprecate the `--use_cupti` flag, make CUPTI timing the default, and add a `--use_cuda_events` flag for timing with CUDA events instead.
Code Review
This pull request makes CUPTI the default timing mechanism for microbenchmarks, which is a good improvement for accuracy. A new `--use_cuda_events` flag is introduced to revert to the old behavior, and the `--use_cupti` flag is deprecated with a warning. The implementation is logical and correct. I have one minor suggestion to improve code clarity by removing a redundant comment.

```python
print(
    "[WARNING] --use_cupti is deprecated and will be removed in a future release. CUPTI is now enabled by default."
)
# use_cupti is deprecated and will be removed in a future release. CUPTI is now enabled by default.
```
Actionable comments posted: 0
🧹 Nitpick comments (2)
`benchmarks/flashinfer_benchmark.py` (2)
79-89: Clarify the deprecated `--use_cupti` help to describe it as a no-op and point to `--use_cuda_events`.

Right now the help says `[DEPRECATED] Use CUPTI for timing GPU kernels. This is now the default behavior.`, but the flag no longer changes behavior; it only triggers a warning. It'd be clearer to state that it's a no-op and that users should switch to `--use_cuda_events` when they want CUDA events. For example:
```diff
-    parser.add_argument(
-        "--use_cupti",
-        action="store_true",
-        default=False,
-        help="[DEPRECATED] Use CUPTI for timing GPU kernels. This is now the default behavior.",
-    )
+    parser.add_argument(
+        "--use_cupti",
+        action="store_true",
+        default=False,
+        help=(
+            "[DEPRECATED] No-op: CUPTI timing is enabled by default. "
+            "Use --use_cuda_events to time with CUDA events instead."
+        ),
+    )
```

This makes it explicit that the flag is deprecated, does not affect behavior, and what the new preferred flag is.
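If it helps to see the two flags side by side, here is a minimal self-contained sketch of the suggested declarations; the `build_parser` function and the parser setup around the flags are hypothetical, not FlashInfer's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser; only the two timing flags are shown.
    parser = argparse.ArgumentParser(description="microbenchmark driver (sketch)")
    parser.add_argument(
        "--use_cupti",
        action="store_true",
        default=False,
        help=(
            "[DEPRECATED] No-op: CUPTI timing is enabled by default. "
            "Use --use_cuda_events to time with CUDA events instead."
        ),
    )
    parser.add_argument(
        "--use_cuda_events",
        action="store_true",
        default=False,
        help="Time GPU kernels with CUDA events instead of CUPTI.",
    )
    return parser
```

With `action="store_true"` and `default=False`, both flags parse to `False` when omitted, so CUPTI stays the default unless `--use_cuda_events` is passed explicitly.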
165-173: Deprecation logic matches the new semantics; consider documenting precedence and sending warnings to stderr.

The resolution

```python
if args.use_cupti:
    print("[WARNING] --use_cupti is deprecated ...")
args.use_cupti = not args.use_cuda_events
```

correctly implements:

- CUPTI on by default (no flags),
- `--use_cuda_events` disabling CUPTI,
- `--use_cupti` as a deprecated, behavior-neutral flag.

Two small polish suggestions:
1. When both flags are passed, you print the deprecation warning but end up running with CUDA events (`use_cupti = False`). That's sensible, but it might be worth mentioning in the help or warning that `--use_cuda_events` takes precedence if both are provided.
2. Consider directing the warning to stderr to avoid polluting stdout for scripts that parse benchmark output:
```diff
-    if args.use_cupti:
-        print(
-            "[WARNING] --use_cupti is deprecated and will be removed in a future release. CUPTI is now enabled by default."
-        )
+    if args.use_cupti:
+        print(
+            "[WARNING] --use_cupti is deprecated and will be removed in a future release. "
+            "CUPTI is now enabled by default; use --use_cuda_events to opt out.",
+            file=sys.stderr,
+        )
```

Overall, the behavioral change looks aligned with the PR description.
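Putting the suggestions together, the resolution order can be sketched as a runnable snippet; the `resolve_timing_flags` function name and the bare `Namespace` stand-in for parsed arguments are illustrative, not the script's actual structure:

```python
import sys
from argparse import Namespace


def resolve_timing_flags(args: Namespace) -> Namespace:
    # The deprecated flag only warns (on stderr); it never changes the outcome.
    if args.use_cupti:
        print(
            "[WARNING] --use_cupti is deprecated and will be removed in a future release. "
            "CUPTI is now enabled by default; use --use_cuda_events to opt out.",
            file=sys.stderr,
        )
    # CUPTI is the default; --use_cuda_events takes precedence when both are set.
    args.use_cupti = not args.use_cuda_events
    return args
```

With no flags the result is `use_cupti=True`; with both flags it is `use_cupti=False`, i.e. `--use_cuda_events` wins, matching the precedence the review describes.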
yzh119 left a comment:
Thanks for the improvement. As far as I remember, CUPTI only works for CUDA 13.*, right? Where should we inform users about this limitation?
This is actually a good point. We have logic to fall back to using CUDA events when CUPTI is unavailable; it kicks in, for example, in a CUDA 12.9 container on B200.
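A fallback along the lines described could be sketched as follows; the `cupti` module name and the import-probe detection approach are assumptions for illustration, and FlashInfer's actual fallback logic may differ:

```python
import importlib.util
import sys


def pick_timer(prefer_cupti: bool = True) -> str:
    """Return "cupti" when it is preferred and importable, else "cuda_events"."""
    if prefer_cupti:
        # Hypothetical probe: check whether a CUPTI Python binding is importable.
        if importlib.util.find_spec("cupti") is not None:
            return "cupti"
        print("[INFO] CUPTI unavailable; falling back to CUDA events.", file=sys.stderr)
    return "cuda_events"
```

Probing availability once and logging the fallback to stderr keeps benchmark stdout clean while still answering the "where do we inform the user" question.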
Thanks for the clarification, it makes sense to me.
📌 Description
Currently the microbenchmarks in `flashinfer_benchmark.py` use CUDA events to measure time. However, using CUPTI provides benefits in (a) accuracy that matches nsys; and (b) the L2 cache flush implemented in the benchmark script.

The PR:

- Makes `--use_cupti` the default behavior; `--use_cupti` now does nothing and prints a deprecation warning.
- Adds a `--use_cuda_events` flag for users who would like to use CUDA events.

Example outputs on 5090:
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- [x] Tests have been added or updated as needed (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit

New Features

- Added `--use_cuda_events` flag for GPU kernel timing.

Chores

- Deprecated the `--use_cupti` flag; displays a warning when set.