@samuellees
Contributor

@samuellees samuellees commented Dec 3, 2025

📌 Description

Dtype checks are important for ensuring that API callers use FlashInfer correctly. A warning or error should be raised whenever the parameters don't match the kernel's requirements.

Otherwise, mismatches cost a lot of debugging effort on the framework side (e.g., PR13761, PR14350, PR14135).

FlashInfer is a great product, and we really hope it becomes even better. Please pay attention to checks of this kind. Thanks a lot. cc @yzh119

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened runtime validation for a specific routing configuration to enforce that routing logits use float32 (and apply a sensible float32 default when missing). This prevents type-related runtime errors, improves stability for affected model execution paths, and does not change functional behavior beyond validation.

@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

📥 Commits

Reviewing files that changed from the base of the PR and between 95cdb91 and 41d77cd.

📒 Files selected for processing (1)
  • csrc/trtllm_fused_moe_kernel_launcher.cu (1 hunks)

Walkthrough

Added guarded runtime checks in multiple MoE launcher code paths: when routing_method_type == DeepSeekV3, routing_logits must be float32 (if absent, treated as float32). Checks were inserted across launcher variants without other control-flow or logic changes.
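The guard described in this walkthrough can be sketched in plain C++. This is a minimal illustration, not the launcher's actual code: `RoutingMethod`, `DType`, and `check_routing_logits_dtype` are hypothetical stand-ins, `std::optional` replaces the FFI optional type, and the function returns an error message where the real code would fail a `TVM_FFI_ICHECK_EQ`.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for the real enum and dtype constants.
enum class RoutingMethod { Default, DeepSeekV3 };
enum class DType { Float32, BFloat16, Float16 };

// Returns an error message when the check fails, std::nullopt when it passes.
// (Assumption: the real launcher aborts via TVM_FFI_ICHECK_EQ instead.)
std::optional<std::string> check_routing_logits_dtype(
    RoutingMethod method, const std::optional<DType>& routing_logits_dtype) {
  // The guard is scoped to DeepSeekV3; other routing methods are unaffected.
  if (method != RoutingMethod::DeepSeekV3) {
    return std::nullopt;
  }
  // An absent tensor is treated as float32, so the check passes trivially.
  const DType effective = routing_logits_dtype.value_or(DType::Float32);
  if (effective != DType::Float32) {
    return "routing_logits must be float for DeepSeekV3 Routing method.";
  }
  return std::nullopt;
}
```

With this shape, a bfloat16 tensor is rejected only under DeepSeekV3 routing, while a missing tensor or any other routing method passes through unchanged.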

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| DeepSeekV3 routing dtype checks: csrc/trtllm_fused_moe_kernel_launcher.cu | Added runtime ICHECK validations that require routing_logits dtype == float32 when routing_method_type == DeepSeekV3. The guards were inserted in multiple launcher check routines (common and variant-specific) and default to float32 if routing_logits is absent. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify each inserted ICHECK is correctly scoped to the DeepSeekV3 branch and does not affect other routing methods.
  • Confirm the defaulting behavior when routing_logits is absent is intentional and documented in the error message.
  • Check consistency of dtype comparisons across the different launcher variants and potential compile warnings.

Poem

I’m a rabbit in code, hopping through logs,
I sniff the logits and check all the cogs,
When DeepSeekV3 asks for floats I nod,
A tiny ICHECK: careful, not odd,
Hopping safe through types, a tidy little blog 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Description check | ❓ Inconclusive | The description provides context for why dtype checks are needed and references related issues, but most template sections (Related Issues, Pre-commit Checks, and Tests) remain incomplete with only the Description section meaningfully filled out. | Complete the 'Related Issues' section with specific issue links, confirm pre-commit setup steps were followed by checking the boxes, and document whether tests were added or updated for the dtype validation changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding data type checks for DeepSeek FP4 MoE kernel validation. |

@gemini-code-assist
Contributor

Summary of Changes

Hello @samuellees, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the FlashInfer library by adding a critical data type check for the routing_logits parameter. Specifically, when utilizing the DeepSeekV3 routing method, the system now verifies that routing_logits is a float32 data type. This proactive validation is designed to prevent common API misuse, streamline debugging processes for users, and ensure the stability and correctness of operations involving DeepSeekV3 models.

Highlights

  • Data Type Validation: Introduced a new data type check for the routing_logits parameter within the Fp8BlockScaleLauncher class.
  • DeepSeekV3 Specific Check: The added check specifically applies when the RoutingMethodType is DeepSeekV3, ensuring that routing_logits is of dl_float32 type.
  • Improved API Robustness: This validation aims to prevent incorrect API usage, reduce debugging efforts for framework developers, and ensure the correct functioning of FlashInfer with DeepSeekV3 models.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a data type check for routing_logits when using the DeepSeekV3 routing method with the Fp8BlockScaleLauncher. This improves the robustness of the API. However, the added code contains a syntax error that will prevent compilation. I've provided a comment with a suggested fix for this issue. Note that while the PR title mentions fp4, the change is applied to the fp8 launcher.

if (static_cast<RoutingMethodType>(routing_method_type) ==
RoutingMethodType::DeepSeekV3)
{
TVM_FFI_ICHECK_EQ(routing_logits.dtype(), dl_float32) << "routing_logits must be float for DeepSeekV3 Routing method.";
Contributor

critical

The routing_logits member is of type Optional<TensorView>, so to access its dtype, you need to use .value().dtype(). The current code routing_logits.dtype() will not compile.

For better code organization, you might also consider moving this check to the check_routing() method (around line 754), as it's a check related to routing parameters and other routing_method_type checks are already there.

      TVM_FFI_ICHECK_EQ(routing_logits.value().dtype(), dl_float32) << "routing_logits must be float for DeepSeekV3 Routing method.";
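The unwrapping issue behind this suggestion can be illustrated with `std::optional`, used here only as a stand-in for `Optional<TensorView>`. `TensorViewSketch` and `is_float32` are hypothetical names. How the FFI optional behaves when empty is not shown in this thread, so the sketch guards with `has_value()` before unwrapping, as later revisions of this PR do.

```cpp
#include <optional>

enum class DType { Float32, BFloat16 };

// Hypothetical minimal tensor view exposing only a dtype accessor.
struct TensorViewSketch {
  DType dtype_;
  DType dtype() const { return dtype_; }
};

// Calling routing_logits.dtype() directly would not compile, because the
// optional wrapper has no dtype() member; the wrapped value must be unwrapped.
bool is_float32(const std::optional<TensorViewSketch>& routing_logits) {
  // Guard first: std::optional::value() throws std::bad_optional_access
  // when the optional is empty.
  if (!routing_logits.has_value()) {
    return true;  // absent input is treated as float32
  }
  return routing_logits.value().dtype() == DType::Float32;
}
```

Using `.value()` without the guard would compile, but it moves the failure from compile time to run time whenever `routing_logits` is not supplied.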

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 442dec9 and 72fb317.

📒 Files selected for processing (1)
  • csrc/trtllm_fused_moe_kernel_launcher.cu (1 hunks)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
csrc/trtllm_fused_moe_kernel_launcher.cu (1)

838-843: Remove this redundant check — already validated in check_routing().

Per a past review comment, this check was intended to be moved to check_routing(), not duplicated. Since the dtype validation now exists in check_routing_common() (lines 215-220) which is called before check_moe(), this block should be removed.

   void check_moe() const override {
     FusedMoeLauncher::check_moe_common();
 
-    if (static_cast<RoutingMethodType>(routing_method_type) == RoutingMethodType::DeepSeekV3) {
-      auto const routing_logits_dtype =
-          routing_logits.has_value() ? routing_logits.value().dtype() : dl_float32;
-      TVM_FFI_ICHECK_EQ(routing_logits_dtype, dl_float32)
-          << "routing_logits must be float for DeepSeekV3 Routing method.";
-    }
-
     TVM_FFI_ICHECK_EQ(hidden_states.dtype(), dl_float8_e4m3fn) << "hidden_states must be fp8.";
🧹 Nitpick comments (3)
csrc/trtllm_fused_moe_kernel_launcher.cu (3)

440-445: Duplicate check — already handled by check_routing_common().

The call to FusedMoeLauncher::check_routing_common() on line 438 already performs this exact DeepSeekV3 dtype validation. This block can be removed to avoid redundancy.

   void check_routing() const override {
     FusedMoeLauncher::check_routing_common();
 
-    if (static_cast<RoutingMethodType>(routing_method_type) == RoutingMethodType::DeepSeekV3) {
-      auto const routing_logits_dtype =
-          routing_logits.has_value() ? routing_logits.value().dtype() : dl_float32;
-      TVM_FFI_ICHECK_EQ(routing_logits_dtype, dl_float32)
-          << "routing_logits must be float for DeepSeekV3 Routing method.";
-    }
     // TODO n_group, topk_group validation?
   }

800-806: Duplicate check — check_routing_common() already validates this.

Line 768 invokes check_routing_common() which performs this same DeepSeekV3 dtype validation. Consider removing this block to reduce redundancy.

     TVM_FFI_ICHECK_LE(args->local_num_experts + args->local_expert_offset, args->num_experts)
         << "num_experts must be greater or equal to local_num_experts + local_expert_offset";
-
-    if (static_cast<RoutingMethodType>(routing_method_type) == RoutingMethodType::DeepSeekV3) {
-      auto const routing_logits_dtype =
-          routing_logits.has_value() ? routing_logits.value().dtype() : dl_float32;
-      TVM_FFI_ICHECK_EQ(routing_logits_dtype, dl_float32)
-          << "routing_logits must be float for DeepSeekV3 Routing method.";
-    }
   }

1038-1043: Duplicate check — check_routing_common() already handles this.

Since FusedMoeLauncher::check_routing_common() is called on line 1037 and already includes the DeepSeekV3 dtype validation, this block is redundant.

   void check_routing() const override {
     // First call base class common routing checks
     FusedMoeLauncher::check_routing_common();
-    if (static_cast<RoutingMethodType>(routing_method_type) == RoutingMethodType::DeepSeekV3) {
-      auto const routing_logits_dtype =
-          routing_logits.has_value() ? routing_logits.value().dtype() : dl_float32;
-      TVM_FFI_ICHECK_EQ(routing_logits_dtype, dl_float32)
-          << "routing_logits must be float for DeepSeekV3 Routing method.";
-    }
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 196544c and 95cdb91.

📒 Files selected for processing (1)
  • csrc/trtllm_fused_moe_kernel_launcher.cu (5 hunks)
🔇 Additional comments (1)
csrc/trtllm_fused_moe_kernel_launcher.cu (1)

215-220: Correct placement for the dtype check.

Adding the DeepSeekV3 dtype validation in check_routing_common() ensures all derived launchers inherit this check automatically. The conditional fallback to dl_float32 when routing_logits is absent is also appropriate.
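A rough sketch of why placing the check in the shared base routine is enough. The class names (`LauncherBase`, `Fp8LauncherSketch`, `Fp4LauncherSketch`) and the helper `routing_is_valid` are hypothetical, and the sketch throws `std::invalid_argument` where the real code uses `TVM_FFI_ICHECK_EQ`.

```cpp
#include <optional>
#include <stdexcept>

enum class RoutingMethod { Default, DeepSeekV3 };
enum class DType { Float32, BFloat16 };

class LauncherBase {
 public:
  LauncherBase(RoutingMethod method, std::optional<DType> logits_dtype)
      : method_(method), logits_dtype_(logits_dtype) {}
  virtual ~LauncherBase() = default;
  virtual void check_routing() const = 0;

 protected:
  // Shared validation: written once, reused by every derived launcher.
  void check_routing_common() const {
    if (method_ == RoutingMethod::DeepSeekV3 &&
        logits_dtype_.value_or(DType::Float32) != DType::Float32) {
      throw std::invalid_argument(
          "routing_logits must be float for DeepSeekV3 Routing method.");
    }
  }

 private:
  RoutingMethod method_;
  std::optional<DType> logits_dtype_;
};

// Derived launchers only delegate; no per-variant copy of the dtype check.
class Fp8LauncherSketch : public LauncherBase {
 public:
  using LauncherBase::LauncherBase;
  void check_routing() const override { check_routing_common(); }
};

class Fp4LauncherSketch : public LauncherBase {
 public:
  using LauncherBase::LauncherBase;
  void check_routing() const override { check_routing_common(); }
};

// Helper so callers can probe the check without handling exceptions.
bool routing_is_valid(const LauncherBase& launcher) {
  try {
    launcher.check_routing();
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

Because each override calls `check_routing_common()`, repeating the same block in a variant's `check_routing()` or `check_moe()` adds no coverage, which is the redundancy flagged in the comments above.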

@yzh119
Collaborator

yzh119 commented Dec 6, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !180 has been created, and the CI pipeline #39712770 is currently running. I'll report back once the pipeline job completes.

Collaborator

@yzh119 yzh119 left a comment

LGTM, cc @jiahanc for viz.

@yzh119 yzh119 merged commit 70bc2b5 into flashinfer-ai:main Dec 6, 2025
4 checks passed