[MetaxGPU][testing] Add missing double2 ->e8m0x2 conversion in fp8.h#101
[MetaxGPU][testing] Add missing double2 ->e8m0x2 conversion in fp8.h#101xxxzh99 wants to merge 1 commit into
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Warning Review limit reached
More reviews will be available in 26 minutes and 14 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds exponent-only E8M0 FP8 encode/decode helpers (scalar and packed x2) for float, bfloat16, and double in the MACA template header; removes the xfail marker from the FP8 vectorized-cast test; and advances the 3rdparty/tvm submodule pointer. ChangesE8M0 Conversion Functions
Third-party Submodule
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/tl_templates/maca/maca_fp8.h (1)
669-674: ⚡ Quick winRedundant cast in double2 conversion.
Lines 670-671 cast
src.xandsrc.ytofloatbefore passing to__tl_cvt_double_to_e8m0, but that function already castsdoubletofloatinternally (line 665). The explicit casts are unnecessary.♻️ Proposed cleanup
TL_DEVICE __maca_fp8x2_storage_t __tl_cvt_double2_to_e8m0x2(const double2 src) { - unsigned char lo = __tl_cvt_double_to_e8m0(static_cast<float>(src.x)); - unsigned char hi = __tl_cvt_double_to_e8m0(static_cast<float>(src.y)); + unsigned char lo = __tl_cvt_double_to_e8m0(src.x); + unsigned char hi = __tl_cvt_double_to_e8m0(src.y); return static_cast<__maca_fp8x2_storage_t>(lo) | (static_cast<__maca_fp8x2_storage_t>(hi) << 8); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tl_templates/maca/maca_fp8.h` around lines 669 - 674, The two explicit static_cast<float> calls in __tl_cvt_double2_to_e8m0x2 are redundant because __tl_cvt_double_to_e8m0 already handles double-to-float conversion; remove the casts around src.x and src.y and pass them directly to __tl_cvt_double_to_e8m0 so the function reads __tl_cvt_double_to_e8m0(src.x) and __tl_cvt_double_to_e8m0(src.y), leaving the rest of __tl_cvt_double2_to_e8m0x2 unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/tl_templates/maca/maca_fp8.h`:
- Around line 645-653: In __tl_cvt_float_to_e8m0 change the rounding and
overflow guard: replace the current "if (mantissa > 0 && exponent < 0xFF) {
exponent += 1; }" with a round-half-up test using the halfway mantissa threshold
(mantissa >= 0x400000) and only increment when exponent < 0xFE to avoid
promoting large finite exponents to 0xFF (infinity/NaN); i.e., test mantissa >=
0x400000 and exponent < 0xFE before doing exponent += 1.
---
Nitpick comments:
In `@src/tl_templates/maca/maca_fp8.h`:
- Around line 669-674: The two explicit static_cast<float> calls in
__tl_cvt_double2_to_e8m0x2 are redundant because __tl_cvt_double_to_e8m0 already
handles double-to-float conversion; remove the casts around src.x and src.y and
pass them directly to __tl_cvt_double_to_e8m0 so the function reads
__tl_cvt_double_to_e8m0(src.x) and __tl_cvt_double_to_e8m0(src.y), leaving the
rest of __tl_cvt_double2_to_e8m0x2 unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 06f0ca5b-f1fc-4ae9-974e-bfe911c3efef
📒 Files selected for processing (2)
src/tl_templates/maca/maca_fp8.htesting/maca/language/test_tilelang_language_vectorized_cast.py
💤 Files with no reviewable changes (1)
- testing/maca/language/test_tilelang_language_vectorized_cast.py
| TL_DEVICE unsigned char __tl_cvt_float_to_e8m0(const float src) { | ||
| unsigned int bits = *reinterpret_cast<const unsigned int *>(&src); | ||
| unsigned int exponent = (bits >> 23) & 0xFF; | ||
| unsigned int mantissa = bits & 0x7FFFFF; | ||
| if (mantissa > 0 && exponent < 0xFF) { | ||
| exponent += 1; | ||
| } | ||
| return static_cast<unsigned char>(exponent); | ||
| } |
There was a problem hiding this comment.
Rounding logic rounds up too aggressively and can overflow to infinity.
Lines 649-651 increment the exponent whenever mantissa is non-zero. This causes two issues:
-
Overly aggressive rounding: The condition
mantissa > 0rounds up even tiny mantissa values. For proper round-half-up in logarithmic (exponent-only) space, use the halfway threshold:mantissa >= 0x400000. -
Overflow to infinity: When
exponent == 0xFE(254) and mantissa is non-zero, incrementing produces0xFF(255), which represents infinity or NaN in IEEE-754. Large finite values incorrectly become infinity after quantization.
🛠️ Proposed fix
TL_DEVICE unsigned char __tl_cvt_float_to_e8m0(const float src) {
unsigned int bits = *reinterpret_cast<const unsigned int *>(&src);
unsigned int exponent = (bits >> 23) & 0xFF;
unsigned int mantissa = bits & 0x7FFFFF;
- if (mantissa > 0 && exponent < 0xFF) {
+ // Round half up: increment exponent if mantissa >= 0.5, clamping to avoid overflow to infinity
+ if (mantissa >= 0x400000 && exponent < 0xFE) {
exponent += 1;
}
return static_cast<unsigned char>(exponent);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| TL_DEVICE unsigned char __tl_cvt_float_to_e8m0(const float src) { | |
| unsigned int bits = *reinterpret_cast<const unsigned int *>(&src); | |
| unsigned int exponent = (bits >> 23) & 0xFF; | |
| unsigned int mantissa = bits & 0x7FFFFF; | |
| if (mantissa > 0 && exponent < 0xFF) { | |
| exponent += 1; | |
| } | |
| return static_cast<unsigned char>(exponent); | |
| } | |
| TL_DEVICE unsigned char __tl_cvt_float_to_e8m0(const float src) { | |
| unsigned int bits = *reinterpret_cast<const unsigned int *>(&src); | |
| unsigned int exponent = (bits >> 23) & 0xFF; | |
| unsigned int mantissa = bits & 0x7FFFFF; | |
| // Round half up: increment exponent if mantissa >= 0.5, clamping to avoid overflow to infinity | |
| if (mantissa >= 0x400000 && exponent < 0xFE) { | |
| exponent += 1; | |
| } | |
| return static_cast<unsigned char>(exponent); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/tl_templates/maca/maca_fp8.h` around lines 645 - 653, In
__tl_cvt_float_to_e8m0 change the rounding and overflow guard: replace the
current "if (mantissa > 0 && exponent < 0xFF) { exponent += 1; }" with a
round-half-up test using the halfway mantissa threshold (mantissa >= 0x400000)
and only increment when exponent < 0xFE to avoid promoting large finite
exponents to 0xFF (infinity/NaN); i.e., test mantissa >= 0x400000 and exponent <
0xFE before doing exponent += 1.
85cdc15 to
f1ccc05
Compare
Add complete bidirectional conversion kernels between FP8-E8M0 and float/bfloat16 scalar/vector types for MACA device. These TL_DEVICE helper routines support quantization/dequantization used in low-precision grouped-GEMM and MoE inference, covering scalar, dual-element vector and double-precision fallback paths with proper exponent rounding rules to guarantee numerical correctness during type cast.
Summary by CodeRabbit
New Features
Bug Fixes