examples/deepseek_v32: add runnable precision comparison to act_quant.py#10
Merged
Agent-Logs-Url: https://github.com/CeleNewYear/tilelang-ascend-dev/sessions/3c139960-1102-4131-88ad-316cc7b00772 Co-authored-by: ChangChengShouWang <188188721+ChangChengShouWang@users.noreply.github.com>
Copilot created this pull request from a session on behalf of ChangChengShouWang on April 20, 2026, 11:41.
`act_quant.py` was a kernel-only file with no way to validate correctness. This adds a self-contained precision test runnable via `python act_quant.py`.

Changes
- `_fast_round_scale_ref` — pure-PyTorch replica of the kernel's IEEE 754 bit manipulation: extracts the float32 exponent bits, computes `ceil(log2)`, and reconstructs a power of two via `(exp + 127) << 23`. Matches the kernel's behaviour exactly for both exact and non-exact powers of two.
- `act_quant_torch_ref` — float32 reference quantisation: takes the per-row absmax of each N-group (clamped at `1e-4`, matching the kernel), derives the scale, and clamps `x / s` to the FP8 range `[-448, 448]`.
- `act_quant` — kernel wrapper that pre-allocates `Y` as `torch.float8_e4m3fn` and `S` as float32 before dispatching to the compiled tilelang kernel.
- `run_test_case` / `run_test` — exercises four `(m, n, round_scale)` configurations with two checks:
  - `S`: tight tolerance (`rtol=1e-3`, `atol=1e-4`)
  - `y · s ≈ x`: FP8-appropriate tolerance (`rtol=0.2`, `atol=0.1`), accounting for the 3-bit mantissa quantisation step and scale rounding
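The two core ideas above — the IEEE 754 bit-trick for rounding a scale up to the next power of two, and the grouped absmax quantisation — can be sketched in plain scalar Python. This is an illustrative sketch only: the function names, the group size, and the scalar formulation are assumptions, and the actual file operates on PyTorch tensors.

```python
import struct

def ceil_pow2_sketch(x: float) -> float:
    """Round a positive finite float up to the next power of two via
    float32 bit manipulation (mirrors the described (exp+127)<<23 trick)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp = (bits >> 23) & 0xFF        # biased float32 exponent
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    # ceil(log2(x)): unchanged for exact powers of two, +1 otherwise
    e = (exp - 127) + (1 if mantissa else 0)
    return struct.unpack("<f", struct.pack("<I", (e + 127) << 23))[0]

def act_quant_ref_sketch(row, group=4, fp8_max=448.0):
    """Per-group absmax quantisation of one row: derive a scale from the
    clamped absmax of each group, then clamp row/scale to [-448, 448].
    Group size and the 1e-4 absmax floor follow the description above."""
    quantised, scales = [], []
    for g in range(0, len(row), group):
        blk = row[g:g + group]
        amax = max(1e-4, max(abs(v) for v in blk))
        s = amax / fp8_max
        scales.append(s)
        quantised.extend(max(-fp8_max, min(fp8_max, v / s)) for v in blk)
    return quantised, scales
```

For example, `ceil_pow2_sketch(3.0)` yields `4.0` while `ceil_pow2_sketch(1.0)` stays `1.0`, matching the "exact vs. non-exact power of two" distinction; in the real test the quantised values would additionally be cast through `torch.float8_e4m3fn`, which is what motivates the looser `y · s ≈ x` tolerance.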