[Fix][Kernel] Migrate deepseek_dsa_decode to tilelang 5d729eee API #983
Draft
stelladuyx wants to merge 2 commits into tile-ai:main
Fixes tile-ai#979. Tested with tilelang commit 5d729eeebca3ea776373a2918e3945d667bd1c7d (2026-04-13, "[Refactor] Remove GEMM v1 and promote gemm_py to be the canonical gemm op (#2033)"). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review
This pull request refactors the deepseek_dsa_decode kernel to use optimized TVM intrinsics, including T.tma_copy for memory transfers, T.wgmma_gemm for matrix multiplication, and T.ptx_cp_async for asynchronous copies. It also updates indices_local to a scalar variable and introduces explicit max-merge loops after reductions. The review feedback highlights that the clear=False parameter in T.reduce_max might be redundant or misleading following the addition of explicit merge logic.
… max-merge

clear=False told reduce_max to accumulate into the existing m_i value, but the subsequent explicit T.max loop already handles the merge. Using both is contradictory. Switch to clear=True (reduce into a fresh value) so the explicit T.max loop is the sole merge step, matching the reference implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
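To make the clear=False versus clear=True point concrete, here is a minimal pure-Python sketch of the running-max merge. It is not tilelang code: `reduce_max` and `merge_max` below are hypothetical stand-ins written for illustration, and the assumed semantics (clear=True starts the reduction from a fresh value, clear=False folds the previous value in) are taken from the commit message above rather than from the tilelang source.

```python
import math


def reduce_max(scores, m_i, clear):
    """Hypothetical row-wise max reduction (illustration only, not tilelang's API).

    clear=True  -> ignore the previous value and reduce into a fresh one.
    clear=False -> accumulate into the existing value in m_i.
    """
    out = []
    for row, prev in zip(scores, m_i):
        start = -math.inf if clear else prev
        out.append(max([start, *row]))
    return out


def merge_max(m_prev, m_new):
    """Explicit element-wise merge of the previous and current tile maxima."""
    return [max(a, b) for a, b in zip(m_prev, m_new)]


m_prev = [1.0, 5.0]                  # running max from earlier tiles
scores = [[0.5, 2.0], [3.0, 4.0]]    # scores for the current tile

# Variant adopted in the PR: fresh reduction, then one explicit merge.
m_tile = reduce_max(scores, m_prev, clear=True)
m_i = merge_max(m_prev, m_tile)

# Variant flagged in review: clear=False already folds m_prev in,
# so a following explicit merge changes nothing.
m_acc = reduce_max(scores, m_prev, clear=False)
m_acc_merged = merge_max(m_prev, m_acc)
```

Under these assumptions both variants produce the same running max, which is why the review called the combination redundant rather than incorrect. Keeping the explicit merge as the only merge step makes the data flow easier to compare against the reference implementation.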
#979
Migrates deepseek_dsa_decode.py to be compatible with tilelang commit 5d729eee (2026-04-13, "Remove GEMM v1 and promote gemm_py to be the canonical gemm op").
What changed
Test
Tested with tileopsenv — a copy of flashmlaenv with tilelang upgraded to 5d729eee.