[Fix][Kernel] Migrate deepseek_dsa_decode to tilelang 5d729eee API #983

Draft
stelladuyx wants to merge 2 commits into tile-ai:main from
stelladuyx:new_tilelang_version

Conversation

@stelladuyx
Collaborator

#979

Migrates deepseek_dsa_decode.py to be compatible with tilelang commit 5d729eee (2026-04-13, "Remove GEMM v1 and promote gemm_py to be the canonical gemm op").

What changed

  • wg_wait removed from T.gemm; async WGMMA now uses explicit T.wgmma_gemm
  • Q loading switched to TMA for correct layout inference
  • Explicit cp.async intrinsic replaces the old async_scope annotation
  • An explicit max-merge is now required after reduce_max in the new API
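
To illustrate the last bullet, here is a minimal pure-Python sketch of the running-max merge as it appears in a streaming (online) softmax. It is not the kernel code and does not use tilelang; the function name `online_softmax_sum` and the variables `m_i`, `l_i`, and `m_prev` are illustrative assumptions, chosen to resemble the names used in the commit message further down.

```python
import math

def online_softmax_sum(blocks):
    """Stream over score blocks, keeping a running max and a rescaled sum.

    Hypothetical helper for illustration only; not part of tilelang or tileops.
    """
    m_i = float("-inf")  # running max over all blocks seen so far
    l_i = 0.0            # running sum of exp(score - m_i)
    for block in blocks:
        m_prev = m_i
        m_block = max(block)        # fresh per-block reduction
        m_i = max(m_prev, m_block)  # explicit max-merge with the previous max
        # Rescale the old sum to the new max; exp(-inf) is 0.0 on the first block.
        scale = math.exp(m_prev - m_i)
        l_i = l_i * scale + sum(math.exp(s - m_i) for s in block)
    return m_i, l_i

blocks = [[1.0, 3.0], [2.0, 5.0], [0.5]]
m, l = online_softmax_sum(blocks)
```

If the merge step were skipped, `m_i` would hold only the latest block's max, and the rescale factor `exp(m_prev - m_i)` would no longer relate the old sum to a true global max.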

Test

Tested with tileopsenv, a copy of flashmlaenv with tilelang upgraded to 5d729eee.

Fixes tile-ai#979.

Tested with tilelang commit 5d729eeebca3ea776373a2918e3945d667bd1c7d
(2026-04-13, "[Refactor] Remove GEMM v1 and promote gemm_py to be the
canonical gemm op (#2033)").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@stelladuyx stelladuyx requested a review from a team April 17, 2026 09:27
@stelladuyx stelladuyx self-assigned this Apr 17, 2026
@stelladuyx stelladuyx marked this pull request as draft April 17, 2026 09:28
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the deepseek_dsa_decode kernel to use optimized TVM intrinsics, including T.tma_copy for memory transfers, T.wgmma_gemm for matrix multiplication, and T.ptx_cp_async for asynchronous copies. It also updates indices_local to a scalar variable and introduces explicit max-merge loops after reductions. The review feedback highlights that the clear=False parameter in T.reduce_max might be redundant or misleading following the addition of explicit merge logic.

Comment thread tileops/kernels/attention/deepseek_dsa_decode.py Outdated
Comment thread tileops/kernels/attention/deepseek_dsa_decode.py Outdated
… max-merge

clear=False told reduce_max to accumulate into the existing m_i value,
but the subsequent explicit T.max loop already handles the merge.
Using both is contradictory. Switch to clear=True (reduce into a fresh
value) so the explicit T.max loop is the sole merge step, matching the
reference implementation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
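
The reasoning in the commit message above can be sketched with a toy model. This assumes only the semantics stated in that message: with clear=False the reduction accumulates into the existing value, and with clear=True it reduces into a fresh one. `reduce_max_model` and `step` are hypothetical stand-ins written in plain Python; they are not the tilelang API.

```python
def reduce_max_model(block, dst, clear):
    """Toy model of a max reduction with a `clear` flag (hypothetical helper).

    clear=True  -> ignore the old value in `dst`, return the block max only.
    clear=False -> merge the block max into the old value in `dst`.
    """
    block_max = max(block)
    return block_max if clear else max(dst, block_max)

def step(m_prev, block, clear):
    """One update of the running max, followed by the explicit merge."""
    m_i = reduce_max_model(block, m_prev, clear)
    return max(m_i, m_prev)  # explicit max-merge, as in the T.max loop

m_prev = 4.0
block = [1.0, 2.0]
with_clear = step(m_prev, block, clear=True)
without_clear = step(m_prev, block, clear=False)
```

In this model both settings produce the same final value once the explicit merge runs, so clear=True simply leaves that merge as the only place where the previous max is folded in.
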
@stelladuyx stelladuyx changed the title from "Title: [Fix][Kernel] Migrate deepseek_dsa_decode to tilelang 5d729eee API" to "[Fix][Kernel] Migrate deepseek_dsa_decode to tilelang 5d729eee API" Apr 21, 2026