Adapt DeepSeek V32 examples to MACA-safe barriers, head tiling, and fallback paths #36
Open
VitalyAnkh wants to merge 3 commits into
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Fixes #30.
Review note
This branch now carries only the shared MACA backend prerequisite from #33 plus the DeepSeek V32 changes below. It no longer includes the unrelated example updates from #34 or #35.
Problem
The DeepSeek V32 examples assumed CUDA-specific execution behaviour in several places, including barrier handling, head partitioning, TMA-oriented forward paths, and vector-atomic usage in backward kernels.
What this PR changes
topk_selector with a MACA-safe reset strategy
Solution
The PR keeps the DeepSeek V32 examples intact at the algorithmic level, but rewrites the execution assumptions that were specific to CUDA. The MACA path now partitions heads according to MACA-safe tile sizes, clears the histogram state fully, and avoids unsupported synchronization or atomic behaviour.
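As a rough illustration of the two ideas above (tile-bounded head partitioning and a full histogram reset), here is a minimal plain-Python sketch. It is not the code from this PR and does not use the TileLang API: the names partition_heads, reset_histogram, and MAX_HEADS_PER_TILE are hypothetical, and the tile size of 16 is an assumed placeholder, since the actual MACA-safe tile sizes are not stated in this description.

```python
from typing import List, Tuple

# Hypothetical limit for illustration only; the real MACA-safe tile size
# is not given in the PR description.
MAX_HEADS_PER_TILE = 16


def partition_heads(num_heads: int,
                    max_tile: int = MAX_HEADS_PER_TILE) -> List[Tuple[int, int]]:
    """Split [0, num_heads) into contiguous (start, stop) tiles.

    Every tile holds at most max_tile heads; the last tile may be smaller.
    """
    if num_heads < 0 or max_tile <= 0:
        raise ValueError("num_heads must be >= 0 and max_tile must be > 0")
    return [(start, min(start + max_tile, num_heads))
            for start in range(0, num_heads, max_tile)]


def reset_histogram(hist: List[int]) -> None:
    """Zero every bin in place, not only the bins touched by the last pass."""
    for i in range(len(hist)):
        hist[i] = 0


if __name__ == "__main__":
    # 40 heads with tiles of at most 16 heads: two full tiles and one partial.
    print(partition_heads(40, 16))

    bins = [3, 0, 7]
    reset_histogram(bins)
    print(bins)
```

Clearing the whole histogram costs a little more than resetting only the touched bins, but it avoids relying on which bins a previous pass happened to write, which is the kind of implicit execution assumption this PR sets out to remove.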
Alternatives considered
One option was to bypass the DeepSeek V32 cases entirely on MACA. That would have been expedient, but it would also have left a substantial portion of the example suite unexercised. Another was to preserve the existing kernels and add narrow guards around the observed failures. That approach would have been fragile because the failures shared a broader root cause: CUDA-specific execution assumptions embedded in the example kernels.
Verification
python -m pytest -q examples/maca/deepseek_v32/test_tilelang_example_deepseek_v32.py