Remove the stale MACA fusedmoe xfail and synchronize cython multi-stream JIT coverage#47
VitalyAnkh wants to merge 2 commits into
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
The previous MACA test failure was unrelated to the fusedmoe xfail removal itself. The failing case was , and the root cause was that the preceding multi-stream coverage launched work on auxiliary streams without waiting for completion before the next test reused GPU state. This update keeps the coverage intact and adds an explicit stream synchronization at the end of the helper so the later dynamic-shape case no longer inherits outstanding asynchronous work. |
stream = torch.cuda.Stream()
streams = [torch.cuda.Stream() for _ in range(4)]
for stream in streams:
    with torch.cuda.stream(stream):
There is an implicit synchronization in the `with` context; is the explicit sync required?
# side streams to finish before the test releases their tensors or the next
# test allocates new buffers on the default stream.
for stream in streams:
    stream.synchronize()
Why not write this to follow `matmul_kernel` directly?
Problem
The MACA fusedmoe test is still marked `xfail`, even though it now passes on the current `dev` baseline. That stale marker turns a normal success into an `XPASS` and makes the MACA test suite harder to read.
While validating this cleanup on the upstream self-hosted MACA runner, the CI also exposed an unrelated instability in `testing/python/jit/test_tilelang_jit_gemm_cython.py`: the multi-stream coverage launched work on auxiliary streams but did not wait for those streams to finish before the following dynamic-shape test reused GPU state.
What this PR changes
- Remove the stale `xfail` marker from `examples/maca/fusedmoe/test_example_fusedmoe.py`
- Synchronize the side streams at the end of `run_cython_kernel_multi_stream()` in `testing/python/jit/test_tilelang_jit_gemm_cython.py`
Solution
The fusedmoe change lets the test report its true outcome again.
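
To make the `XPASS` effect concrete, here is a minimal sketch of a stale marker in isolation. The function `fused_moe_reference` is a hypothetical stand-in that simply succeeds; it is not the real fusedmoe example, and the test names are illustrative only.

```python
import pytest


def fused_moe_reference():
    # Hypothetical stand-in for the real example; it just succeeds.
    return 42


@pytest.mark.xfail(reason="stale: the underlying issue is already fixed")
def test_with_stale_xfail():
    # The assertion holds, so pytest reports this test as XPASS
    # rather than as an ordinary pass.
    assert fused_moe_reference() == 42


def test_without_marker():
    # Same assertion without the marker: reported as a plain pass.
    assert fused_moe_reference() == 42
```

Running this file with `python -m pytest -q -rX` lists the first test under the xpassed summary. With `strict=True` on the marker, the same unexpected pass would be reported as a failure instead, which is one way to stop stale markers from lingering.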
The Cython test change keeps the existing multi-stream coverage, but makes the helper wait for the side streams it launches. This is the minimal fix: the kernel launches remain asynchronous, yet the test no longer leaves in-flight work behind for the next case.
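
The pattern described above can be sketched roughly as follows. This is not the helper from the PR: `launch_on_side_streams` and its parameters are hypothetical, and a plain matmul stands in for the compiled kernel. The sketch assumes PyTorch's `torch.cuda` stream API and returns `None` when PyTorch or a CUDA-capable device is unavailable.

```python
try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def launch_on_side_streams(num_streams=4, size=256):
    """Launch one matmul per side stream, then wait for all of them.

    Returns the list of results, or None when no device is available.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    a = torch.randn(size, size, device="cuda")
    b = torch.randn(size, size, device="cuda")

    # The inputs were produced on the current stream, so each side stream
    # waits for it before reading them.
    current = torch.cuda.current_stream()
    streams = [torch.cuda.Stream() for _ in range(num_streams)]

    results = []
    for stream in streams:
        stream.wait_stream(current)
        with torch.cuda.stream(stream):
            # Enqueued asynchronously; leaving the `with` block only
            # restores the previous current stream, it does not wait.
            results.append(a @ b)

    # Explicitly wait for every side stream before returning, so no
    # in-flight work outlives the helper.
    for stream in streams:
        stream.synchronize()
    return results
```

The final loop is the part that corresponds to the fix: without it, the function can return while the side streams are still running.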
Alternatives considered
One option was to keep the stale `xfail` until a broader cleanup pass. That would have preserved misleading test output for no technical benefit.
Another option was to retry CI and treat the Cython failure as an unrelated flake. That would have left a real ordering bug in the test helper, and the same runner could fail again on the next PR.
A cleaner review boundary would be to submit the JIT test fix separately. I did not do that here because the failure blocked validation of this PR on the upstream MACA runner and the fix is small, local, and directly tied to getting this branch green.
Verification
- `python -m pytest -q -rX examples/maca/fusedmoe/test_example_fusedmoe.py` on `origin/dev` -> `1 xpassed`
- `python -m pytest -q examples/maca/fusedmoe/test_example_fusedmoe.py` on this branch -> `1 passed`
- `pre-commit run --files examples/maca/fusedmoe/test_example_fusedmoe.py`
- `pre-commit run --files testing/python/jit/test_tilelang_jit_gemm_cython.py`
- Ran `testing/python/jit/test_tilelang_jit_gemm_cython.py` against the built `dev` environment, including the CI order `test_cython_kernel_multi_stream()` -> `test_cython_dynamic_shape()`
- Ran `test_cython_kernel_multi_stream()` -> `test_cython_dynamic_shape()` 20 times without reproducing the MACA mismatch
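
A repeated ordered run like the last item could be scripted along these lines. This is a sketch, not the script used for the PR: `build_command` and `repeat_ordered_run` are hypothetical helpers, and it assumes pytest executes node IDs in the order given on the command line, which is worth confirming with `--collect-only` before relying on it.

```python
import subprocess
import sys

TEST_FILE = "testing/python/jit/test_tilelang_jit_gemm_cython.py"
# Order taken from the CI sequence described above.
ORDERED_TESTS = ["test_cython_kernel_multi_stream", "test_cython_dynamic_shape"]


def build_command(test_file=TEST_FILE, tests=ORDERED_TESTS):
    # One pytest invocation that names both tests explicitly.
    node_ids = [f"{test_file}::{name}" for name in tests]
    return [sys.executable, "-m", "pytest", "-q", "-x", *node_ids]


def repeat_ordered_run(repeats=20, runner=subprocess.run):
    """Run the ordered pair `repeats` times.

    Returns the 1-based attempt number of the first failure, or None
    if every attempt exits with status 0.
    """
    cmd = build_command()
    for attempt in range(1, repeats + 1):
        result = runner(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Calling `repeat_ordered_run()` from the repository root runs the pair 20 times and stops at the first failing attempt; the `runner` parameter exists only so the loop can be exercised without a GPU.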