Make low-precision MACA examples use portable emitters, launch shapes, and reference paths #34

Open
VitalyAnkh wants to merge 4 commits into tile-ai:dev from VitalyAnkh:vitaly/maca-low-precision-example-portability

VitalyAnkh commented Apr 9, 2026

Collaborator

Fixes #28.

Problem

Several low-precision MACA examples still fail after the backend is repaired because they assume CUDA- or Hopper-specific emitters, launch shapes, and reference paths.

What this PR changes

  • selects MACA-safe launch configurations for the affected FP8 and dequantisation examples
  • routes intrinsic GEMM emission through the MACA tensor-core path when the target is MACA
  • adds portable reference implementations where CUDA-only APIs are unavailable in the MACA environment
  • simplifies and vectorises the grouped dequantisation reference path so correctness mode completes in practice on MACA
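For readers unfamiliar with the term, the following is a minimal sketch of what a grouped dequantisation reference computes. It is not the code from this PR: the function name `dequantize_grouped`, the unsigned 4-bit packing (two values per byte, low nibble first), and the zero point of 8 are all assumptions chosen for illustration, and the sketch uses plain Python lists rather than the vectorised tensor operations the PR describes.

```python
def dequantize_grouped(packed, scales, group_size, zero_point=8):
    """Dequantise unsigned 4-bit values that share one scale per group.

    Assumed layout (hypothetical, for illustration only):
      packed     -- list of ints in 0..255, two 4-bit values per byte,
                    low nibble first
      scales     -- one float per group of `group_size` unpacked values
      zero_point -- integer offset subtracted before scaling
    """
    # Unpack every byte into its low and high nibble.
    values = []
    for byte in packed:
        values.append(byte & 0x0F)
        values.append((byte >> 4) & 0x0F)

    if len(values) != len(scales) * group_size:
        raise ValueError("scales do not cover the unpacked values exactly")

    # Element i belongs to group i // group_size and uses that group's scale.
    return [(v - zero_point) * scales[i // group_size]
            for i, v in enumerate(values)]


# Two bytes -> four values, split into two groups of two.
result = dequantize_grouped([0x98, 0x07], scales=[0.5, 2.0], group_size=2)
```

A vectorised reference would express the same unpack-and-scale steps as whole-array operations, which is what lets a correctness check finish in reasonable time on large inputs.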

Solution

The examples now make target-aware choices at the example layer. Where CUDA-only APIs or Hopper-tuned shapes were previously assumed, the MACA path now chooses a compatible emitter, a smaller launch configuration, or a portable reference implementation. This keeps the examples executable without changing their intended numerical checks.
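The target-aware choice described above can be pictured roughly as follows. This is a hedged sketch, not the PR's implementation: `LaunchConfig`, `select_launch_config`, `select_emitter`, the target strings, and every tile size are hypothetical placeholders, and the real values live in the example sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    """Tile sizes and thread count for one kernel launch (illustrative fields)."""
    block_m: int
    block_n: int
    block_k: int
    threads: int


# Placeholder numbers only; they are not taken from the PR.
_HOPPER_TUNED = LaunchConfig(block_m=128, block_n=128, block_k=128, threads=256)
_MACA_SAFE = LaunchConfig(block_m=64, block_n=64, block_k=32, threads=128)


def select_launch_config(target_kind: str) -> LaunchConfig:
    """Pick a smaller launch shape on MACA, the tuned shape otherwise."""
    if target_kind == "maca":
        return _MACA_SAFE
    return _HOPPER_TUNED


def select_emitter(target_kind: str) -> str:
    """Name the GEMM emission path to use (labels are hypothetical)."""
    return "maca_tensor_core" if target_kind == "maca" else "cuda_default"


# The example decides once, up front, and leaves its numerical check unchanged.
config = select_launch_config("maca")
emitter = select_emitter("maca")
```

Keeping the decision in one small function at the example layer means the kernel body and the correctness check stay identical across targets.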

Alternatives considered

A narrower alternative was to skip the affected examples on MACA and retain the existing CUDA-oriented logic. That would have preserved the xfails rather than resolving them. Another possibility was to push all of the adaptation down into backend lowering. That would not help with example-side reference code that depends on APIs absent from the MACA runtime.

Verification

  • python -m pytest -q examples/maca/gemm_fp8/test_example_gemm_fp8.py
  • python -m pytest -q examples/maca/deepseek_deepgemm/test_example_deepgemm_fp8_2xAcc.py
  • python -m pytest -q examples/maca/dequantize_gemm/test_example_dequantize_gemm.py

Stack context

This PR builds on #33.

github-actions Bot commented Apr 9, 2026


👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

VitalyAnkh changed the title from "[MetaxGPU][Examples] Port low-precision MACA examples" to "Make low-precision MACA examples use portable emitters, launch shapes, and reference paths" on Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

Low-precision MACA examples assume CUDA/Hopper-specific emitters, launch shapes, and reference paths
