[feat] Add Hexagon HMX backend support by Calaweh · Pull Request #2155 · tile-ai/tilelang

Calaweh · 2026-05-06T10:32:03Z

Title: Add Hexagon Backend with HMX Support for Matrix Multiplication

Summary

Introduce a dedicated Hexagon backend for TileLang with support for Qualcomm HMX (Hexagon Matrix eXtensions). This integration enables the generation of high-performance LLVM IR specifically targeting Hexagon DSP hardware instructions for matrix multiplication.

Why

Qualcomm's Hexagon DSPs power a vast number of edge devices. By leveraging the Hexagon Kernel Library (HexKL) interfaces through TileLang, we can achieve efficient CodeGen for specialized operators like sparse attention on mobile hardware. This keeps TileLang at the forefront of heterogeneous hardware support and enables localized, backend-specific optimizations for Qualcomm chips.

Key Changes

Hardware Lowering

New C++ pass LowerHexagonIntrinsics that translates TileLang MMA placeholders into the specialized @HexKL_mma_i8acc32 hardware instructions.

Memory Architecture

Native C++ registration of global.vtcm and global.hmx.acc memory scopes.
Provides the compiler with correct alignment (128‑byte) and capacity constraints for Hexagon HTP.

Runtime Support

Runtime power‑management wrapper hmx_kernel_launch utilizing the HexagonHtp RAII guard to enable the HMX hardware block during execution.

JIT Infrastructure

Enhancements to the kernel cache to support IR‑only backends.
Addition of execution guards to prevent cross‑compiled binaries from attempting to run on incompatible host CPUs.

Validation

# Style and Linting
pre-commit run --all-files

# Build Verification (requires LLVM 17/18)
cd build
cmake .. -DUSE_LLVM=ON
make -j$(nproc)

# Symbol Verification
nm -D lib/libtilelang.so | grep "LowerHexagonIntrinsics"

# Functional Testing (IR Generation & Logic)
pytest testing/python/hexagon/test_hmx_mma.py
pytest testing/python/hexagon/diagnose_hmx.py

Additional Checks

IR Validation: Verified correct target triple (hexagon), memory allocations (A_vtcm, C_acc), and hardware intrinsic calls (HexKL_mma_i8acc32).
Host Guard: Verified that calling Hexagon kernels on x86/ARM hosts raises a descriptive RuntimeError.

Compatibility

Requires LLVM 17 or 18.
Tested on Qualcomm HTP hardware (simulator and physical device).

Related Issues

Closes: #1293

Summary by CodeRabbit

New Features
- Hexagon DSP target support with HMX acceleration, intrinsics (MMA, DMA), architecture descriptors, and runtime kernel launch registration.
- Target detection and tooling paths for Hexagon to enable cross-compilation flows.
Tests
- New diagnostics and tests for Hexagon codegen, IR lowering, and HMX MMA behavior.
Chores
- Build system and kernel-cache updates to accommodate Hexagon/LLVM/IR-only backends.

…lowering pipeline

github-actions · 2026-05-06T10:32:14Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-05-06T10:32:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 307e835c-4a40-4b9e-b417-a115a3c918fd

📥 Commits

Reviewing files that changed from the base of the PR and between fd34124 and 1a9aae6.

📒 Files selected for processing (1)

testing/python/hexagon/diagnose_hmx.py

📝 Walkthrough

This PR adds Hexagon HMX support: build and runtime wiring, Hexagon intrinsics and lowering pass, architecture descriptors and target detection, JIT/libgen adapters for IR-only cross-compilation, kernel caching updates, and tests/diagnostics for Hexagon codegen and lowering.

Changes

Hexagon Backend with HMX Support

Layer / File(s)	Summary
Build / Includes `CMakeLists.txt`	Adds LLVM/Hexagon detection flow, caches/restores `USE_LLVM` around TVM config, appends `TVM_SOURCE/src` to include paths, and adds `src/runtime/hexagon_runtime.cc` to sources.
Runtime FFI `src/runtime/hexagon_runtime.cc`	Registers `tilelang.hexagon.hmx_kernel_launch` packed function (guarded by `TILELANG_HEXAGON_ENABLED`) that extracts a kernel Function, powers HMX via HexagonHtp RAII, and forwards args to the kernel.
Lowering Pass & Memory Info `src/transform/lower_hexagon_intrinsics.cc`, `tilelang/engine/phase.py`	Adds `LowerHexagonIntrinsics` pass that rewrites `hmx_mma_placeholder` externs to HexKL form and registers HMX accumulator/VTCM MemoryInfo; wires the pass into `OptimizeForTarget`.
Engine Lowering Flow `tilelang/engine/lower.py`	Detects Hexagon early in `lower()`, skips certain passes for Hexagon, applies Hexagon-specific Phase 1/opt steps, and adds `_lower_hexagon_intrinsics` helper to invoke the lowering hook.
Intrinsics API `tilelang/intrinsics/hexagon/__init__.py`	Adds `mma`, `mma_fp16`, and `vtcm_dma_copy` intrinsics, provides `HMXBuilder` and `hmx` instance, and emits corresponding extern calls.
Architecture & Target Utilities `tilelang/carver/arch/hexagon.py`, `tilelang/carver/arch/__init__.py`, `tilelang/utils/target.py`	Introduces `HexagonArch`, memory scopes, `HMXTileShape` and `HMX_TILE_SHAPES`, `get_hexagon_arch` factory, adds `hexagon` to supported targets, `check_hexagon_availability()`, and `is_hexagon_target()`.
JIT / Libgen / Adapter Integration `tilelang/jit/adapter/*`, `tilelang/jit/adapter/wrapper.py`, `tilelang/jit/kernel.py`, `tilelang/jit/adapter/utils.py`	Adds Hexagon-aware guards to skip host library loading/compilation for cross-compilation (IR-only), maps Hexagon buffers to CPU device for parsing, exposes `arch` property on `TLWrapper`, relaxes device-module assertions for Hexagon, and extends kernel-declare matching for LLVM-style declarations.
Kernel Cache `tilelang/cache/kernel_cache.py`	Handles IR-only targets by excluding binary `.so` from required files and skipping binary cache save when adapter.libpath is `None`.
Tests & Diagnostics `testing/python/hexagon/diagnose_hmx.py`, `testing/python/hexagon/test_hmx_mma.py`	Adds `has_hexagon_codegen()`, `build_hmx_matmul()` kernel factory, diagnostic tests (`test_000_environment`, `test_001_ir_dump`, `test_002_hmx_lowering_status`) and compilation/lowering checks plus host-execution guard tests.

Sequence Diagram

sequenceDiagram
    participant User
    participant Compiler
    participant Lowering
    participant HexIntrinsics
    participant JIT
    participant FFI
    participant HMX

    User->>Compiler: Compile kernel for Hexagon target
    Compiler->>Compiler: Detect is_hexagon_target()
    Compiler->>Lowering: lower() (Hexagon branch)
    Lowering->>Lowering: Skip incompatible passes
    Lowering->>HexIntrinsics: Invoke LowerHexagonIntrinsics
    HexIntrinsics-->>Lowering: Return lowered IR (HexKL intrinsics)
    Lowering-->>JIT: Emit IR / compile (IR-only path)
    JIT-->>User: Provide IR artifact (no host .so)
    User->>FFI: Call tilelang.hexagon.hmx_kernel_launch
    FFI->>FFI: Extract kernel Function and args
    FFI->>HMX: Enter HexagonHtp RAII (power on HMX)
    HMX->>HMX: Execute HMX MMA instructions
    HMX-->>FFI: Return results
    FFI-->>User: Deliver kernel output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

LeiWang1999
Gongen-Ali

Poem

🐰 I hopped to Hexagon's bright glen,
HMX tiles and VTCM then,
Intrinsics lowered, kernels spun,
Cross-compile paths now neatly done,
A carrot-powered kernel run!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 43.10% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[feat] Add Hexagon HMX backend support' directly and clearly summarizes the main change: introducing Hexagon HMX backend support to TileLang.
Linked Issues check	✅ Passed	The PR fully addresses issue `#1293` by implementing Hexagon backend with HMX support, including hardware lowering, memory architecture, runtime wrapper, and kernel cache enhancements for cross-compiled Hexagon code.
Out of Scope Changes check	✅ Passed	All changes are directly in-scope for Hexagon HMX backend support: architecture definitions, intrinsics, lowering passes, JIT adapter updates, cache handling, and diagnostic tests.

coderabbitai

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CMakeLists.txt`:
- Around line 157-186: The build currently defines TILELANG_HEXAGON_ENABLED
whenever USE_LLVM is truthy; update the CMake logic (around the llvm-config
discovery that sets LLVM_CONFIG_PATH/USE_LLVM and the later block that sets
TILELANG_HEXAGON_ENABLED) to actually verify the LLVM toolchain has Hexagon:
after finding llvm-config (LLVM_CONFIG_PATH) run llvm-config --targets-built (or
invoke it via execute_process) and check the output contains "Hexagon"; only
then set the TILELANG_HEXAGON_ENABLED definition and the "TileLang Build with
LLVM" status message. If llvm-config is not found or it does not report Hexagon,
change the earlier non-fatal warning into a fatal error (or at minimum do not
enable TILELANG_HEXAGON_ENABLED) so hexagon_runtime.cc and related transforms
are not advertised/compiled incorrectly; refer to the LLVM_CONFIG_PATH/USE_LLVM
and TILELANG_HEXAGON_ENABLED symbols to locate where to add the execute_process
and conditional enablement.

In `@src/runtime/hexagon_runtime.cc`:
- Around line 16-23: Ensure you validate args.size() before accessing args[0] or
constructing ffi::PackedArgs to avoid UB when called with zero arguments: in the
lambda that reads args and creates kernel and kernel_args, first check that
args.size() >= 1, and if not set an appropriate runtime error/return (e.g.,
populate rv with an error or throw a tvm/ffi runtime error) instead of
proceeding; specifically modify the block that uses args[0], ffi::Function
kernel, and ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1) to
perform the size check and early error return.

In `@src/transform/lower_hexagon_intrinsics.cc`:
- Around line 28-35: The rewrite for the "hmx_mma_placeholder" intrinsic
currently indexes call->args[3] without checking arity; update the branch that
handles func_name->value == "hmx_mma_placeholder" in lower_hexagon_intrinsics.cc
to first validate call->args.size() >= 4 and only perform the Array<PrimExpr>
construction and return Evaluate(...) when that check passes; if the check
fails, preserve and return the original Call node (or emit a targeted
diagnostic/log) instead of indexing out of bounds to avoid hard runtime/compile
errors.

In `@testing/python/hexagon/diagnose_hmx.py`:
- Around line 139-141: The failure message in
testing/python/hexagon/diagnose_hmx.py incorrectly references the pass name
LowerHMXIntrinsics; change the string literal that currently says "Check
LowerHMXIntrinsics implementation." to reference the correct backend pass name
"LowerHexagonIntrinsics" so triage points to the right implementation (update
the message around the HMX placeholder/no intrinsic error text in
diagnose_hmx.py).
- Around line 9-15: The try/except in diagnose_hmx.py currently swallows all
exceptions when probing LLVM/Hexagon target (the block that calls
tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad
except Exception with a narrow catch for the expected TVM probe error (e.g.,
tvm.error.TVMError or the TVM-specific probe exception available in your TVM
version, falling back to RuntimeError only if necessary) and re-raise any other
exceptions so real failures are not silenced; apply the same change to the other
probe sites referenced (the blocks around lines with the gated tests at the
other two probe locations) and ensure error handling logs or returns False only
for the known probe failure type while allowing unexpected exceptions to
propagate.

In `@tilelang/cache/kernel_cache.py`:
- Around line 457-460: The save path allows IR-only entries by checking
_is_ir_only (via kernel.adapter.libpath is None) and omitting kernel_lib_path
from missing_files, but load-time required files (method _get_required_files)
still unconditionally demands kernel_lib_path; update _get_required_files to
detect the same IR-only condition (check kernel.adapter.libpath or reuse the
_is_ir_only logic) and exclude self.kernel_lib_path from the required list when
IR-only so save and load semantics match; ensure the change references
_is_ir_only, kernel.adapter.libpath, _get_missing_complete_cache_files,
_get_required_files, and kernel_lib_path so both save and load use the same
criterion.

In `@tilelang/carver/arch/__init__.py`:
- Line 52: The symbol is_hexagon_target was added to __all__ but never defined
or imported in this module, so imports fail; fix by importing or defining
is_hexagon_target in this module and ensuring it is present in the module
globals before exporting (e.g., add a proper import statement that brings
is_hexagon_target into tilelang.carver.arch or define the function here), then
keep it listed in __all__ so from tilelang.carver.arch import is_hexagon_target
and star-imports succeed.

In `@tilelang/intrinsics/hexagon/__init__.py`:
- Around line 94-101: The export list __all__ references
register_hexagon_memory_info which is not defined or imported, causing
import-time failures; fix by either importing the symbol into this module (e.g.,
add an import that provides register_hexagon_memory_info) or remove it from the
__all__ list so only existing names (hmx, HMXBuilder, mma, mma_fp16,
vtcm_dma_copy) are exported; update the module’s top-level imports or the
__all__ array accordingly and ensure any chosen import refers to the correct
source that defines register_hexagon_memory_info.
- Around line 53-67: The vtcm_dma_copy function currently constructs a
tir.Evaluate node but never emits it into the active TIR builder; change it to
call T.evaluate(...) (like the mma() helper does) so the extern DMA call is
actually inserted into the TIR; ensure tilelang.language is imported as T and
replace the tir.Evaluate(...) invocation inside vtcm_dma_copy with
T.evaluate(...) while keeping the tir.call_extern arguments (hexagon_dma_copy,
src.access_ptr("r"), dst.access_ptr("w")) unchanged.

In `@tilelang/jit/adapter/cython/adapter.py`:
- Around line 134-142: The Hexagon guard (is_hexagon_target(self.target)) runs
after the code that compiles/loads the generated library; move this check to run
before any compile/load steps so cross-compilation never attempts to load
host-incompatible artifacts—i.e., in the method containing the compile/load
logic, call is_hexagon_target(self.target) at the very start, return early and
set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the
JIT object can be inspected without performing host library loading.

In `@tilelang/jit/adapter/utils.py`:
- Around line 88-105: match_declare_kernel_cpu currently ignores the requested
symbol and always returns the first int32_t/define; detect when the caller
passed a function name (create_call_func passes "function_name(" into the
annotation param), extract the function name (e.g., annotation.split("(")[0] if
"(" present), then build the C and LLVM search patterns to match that exact
function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or
simply search for the function name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.

In `@tilelang/jit/adapter/wrapper.py`:
- Around line 988-992: The Hexagon target is being routed to TLCPUSourceWrapper
which expects C prototypes, but Hexagon emits LLVM IR; update the target routing
so Hexagon is parsed as IR instead of C: change the branch that checks
is_hexagon_target(self.target) (in the wrapper selection near
TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or
existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and
parse declarations accordingly; also update create_call_func() so when handling
IR it extracts the function signature by splitting/locating the opening brace
'{' (or using an IR-specific parser) rather than split(";")[0], ensuring
argument extraction reads parameters not local IR instructions.

In `@tilelang/jit/kernel.py`:
- Around line 206-216: The current call path may attempt to call
self.torch_function when it is None; update the kernel invocation in the method
containing self.torch_function to raise a clear RuntimeError for uninitialized
kernels on all targets: keep the existing Hexagon-specific RuntimeError (using
is_hexagon_target(self.target)) and then add a generic RuntimeError if
self.torch_function is still None that explains the kernel is uninitialized and
cannot be executed on the host (mentioning to use the Hexagon SDK
Simulator/HexagonLauncher only in the Hexagon message). This ensures callers
receive a stable, descriptive error instead of a 'NoneType' object is not
callable' when invoking self.torch_function(*args, **kwds).

In `@tilelang/utils/target.py`:
- Around line 189-191: The normalization currently overwrites any Hexagon target
string (variable return_var) with a fixed "llvm -mtriple=hexagon
-mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block
handling Hexagon shorthands) to parse the existing return_var string and only
inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if
"-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing
"-mcpu", "-mattr", or other flags already present in return_var; ensure you
handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the
necessary flags) versus when it already starts with "llvm" (modify by appending
missing flags) so no explicit user flags are overwritten.
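The flag-preserving normalization described in the last item can be sketched roughly as follows. This is a minimal illustration, not the code in `tilelang/utils/target.py`: the function name `normalize_hexagon_target` and the constant `DEFAULT_MCPU` are hypothetical, and the sketch assumes the caller has already decided that the string denotes a Hexagon target.

```python
DEFAULT_MCPU = "hexagonv73"  # default named in the review comment


def normalize_hexagon_target(target: str) -> str:
    """Fill in missing Hexagon flags without overwriting explicit ones.

    Assumes the caller has already decided that `target` is a Hexagon target.
    """
    tokens = target.split()
    if not tokens:
        raise ValueError("empty target string")
    # The "hexagon" shorthand becomes the "llvm" target kind.
    if tokens[0] == "hexagon":
        tokens[0] = "llvm"
    # Append defaults only for flags the user did not supply.
    if not any(t.startswith("-mtriple") for t in tokens):
        tokens.append("-mtriple=hexagon")
    if not any(t.startswith("-mcpu") for t in tokens):
        tokens.append(f"-mcpu={DEFAULT_MCPU}")
    return " ".join(tokens)
```

With this shape, a user-supplied `-mcpu` or `-mattr` survives normalization, and only absent flags receive defaults.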

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05bd7958-0e49-4313-b7e6-8667f85194cb

📥 Commits

Reviewing files that changed from the base of the PR and between 48998d2 and fd34124.

📒 Files selected for processing (18)

CMakeLists.txt
src/runtime/hexagon_runtime.cc
src/transform/lower_hexagon_intrinsics.cc
testing/python/hexagon/diagnose_hmx.py
testing/python/hexagon/test_hmx_mma.py
tilelang/cache/kernel_cache.py
tilelang/carver/arch/__init__.py
tilelang/carver/arch/hexagon.py
tilelang/engine/lower.py
tilelang/engine/phase.py
tilelang/intrinsics/hexagon/__init__.py
tilelang/jit/adapter/base.py
tilelang/jit/adapter/cython/adapter.py
tilelang/jit/adapter/libgen.py
tilelang/jit/adapter/utils.py
tilelang/jit/adapter/wrapper.py
tilelang/jit/kernel.py
tilelang/utils/target.py

coderabbitai · 2026-05-06T10:42:17Z

+if(USE_LLVM STREQUAL "ON")
+  find_program(LLVM_CONFIG_PATH NAMES llvm-config-18 llvm-config-17 llvm-config)
+  if(LLVM_CONFIG_PATH)
+    set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING "Path to llvm-config" FORCE)
+  else()
+    message(WARNING "USE_LLVM is ON but llvm-config was not found in PATH.")
+  endif()
+endif()
+
+set(_TILELANG_USE_LLVM_SAVED ${USE_LLVM})
+
+# Load TVM sources and base configs
 include(cmake/load_tvm.cmake)

 if(EXISTS ${TVM_SOURCE}/cmake/config.cmake)
  include(${TVM_SOURCE}/cmake/config.cmake)
 else()
  message(FATAL_ERROR "Nor tvm provided or submodule checkout-ed.")
 endif()
+
+# Restore USE_LLVM — TVM's config.cmake may have reset it
+if(_TILELANG_USE_LLVM_SAVED AND NOT _TILELANG_USE_LLVM_SAVED STREQUAL "OFF")
+  set(USE_LLVM ${_TILELANG_USE_LLVM_SAVED} CACHE STRING "LLVM config path" FORCE)
+  message(STATUS "TileLang: restored USE_LLVM=${USE_LLVM}")
+endif()
+unset(_TILELANG_USE_LLVM_SAVED)
+
+if(USE_LLVM AND NOT USE_LLVM STREQUAL "OFF")
+  message(STATUS "TileLang Build with LLVM: ${USE_LLVM}")
+  add_definitions(-DTILELANG_HEXAGON_ENABLED)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
for bin in llvm-config-18 llvm-config-17 llvm-config; do
  if command -v "$bin" >/dev/null 2>&1; then
    "$bin" --version
    "$bin" --targets-built
    exit 0
  fi
done
echo "llvm-config not found in PATH" >&2
exit 1

Repository: tile-ai/tilelang

Length of output: 90

🏁 Script executed:

# First, locate and examine the CMakeLists.txt file
find . -name "CMakeLists.txt" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 180

🏁 Script executed:

# Once found, examine the relevant lines (157-186)
sed -n '157,186p' CMakeLists.txt

Repository: tile-ai/tilelang

Length of output: 1124

🏁 Script executed:

# Search for TILELANG_HEXAGON_ENABLED usage in the codebase
rg -r "TILELANG_HEXAGON_ENABLED" --type cmake --type cpp --type h

Repository: tile-ai/tilelang

Length of output: 457

🏁 Script executed:

# Check for any other LLVM/Hexagon related configurations
rg -i "hexagon" --type cmake --type cpp --type h -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 5477

🏁 Script executed:

# Look for any cmake configuration files related to LLVM setup
find . -path "*/cmake/*.cmake" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 145

Hexagon support is enabled without verifying LLVM has the Hexagon backend.

When USE_LLVM is set to any truthy value other than "OFF", TILELANG_HEXAGON_ENABLED is unconditionally defined. This happens even if:

- llvm-config was not found (only a warning is issued on line 162)
- The discovered LLVM lacks the Hexagon backend

Since hexagon_runtime.cc and Hexagon-specific transforms depend on this definition, the build can advertise Hexagon support and fail later in less obvious places.

Query llvm-config --targets-built after locating it, and only enable Hexagon support if the output contains Hexagon. Additionally, consider making the missing llvm-config a fatal error since Hexagon requires it.
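The check itself is small. The actual fix belongs in CMake (for example via `execute_process`), so the Python below is only a sketch of the logic under that assumption; the helper names `targets_include_hexagon` and `llvm_has_hexagon` are hypothetical and do not exist in the PR.

```python
import shutil
import subprocess


def targets_include_hexagon(targets_built: str) -> bool:
    """Return True if `llvm-config --targets-built` output lists Hexagon."""
    # The output is a whitespace-separated list of backend names.
    return "Hexagon" in targets_built.split()


def llvm_has_hexagon(
    candidates=("llvm-config-18", "llvm-config-17", "llvm-config"),
) -> bool:
    """Probe the first llvm-config found on PATH; False if none is found."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run(
            [path, "--targets-built"],
            capture_output=True,
            text=True,
            check=True,
        )
        return targets_include_hexagon(result.stdout)
    return False
```

Matching whole tokens rather than substrings avoids a false positive if some other backend name ever happened to contain the string "Hexagon".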

coderabbitai · 2026-05-06T10:42:17Z

+      [](ffi::PackedArgs args, ffi::Any *rv) {
+        // args[0] is the kernel Function; remaining args are forwarded to it.
+        // AnyView supports .cast<T>() for type-safe extraction.
+        ffi::Function kernel = args[0].cast<ffi::Function>();
+
+        // PackedArgs(const AnyView* data, int32_t size) — slice past the first
+        // arg. args.data() returns const AnyView*, args.size() returns int32_t.
+        ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate args.size() before reading args[0] and slicing the tail.

This packed function is globally callable from Python/C++. An empty call will access args[0] and build PackedArgs(..., -1) before the HMX guard runs, so the failure mode is much harsher than a normal runtime error.
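The ordering concern (validate first, power the hardware second) can be modelled in a few lines of Python. This is an analogy only: the real function is a C++ packed function, and `hmx_power_guard`, `power_events`, and this Python `hmx_kernel_launch` are hypothetical stand-ins for the HexagonHtp RAII guard and the registered launcher.

```python
from contextlib import contextmanager

power_events = []  # records guard activity, for illustration only


@contextmanager
def hmx_power_guard():
    """Hypothetical stand-in for the HexagonHtp RAII guard."""
    power_events.append("on")
    try:
        yield
    finally:
        power_events.append("off")


def hmx_kernel_launch(*args):
    """Validate the argument list before touching the power guard."""
    if len(args) < 1:
        raise TypeError("hmx_kernel_launch expects a kernel function as arg0")
    kernel, *kernel_args = args
    if not callable(kernel):
        raise TypeError("arg0 must be callable")
    with hmx_power_guard():
        return kernel(*kernel_args)
```

An empty call then fails with an ordinary exception and never enters the guard, which is the behavior the size check is meant to secure on the C++ side.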

Proposed fix

       "tilelang.hexagon.hmx_kernel_launch",
       [](ffi::PackedArgs args, ffi::Any *rv) {
+        ICHECK_GE(args.size(), 1)
+            << "tilelang.hexagon.hmx_kernel_launch expects a kernel function as arg0";
         // args[0] is the kernel Function; remaining args are forwarded to it.
         // AnyView supports .cast<T>() for type-safe extraction.
         ffi::Function kernel = args[0].cast<ffi::Function>();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      [](ffi::PackedArgs args, ffi::Any *rv) {
-        // args[0] is the kernel Function; remaining args are forwarded to it.
-        // AnyView supports .cast<T>() for type-safe extraction.
-        ffi::Function kernel = args[0].cast<ffi::Function>();
-
-        // PackedArgs(const AnyView* data, int32_t size) — slice past the first
-        // arg. args.data() returns const AnyView*, args.size() returns int32_t.
-        ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1);
+      [](ffi::PackedArgs args, ffi::Any *rv) {
+        ICHECK_GE(args.size(), 1)
+            << "tilelang.hexagon.hmx_kernel_launch expects a kernel function as arg0";
+        // args[0] is the kernel Function; remaining args are forwarded to it.
+        // AnyView supports .cast<T>() for type-safe extraction.
+        ffi::Function kernel = args[0].cast<ffi::Function>();
+
+        // PackedArgs(const AnyView* data, int32_t size) — slice past the first
+        // arg. args.data() returns const AnyView*, args.size() returns int32_t.
+        ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1);

coderabbitai · 2026-05-06T10:42:17Z

+          if (func_name->value == "hmx_mma_placeholder") {
+            Array<PrimExpr> new_args;
+            new_args.push_back(StringImm("HexKL_mma_i8acc32"));
+            new_args.push_back(
+                call->args[3]); // C_acc (accumulator — first arg to HexKL)
+            new_args.push_back(call->args[1]); // A_vtcm
+            new_args.push_back(call->args[2]); // B_vtcm
+            return Evaluate(


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the placeholder arity before indexing call->args[3].

This pass is globally callable on arbitrary TIR, so a malformed call_extern("hmx_mma_placeholder", ...) with fewer than four arguments will trip TVM's bounds checks and fail the compile with a hard error. Validate the expected operand count before rewriting, then either keep the node unchanged or emit a targeted diagnostic.
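The rewrite and its arity guard can be modelled on plain lists. The sketch below is not TVM code: `lower_hmx_placeholder` is a hypothetical helper, and a list of strings stands in for the `call_extern` argument array, where index 0 is the extern name followed by A_vtcm, B_vtcm, and C_acc as in the diff above.

```python
def lower_hmx_placeholder(args):
    """Rewrite [name, A_vtcm, B_vtcm, C_acc] into the HexKL operand order.

    Calls to other externs are returned unchanged.
    """
    if not args or args[0] != "hmx_mma_placeholder":
        return args
    # Guard the arity before unpacking, instead of indexing blindly.
    if len(args) != 4:
        raise ValueError(
            "hmx_mma_placeholder expects exactly 3 operands, "
            f"got {len(args) - 1}"
        )
    _, a_vtcm, b_vtcm, c_acc = args
    # HexKL takes the accumulator first, then the two inputs.
    return ["HexKL_mma_i8acc32", c_acc, a_vtcm, b_vtcm]
```

Whether a malformed placeholder should raise, as here, or be left untouched is the design choice the comment leaves open; raising surfaces the problem at the lowering step rather than later in codegen.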

Proposed fix

           // Lower HMX MMA placeholder
           if (func_name->value == "hmx_mma_placeholder") {
+            ICHECK_EQ(call->args.size(), 4)
+                << "hmx_mma_placeholder expects exactly 3 operands";
             Array<PrimExpr> new_args;
             new_args.push_back(StringImm("HexKL_mma_i8acc32"));
             new_args.push_back(
                 call->args[3]); // C_acc (accumulator — first arg to HexKL)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-          if (func_name->value == "hmx_mma_placeholder") {
-            Array<PrimExpr> new_args;
-            new_args.push_back(StringImm("HexKL_mma_i8acc32"));
-            new_args.push_back(
-                call->args[3]); // C_acc (accumulator — first arg to HexKL)
-            new_args.push_back(call->args[1]); // A_vtcm
-            new_args.push_back(call->args[2]); // B_vtcm
-            return Evaluate(
+          if (func_name->value == "hmx_mma_placeholder") {
+            ICHECK_EQ(call->args.size(), 4)
+                << "hmx_mma_placeholder expects exactly 3 operands";
+            Array<PrimExpr> new_args;
+            new_args.push_back(StringImm("HexKL_mma_i8acc32"));
+            new_args.push_back(
+                call->args[3]); // C_acc (accumulator — first arg to HexKL)
+            new_args.push_back(call->args[1]); // A_vtcm
+            new_args.push_back(call->args[2]); // B_vtcm
+            return Evaluate(

coderabbitai · 2026-05-06T10:42:17Z

+    try:
+        if not tvm.runtime.enabled("llvm"):
+            return False
+        tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
+        return True
+    except Exception:
+        return False


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | head -120

Repository: tile-ai/tilelang

Length of output: 5134

🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '110,150p'

Repository: tile-ai/tilelang

Length of output: 1775

🏁 Script executed:

# Search for LowerHMXIntrinsics and LowerHexagonIntrinsics in the codebase
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -i

Repository: tile-ai/tilelang

Length of output: 793

🏁 Script executed:

# Check the lower.py file to see what exceptions might be expected
cat -n tilelang/engine/lower.py | head -60

Repository: tile-ai/tilelang

Length of output: 2467

🏁 Script executed:

# Check what exceptions TVM Target raises by looking for its usage and error handling
rg "tvm.target.Target" -A 3 -B 1 | head -80

Repository: tile-ai/tilelang

Length of output: 5048

🏁 Script executed:

# Look for TVM exception types and how Target errors are typically caught
rg "except.*Error|except.*Exception" testing/python/hexagon/diagnose_hmx.py -B 3

Repository: tile-ai/tilelang

Length of output: 358

🏁 Script executed:

# Check if there are TVM-specific exception types in the codebase
rg "from tvm|import.*tvm" testing/python/hexagon/diagnose_hmx.py | head -10

Repository: tile-ai/tilelang

Length of output: 92

🏁 Script executed:

# Check TVM exception handling patterns in the codebase
rg "tvm.error|TVMError|from tvm.*import|except.*as e" -A 2 | head -100

Repository: tile-ai/tilelang

Length of output: 5401

🏁 Script executed:

# Check the diagnostic test more carefully - what's the context around line 141?
cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '135,145p'

Repository: tile-ai/tilelang

Length of output: 561

🏁 Script executed:

# Verify the actual pass name references
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2746

🏁 Script executed:

# Check if there are similar skip-gate functions elsewhere in the codebase with better exception handling
rg "pytest.mark.skipif" -B 2 -A 2 | head -60

Repository: tile-ai/tilelang

Length of output: 4792

Narrow the skip-gate exception handling to prevent silent skips.

Line 14 catches every exception and returns False, which can hide real regressions and silently skip both gated tests (Lines 64 and 110). Catch only expected probe failures and re-raise unexpected exceptions.

Suggested fix

 def has_hexagon_codegen():
     try:
         if not tvm.runtime.enabled("llvm"):
             return False
         tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
         return True
-    except Exception:
-        return False
+    except Exception as err:
+        msg = str(err).lower()
+        # Expected probe failures: missing Hexagon/LLVM target support.
+        if "hexagon" in msg or "llvm" in msg or "target" in msg:
+            return False
+        raise

Please verify and, if available in your TVM version, prefer a concrete TVM exception type (e.g., TVM-specific error class) over message matching.

Also applies to: 64-65, 110-111
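As an illustration of the narrower handling recommended here, the sketch below treats only a caller-supplied set of exception types as "feature not available" and lets everything else propagate. The helper name `probe_available` and the injected `probe` callable are hypothetical; which exception type TVM actually raises for an unsupported target depends on the TVM version and is not confirmed by this thread.

```python
from typing import Callable, Tuple, Type


def probe_available(
    probe: Callable[[], object],
    expected: Tuple[Type[BaseException], ...],
) -> bool:
    """Return False only when the probe fails with one of the expected exception types."""
    try:
        probe()
    except expected:
        # Known "not supported on this build" failure: report unavailability.
        return False
    # Any other exception has already propagated to the caller at this point.
    return True
```

In the test file, the probe would presumably be a lambda that constructs the Hexagon `tvm.target.Target`, with `expected` set to whatever concrete error class that call raises in the installed TVM. A typo or import error inside the probe would then fail the test run instead of silently skipping it.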

🧰 Tools

🪛 Ruff (0.15.12)

[warning] 14-14: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@testing/python/hexagon/diagnose_hmx.py` around lines 9 - 15, The try/except in diagnose_hmx.py currently swallows all exceptions when probing LLVM/Hexagon target (the block that calls tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad except Exception with a narrow catch for the expected TVM probe error (e.g., tvm.error.TVMError or the TVM-specific probe exception available in your TVM version, falling back to RuntimeError only if necessary) and re-raise any other exceptions so real failures are not silenced; apply the same change to the other probe sites referenced (the blocks around lines with the gated tests at the other two probe locations) and ensure error handling logs or returns False only for the known probe failure type while allowing unexpected exceptions to propagate.

coderabbitai · 2026-05-06T10:42:18Z

+        from tilelang.utils.target import is_hexagon_target
+
+        if is_hexagon_target(self.target):
+            # For Hexagon, we are cross-compiling.
+            # We cannot load symbols or execute this on the host machine.
+            # Returning early allows the JIT object to exist so we can
+            # inspect kernel.kernel_source.
+            self._compiled_func = None
+            return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return before compiling/loading Hexagon artifacts.

By the time this guard runs, Lines 130-132 have already compiled and loaded the generated library. That's the host-incompatible step this branch is supposed to skip for cross-compiled Hexagon kernels.

Suggested fix

         self.wrapper.assign_host_module(host_mod)
         self.wrapper.assign_device_module(device_mod)
         self.host_kernel_source = self.wrapper.wrap(self.get_kernel_source(kernel_only=True))
+        from tilelang.utils.target import is_hexagon_target
+
+        if is_hexagon_target(self.target):
+            # For Hexagon, we are cross-compiling.
+            # We cannot load symbols or execute this on the host machine.
+            # Returning early allows the JIT object to exist so we can
+            # inspect kernel.kernel_source.
+            self._compiled_func = None
+            return
+
         self.lib_generator.update_lib_code(self.host_kernel_source)
         self.lib_generator.compile_lib()
         self.lib = self.lib_generator.load_lib()
-
-        from tilelang.utils.target import is_hexagon_target
-
-        if is_hexagon_target(self.target):
-            # For Hexagon, we are cross-compiling.
-            # We cannot load symbols or execute this on the host machine.
-            # Returning early allows the JIT object to exist so we can
-            # inspect kernel.kernel_source.
-            self._compiled_func = None
-            return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/cython/adapter.py` around lines 134 - 142, The Hexagon guard (is_hexagon_target(self.target)) runs after the code that compiles/loads the generated library; move this check to run before any compile/load steps so cross-compilation never attempts to load host-incompatible artifacts—i.e., in the method containing the compile/load logic, call is_hexagon_target(self.target) at the very start, return early and set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the JIT object can be inspected without performing host library loading.
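The ordering issue is easy to see in isolation. The following is a minimal sketch with a recording stub; `FakeLibGenerator` and `build` are hypothetical names and do not mirror the real adapter, which also wraps sources and assigns modules before this point.

```python
class FakeLibGenerator:
    """Hypothetical stand-in that records which build steps ran."""

    def __init__(self):
        self.steps = []

    def compile_lib(self):
        self.steps.append("compile")

    def load_lib(self):
        self.steps.append("load")
        return object()


def build(lib_generator, cross_compiling: bool):
    """Return a loaded library, or None when the target cannot run on the host."""
    if cross_compiling:
        # Guard first: nothing host-incompatible is compiled or loaded.
        return None
    lib_generator.compile_lib()
    return lib_generator.load_lib()
```

With the guard placed after `load_lib()` instead, the stub would record both steps even for a cross-compiled target, which is the behaviour this comment objects to.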

coderabbitai · 2026-05-06T10:42:18Z

 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    pattern = r"int32_t\s+\w+"
+    # C-style signature
+    pattern_c = r"int32_t\s+\w+"
+    # LLVM-style signature
+    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+
    for line in source.split("\n"):
-        if annotation in line:
-            matched = re.findall(pattern, line)
-            if len(matched) >= 1:
-                return source.index(matched[0] + "(")
+        # C pattern
+        matched = re.findall(pattern_c, line)
+        if matched:
+            return source.index(matched[0])
+        # LLVM pattern
+        matched = re.findall(pattern_llvm, line)
+        if matched:
+            # Match the start of the 'define'
+            return source.index(matched[0])
+
    raise ValueError("No global kernel found in the source code")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Match the requested symbol instead of the first CPU declaration.

annotation is ignored here now, so every lookup returns the first int32_t/define in the file. TLCPUSourceWrapper.create_call_func() passes function_name + "(" into this helper, so multi-function CPU/LLVM sources will bind later wrappers to the wrong declaration.

Suggested fix

 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    # C-style signature
-    pattern_c = r"int32_t\s+\w+"
-    # LLVM-style signature
-    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+    func_name = annotation[:-1] if annotation.endswith("(") else None
+    if func_name and func_name != "int32_t":
+        pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"
+        pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"
+    else:
+        pattern_c = r"\bint32_t\s+\w+\b"
+        pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"
 
     for line in source.split("\n"):
         # C pattern
         matched = re.findall(pattern_c, line)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
    func_name = annotation[:-1] if annotation.endswith("(") else None
    if func_name and func_name != "int32_t":
        pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"
        pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"
    else:
        pattern_c = r"\bint32_t\s+\w+\b"
        pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"

    for line in source.split("\n"):
        # C pattern
        matched = re.findall(pattern_c, line)
        if matched:
            return source.index(matched[0])
        # LLVM pattern
        matched = re.findall(pattern_llvm, line)
        if matched:
            # Match the start of the 'define'
            return source.index(matched[0])

    raise ValueError("No global kernel found in the source code")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/utils.py` around lines 88 - 105, match_declare_kernel_cpu currently ignores the requested symbol and always returns the first int32_t/define; detect when the caller passed a function name (create_call_func passes "function_name(" into the annotation param), extract the function name (e.g., annotation.split("(")[0] if "(" present), then build the C and LLVM search patterns to match that exact function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or simply search for the function name with a preceding return/type token, LLVM: r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return source.index(match.group(0)) for the found match; if annotation is not a function name keep the existing behavior but make patterns use word boundaries so you don't pick substrings. Ensure you update match_declare_kernel_cpu to reference the extracted func_name and to return the correct match start.
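One way to read the request above is sketched here: search for the named function directly and return the offset of that match. This is an assumption-laden illustration, not the project's implementation; `find_kernel_decl` is a hypothetical name, and the patterns assume C kernels return `int32_t` and LLVM definitions start a line with `define`.

```python
import re


def find_kernel_decl(source: str, func_name: str) -> int:
    """Return the offset where `func_name` is declared (C) or defined (LLVM IR)."""
    name = re.escape(func_name)
    patterns = (
        rf"\bint32_t\s+{name}\s*\(",       # C-style: int32_t name(
        rf"^define\b[^\n@]*@{name}\s*\(",  # LLVM-style: define ... @name(
    )
    for pattern in patterns:
        match = re.search(pattern, source, flags=re.MULTILINE)
        if match:
            # match.start() is the offset of this declaration, not of the
            # first textually similar substring elsewhere in the file.
            return match.start()
    raise ValueError(f"No declaration found for {func_name!r}")
```

Requiring the opening parenthesis after the name keeps a lookup for `kernel` from matching `kernel_2`, and using `match.start()` avoids the second `source.index(...)` lookup that could land on an earlier occurrence.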

coderabbitai · 2026-05-06T10:42:18Z

+        elif is_cpu_target(self.target) or is_hexagon_target(self.target):
+            wrapper_class = TLCPUSourceWrapper
        else:
-            raise ValueError(f"Unsupported platform: {self.arch.platform}")
+            arch_name = self.arch.platform if self.arch else "unknown"
+            raise ValueError(f"Unsupported platform: {arch_name}")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the wrapper.py file and the specific lines mentioned
find . -type f -name "wrapper.py" | grep -E "jit/adapter"

Repository: tile-ai/tilelang

Length of output: 176

🏁 Script executed:

# Get the file size and content around lines 988-992
wc -l tilelang/jit/adapter/wrapper.py
sed -n '980,1000p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1006

🏁 Script executed:

# Now let's find the create_call_func method in TLCPUSourceWrapper
grep -n "create_call_func" tilelang/jit/adapter/wrapper.py | head -20

Repository: tile-ai/tilelang

Length of output: 200

🏁 Script executed:

# Search for TLCPUSourceWrapper class definition and create_call_func implementation
grep -n "class TLCPUSourceWrapper" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 90

🏁 Script executed:

# Search for match_declare_kernel_cpu in utils.py
grep -n "match_declare_kernel_cpu" tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 142

🏁 Script executed:

# Read the create_call_func method implementation
sed -n '790,850p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 3011

🏁 Script executed:

# Read the match_declare_kernel_cpu function
sed -n '88,120p' tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 982

🏁 Script executed:

# Check what type of source code is passed to TLCPUSourceWrapper
# Look at the context where wrapper_class is instantiated
sed -n '975,1010p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1442

🏁 Script executed:

# Search for Hexagon-related code generation and what source format it produces
rg "hexagon" -i tilelang/jit/adapter/ -A 3 -B 3

Repository: tile-ai/tilelang

Length of output: 5418

🏁 Script executed:

# Look for where the source code is generated and what format it takes for Hexagon
rg "is_hexagon_target" -i tilelang/jit/ -A 5 -B 5

Repository: tile-ai/tilelang

Length of output: 6891

🏁 Script executed:

# Check if there's any LLVM IR being passed for Hexagon
grep -r "llvm" tilelang/jit/adapter/ -i

Repository: tile-ai/tilelang

Length of output: 327

🏁 Script executed:

# Let's trace back - look at what generates c_source that's passed to wrap()
grep -n "def wrap" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 181

🏁 Script executed:

# See where wrap() is called and what source is passed
rg "\.wrap\(" tilelang/jit/ -B 5

Repository: tile-ai/tilelang

Length of output: 1749

🏁 Script executed:

# Check what get_kernel_source returns and what format Hexagon uses
grep -n "get_kernel_source" tilelang/jit/adapter/ -r -A 3

Repository: tile-ai/tilelang

Length of output: 2051

🏁 Script executed:

# Look for Hexagon-specific code generation
find tilelang -path "*hexagon*" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 132

🏁 Script executed:

# Check what kind of source TVM Hexagon backend produces
grep -n "llvm" tilelang/jit/adapter/wrapper.py -i

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Look at what happens in the Cython adapter for get_kernel_source
grep -n "def get_kernel_source" tilelang/jit/adapter/ -r -A 15

Repository: tile-ai/tilelang

Length of output: 5063

🏁 Script executed:

# Check what code format TVM's Hexagon target produces
grep -n "hexagon" tilelang/jit/adapter/tvm_ffi.py -i -B 3 -A 3

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Look at the cython adapter to understand the flow for Hexagon
sed -n '100,135p' tilelang/jit/adapter/cython/adapter.py

Repository: tile-ai/tilelang

Length of output: 1623

🏁 Script executed:

# Check TVM target creation for Hexagon
grep -r "Target\|target" tilelang/jit/adapter/cython/adapter.py | grep -i hexagon

Repository: tile-ai/tilelang

Length of output: 240

🏁 Script executed:

# Look for what TVM codegen produces - check if there's any indication of LLVM vs C
grep -n "device_kernel_source" tilelang/jit/adapter/cython/adapter.py | head -10

Repository: tile-ai/tilelang

Length of output: 656

🏁 Script executed:

# Let's check if Hexagon can produce LLVM IR by examining TVM integration
grep -n "ir_module\|IRModule" tilelang/jit/adapter/cython/adapter.py -B 2 -A 2 | head -40

Repository: tile-ai/tilelang

Length of output: 1633

🏁 Script executed:

# Check what format TVM produces for different targets
# Look for codegen or backend specifications
find tilelang -name "*.py" | xargs grep -l "hexagon" | head -5

Repository: tile-ai/tilelang

Length of output: 219

🏁 Script executed:

# Check the hexagon arch file to understand what source it processes
sed -n '1,50p' tilelang/carver/arch/hexagon.py

Repository: tile-ai/tilelang

Length of output: 2022

🏁 Script executed:

# Look for test cases or examples that use Hexagon to see what source format is expected
find . -path "*/test*" -name "*.py" | xargs grep -l "hexagon" 2>/dev/null | head -3

Repository: tile-ai/tilelang

Length of output: 142

🏁 Script executed:

# Check what code is generated by looking at a simple example or test
# Look for inspect_source usage with Hexagon
grep -r "inspect_source\|get_device_source" tilelang/jit/ -B 3 -A 3 | head -40

Repository: tile-ai/tilelang

Length of output: 2068

🏁 Script executed:

# The key insight: Let's check if there's any LLVM IR handling elsewhere in CPU wrapper
grep -n "LLVM\|llvm\|define" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 91

🏁 Script executed:

# Check test cases to see what format Hexagon source is expected to be
sed -n '1,100p' ./testing/python/hexagon/test_hmx_mma.py

Repository: tile-ai/tilelang

Length of output: 3388

🏁 Script executed:

# Check what TVM produces for Hexagon - look at lower.py
grep -n "hexagon\|llvm" tilelang/engine/lower.py -i -B 2 -A 2 | head -50

Repository: tile-ai/tilelang

Length of output: 1992

🏁 Script executed:

# The critical test: what does inspect_source() return for a Hexagon target?
# Let's check if there's documentation or examples
grep -r "inspect_source" tilelang/ --include="*.py" -B 1 -A 1 | grep -i "hexagon\|llvm" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Let's check what TVM's Hexagon target backend is set to
grep -n "hexagon\|llvm" tilelang/utils/target.py -i -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2237

🏁 Script executed:

# Confirm that split(";") will fail on LLVM IR
python3 << 'EOF'
# Example LLVM function
llvm_func = '''define i32 @kernel(i32* %arg0, i32 %arg1) {
  %1 = load i32, i32* %arg0
  %2 = add i32 %1, %arg1
  ret i32 %2
}'''

# What happens with split(";")[0]
declaration = llvm_func.split(";")[0]
print("LLVM function:")
print(llvm_func)
print("\nResult of split(';')[0]:")
print(declaration)
print("\nThis includes the entire function body, not just the declaration!")

# Example C function
c_func = '''int32_t kernel(int32_t* arg0, int32_t arg1);
{
  // body
}'''

declaration_c = c_func.split(";")[0]
print("\n\nC function:")
print(c_func)
print("\nResult of split(';')[0]:")
print(declaration_c)
print("\nThis correctly stops at the declaration!")
EOF

Repository: tile-ai/tilelang

Length of output: 577

TLCPUSourceWrapper parses C prototypes, not LLVM IR.

This branch routes Hexagon (which generates LLVM IR, not C code) through TLCPUSourceWrapper. However, create_call_func() still extracts declarations with split(";")[0], which fails for LLVM IR—LLVM uses braces, not semicolons. This causes the parser to include the entire function body in the "declaration", and argument extraction will then match locals instead of parameters, producing malformed function calls. Update the CPU wrapper to parse LLVM IR before routing Hexagon here.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/adapter/wrapper.py` around lines 988 - 992, The Hexagon target is being routed to TLCPUSourceWrapper which expects C prototypes, but Hexagon emits LLVM IR; update the target routing so Hexagon is parsed as IR instead of C: change the branch that checks is_hexagon_target(self.target) (in the wrapper selection near TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and parse declarations accordingly; also update create_call_func() so when handling IR it extracts the function signature by splitting/locating the opening brace '{' (or using an IR-specific parser) rather than split(";")[0], ensuring argument extraction reads parameters not local IR instructions.
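To make the parsing difference concrete, the sketch below picks the signature terminator based on the source dialect. `extract_signature` is a hypothetical helper, not part of TileLang, and it assumes that an LLVM definition begins with `define` and that no `{` appears before the function body.

```python
def extract_signature(func_source: str) -> str:
    """Return the declaration part of a C prototype or an LLVM IR definition."""
    text = func_source.lstrip()
    if text.startswith("define"):
        # LLVM IR: the signature ends where the body's opening brace begins.
        end = text.find("{")
    else:
        # C prototype: the declaration ends at the first semicolon.
        end = text.find(";")
    if end == -1:
        raise ValueError("No signature terminator found")
    return text[:end].strip()
```

Argument extraction can then run on the returned signature only, so `%1`-style locals inside the IR body are never mistaken for parameters.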

coderabbitai · 2026-05-06T10:42:18Z

+        if self.torch_function is None:
+            from tilelang.utils.target import is_hexagon_target
+
+            if is_hexagon_target(self.target):
+                raise RuntimeError(
+                    "Hexagon kernels cannot be executed directly on the host machine. "
+                    "To run this kernel, please use the Hexagon SDK Simulator or "
+                    "the HexagonLauncher on a supported Qualcomm device."
+                )
+
        return self.torch_function(*args, **kwds)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raise for all uninitialized kernels, not just Hexagon.

On non-Hexagon targets this still falls through to self.torch_function(*args, **kwds), which fails with a raw TypeError ("'NoneType' object is not callable"). Add the generic RuntimeError after the Hexagon-specific branch so callers always get a stable error here.

Proposed fix

         if self.torch_function is None:
             from tilelang.utils.target import is_hexagon_target
 
             if is_hexagon_target(self.target):
                 raise RuntimeError(
                     "Hexagon kernels cannot be executed directly on the host machine. "
                     "To run this kernel, please use the Hexagon SDK Simulator or "
                     "the HexagonLauncher on a supported Qualcomm device."
                 )
+            raise RuntimeError("The compiled function is not initialized.")
 
         return self.torch_function(*args, **kwds)

📝 Committable suggestion

Suggested change

if self.torch_function is None:
    from tilelang.utils.target import is_hexagon_target

    if is_hexagon_target(self.target):
        raise RuntimeError(
            "Hexagon kernels cannot be executed directly on the host machine. "
            "To run this kernel, please use the Hexagon SDK Simulator or "
            "the HexagonLauncher on a supported Qualcomm device."
        )
    raise RuntimeError("The compiled function is not initialized.")

return self.torch_function(*args, **kwds)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/jit/kernel.py` around lines 206 - 216, The current call path may attempt to call self.torch_function when it is None; update the kernel invocation in the method containing self.torch_function to raise a clear RuntimeError for uninitialized kernels on all targets: keep the existing Hexagon-specific RuntimeError (using is_hexagon_target(self.target)) and then add a generic RuntimeError if self.torch_function is still None that explains the kernel is uninitialized and cannot be executed on the host (mentioning to use the Hexagon SDK Simulator/HexagonLauncher only in the Hexagon message). This ensures callers receive a stable, descriptive error instead of a 'NoneType' object is not callable' when invoking self.torch_function(*args, **kwds).
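The two-tier error handling described above can be sketched on a toy class. `KernelHandle`, `fn`, and `cross_compiled` are hypothetical names used only for illustration; the real kernel object checks its target through `is_hexagon_target` rather than a boolean flag.

```python
class KernelHandle:
    """Hypothetical stand-in for a JIT kernel whose callable may be unset."""

    def __init__(self, fn=None, cross_compiled=False):
        self.fn = fn
        self.cross_compiled = cross_compiled

    def __call__(self, *args, **kwargs):
        if self.fn is None:
            if self.cross_compiled:
                # Target-specific message with actionable guidance.
                raise RuntimeError("Kernel was cross-compiled and cannot run on the host.")
            # Generic fallback so callers never see "'NoneType' object is not callable".
            raise RuntimeError("The compiled function is not initialized.")
        return self.fn(*args, **kwargs)
```

Both failure modes surface as `RuntimeError`, so calling code can catch one exception type regardless of target.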

coderabbitai · 2026-05-06T10:42:18Z

+    # Handle Backend-Specific Normalization (Shorthands)
+    if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
+        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve explicit -mcpu and other flags when normalizing Hexagon strings.

This currently rewrites any string containing "hexagon" to a fixed llvm -mtriple=hexagon -mcpu=hexagonv73. Inputs like "llvm -mcpu=hexagonv75" lose the requested CPU/features and silently downgrade codegen.

Suggested fix

     # Handle Backend-Specific Normalization (Shorthands)
     if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
-        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        normalized = return_var.strip()
+        if normalized == "hexagon":
+            return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        elif normalized.startswith("llvm"):
+            return_var = f"{normalized} -mtriple=hexagon"
+        else:
+            return_var = f"llvm -mtriple=hexagon {normalized}"

📝 Committable suggestion

Suggested change

# Handle Backend-Specific Normalization (Shorthands)
if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
    normalized = return_var.strip()
    if normalized == "hexagon":
        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
    elif normalized.startswith("llvm"):
        return_var = f"{normalized} -mtriple=hexagon"
    else:
        return_var = f"llvm -mtriple=hexagon {normalized}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tilelang/utils/target.py` around lines 189 - 191, The normalization currently overwrites any Hexagon target string (variable return_var) with a fixed "llvm -mtriple=hexagon -mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block handling Hexagon shorthands) to parse the existing return_var string and only inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if "-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing "-mcpu", "-mattr", or other flags already present in return_var; ensure you handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the necessary flags) versus when it already starts with "llvm" (modify by appending missing flags) so no explicit user flags are overwritten.
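The "only inject missing flags" behaviour described in the prompt above can be sketched with simple token handling. This is an illustrative assumption, not the code in `tilelang/utils/target.py`: `normalize_hexagon_target` is a hypothetical name, flags are assumed to be written in `-flag=value` form, and the default CPU is the `hexagonv73` value used by the PR's shorthand.

```python
DEFAULT_HEXAGON_CPU = "hexagonv73"  # default taken from the PR's shorthand


def normalize_hexagon_target(target: str) -> str:
    """Add missing llvm/-mtriple/-mcpu pieces without overwriting explicit flags."""
    # Drop the bare "hexagon" shorthand token; explicit flags are kept as-is.
    tokens = [t for t in target.split() if t.lower() != "hexagon"]
    if not tokens or tokens[0] != "llvm":
        tokens.insert(0, "llvm")
    if not any(t.startswith("-mtriple") for t in tokens):
        tokens.insert(1, "-mtriple=hexagon")
    if not any(t.startswith("-mcpu") for t in tokens):
        tokens.append(f"-mcpu={DEFAULT_HEXAGON_CPU}")
    return " ".join(tokens)
```

Unlike the string-concatenation variant in the suggested fix, this form does not leave a stray `hexagon` token behind for inputs such as `hexagon -mcpu=hexagonv75`, and it still supplies a default CPU when none is given.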

LeiWang1999 · 2026-05-07T04:34:48Z

@Calaweh Thanks for your contributions! We're currently decoupling the different backends; once that work is wrapped up, I will take a look at this PR.

feat: add initial Hexagon architecture support and HMX MMA intrinsic lowering pipeline

fd34124

coderabbitai Bot reviewed May 6, 2026

fix: resolve pre-commit failures in diagnose_hmx.py

1a9aae6
