[feat] Add Hexagon HMX backend support #2155

Open
Calaweh wants to merge 2 commits into tile-ai:main from Calaweh:feat/hexagon-hmx

Conversation

@Calaweh Calaweh commented May 6, 2026

Title: Add Hexagon Backend with HMX Support for Matrix Multiplication

Summary

Introduce a dedicated Hexagon backend for TileLang with support for Qualcomm HMX (Hexagon Matrix eXtensions). This integration enables the generation of high-performance LLVM IR specifically targeting Hexagon DSP hardware instructions for matrix multiplication.

Why

Qualcomm's Hexagon DSPs power a vast number of edge devices. By leveraging the Hexagon Kernel Library (HexKL) interfaces through TileLang, we can achieve efficient CodeGen for specialized operators like sparse attention on mobile hardware. This keeps TileLang at the forefront of heterogeneous hardware support and enables localized, backend-specific optimizations for Qualcomm chips.

Key Changes

Hardware Lowering

  • New C++ pass LowerHexagonIntrinsics that translates TileLang MMA placeholders into the specialized @HexKL_mma_i8acc32 hardware instructions.

Memory Architecture

  • Native C++ registration of global.vtcm and global.hmx.acc memory scopes.
  • Provides the compiler with correct alignment (128‑byte) and capacity constraints for Hexagon HTP.
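
As a rough illustration of the 128-byte alignment constraint above, the following sketch rounds buffer sizes up to the alignment and checks them against a scope capacity. The helper names and the capacity value are assumptions for illustration; they are not taken from the PR.

```python
VTCM_ALIGNMENT_BYTES = 128  # alignment stated in the PR description


def align_up(nbytes: int, alignment: int = VTCM_ALIGNMENT_BYTES) -> int:
    """Round nbytes up to the next multiple of alignment."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (nbytes + alignment - 1) // alignment * alignment


def fits_in_scope(tile_bytes: list[int], capacity_bytes: int) -> bool:
    """Check whether the aligned tiles fit in a scope of the given capacity.

    capacity_bytes is a caller-supplied, hypothetical value; the real
    capacity comes from the registered memory info for the scope.
    """
    return sum(align_up(n) for n in tile_bytes) <= capacity_bytes


if __name__ == "__main__":
    print(align_up(129))                    # rounds up to the next 128-byte boundary
    print(fits_in_scope([100, 100], 256))   # two tiles, each padded to 128 bytes
```

Padding each buffer individually is the conservative choice here, since every allocation is assumed to start on its own aligned boundary.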

Runtime Support

  • Runtime power‑management wrapper hmx_kernel_launch utilizing the HexagonHtp RAII guard to enable the HMX hardware block during execution.

JIT Infrastructure

  • Enhancements to the kernel cache to support IR‑only backends.
  • Addition of execution guards to prevent cross‑compiled binaries from attempting to run on incompatible host CPUs.
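
A minimal sketch of how a cache could treat IR-only backends, assuming that an IR-only kernel is one whose adapter reports no library path. The file names below are hypothetical placeholders and do not reflect the actual cache layout.

```python
from typing import List, Optional

# Hypothetical file names; the real cache layout may differ.
KERNEL_SOURCE_FILE = "kernel_source.txt"
PARAMS_FILE = "params.pkl"
KERNEL_LIB_FILE = "kernel_lib.so"


def is_ir_only(libpath: Optional[str]) -> bool:
    """Assumption: an IR-only kernel has no compiled host library."""
    return libpath is None


def required_cache_files(libpath: Optional[str]) -> List[str]:
    """Files that must exist for a cache entry to count as complete."""
    files = [KERNEL_SOURCE_FILE, PARAMS_FILE]
    if not is_ir_only(libpath):
        # Only binary backends need the compiled shared library.
        files.append(KERNEL_LIB_FILE)
    return files


if __name__ == "__main__":
    print(required_cache_files(None))
    print(required_cache_files("/tmp/kernel_lib.so"))
```

Routing both the save path and the load path through one predicate such as `is_ir_only` keeps the two sides from disagreeing about which files are required.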

Validation

# Style and Linting
pre-commit run --all-files

# Build Verification (requires LLVM 17/18)
cd build
cmake .. -DUSE_LLVM=ON
make -j$(nproc)

# Symbol Verification
nm -D lib/libtilelang.so | grep "LowerHexagonIntrinsics"

# Functional Testing (IR Generation & Logic)
pytest testing/python/hexagon/test_hmx_mma.py
pytest testing/python/hexagon/diagnose_hmx.py

Additional Checks

  • IR Validation: Verified correct target triple (hexagon), memory allocations (A_vtcm, C_acc), and hardware intrinsic calls (HexKL_mma_i8acc32).
  • Host Guard: Verified that calling Hexagon kernels on x86/ARM hosts raises a descriptive RuntimeError.
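
The host guard can be pictured with the sketch below. `KernelStub` and `is_hexagon_target_str` are hypothetical stand-ins written for this example; they are not the PR's `JITKernel` or `is_hexagon_target`, and the error messages are illustrative.

```python
from typing import Callable, Optional


def is_hexagon_target_str(target: str) -> bool:
    """Illustrative check on a target string (not the PR's helper)."""
    return "hexagon" in target.lower()


class KernelStub:
    """Stand-in for a JIT kernel whose host callable may be missing."""

    def __init__(self, target: str, host_func: Optional[Callable] = None):
        self.target = target
        self.host_func = host_func

    def __call__(self, *args, **kwargs):
        if self.host_func is None:
            if is_hexagon_target_str(self.target):
                raise RuntimeError(
                    "Kernel was cross-compiled for Hexagon and cannot run on "
                    "this host; use a Hexagon simulator or device launcher."
                )
            # Generic fallback so callers never hit "'NoneType' is not callable".
            raise RuntimeError("Kernel is not initialized and cannot be executed.")
        return self.host_func(*args, **kwargs)


if __name__ == "__main__":
    try:
        KernelStub("llvm -mtriple=hexagon -mcpu=hexagonv73")()
    except RuntimeError as exc:
        print(exc)
```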

Compatibility

  • Requires LLVM 17 or 18.
  • Tested on Qualcomm HTP hardware (simulator and physical device).

Related Issues

Closes: #1293

Summary by CodeRabbit

  • New Features

    • Hexagon DSP target support with HMX acceleration, intrinsics (MMA, DMA), architecture descriptors, and runtime kernel launch registration.
    • Target detection and tooling paths for Hexagon to enable cross-compilation flows.
  • Tests

    • New diagnostics and tests for Hexagon codegen, IR lowering, and HMX MMA behavior.
  • Chores

    • Build system and kernel-cache updates to accommodate Hexagon/LLVM/IR-only backends.

github-actions Bot commented May 6, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 307e835c-4a40-4b9e-b417-a115a3c918fd

📥 Commits

Reviewing files that changed from the base of the PR and between fd34124 and 1a9aae6.

📒 Files selected for processing (1)
  • testing/python/hexagon/diagnose_hmx.py

📝 Walkthrough

This PR adds Hexagon HMX support: build and runtime wiring, Hexagon intrinsics and lowering pass, architecture descriptors and target detection, JIT/libgen adapters for IR-only cross-compilation, kernel caching updates, and tests/diagnostics for Hexagon codegen and lowering.

Changes

Hexagon Backend with HMX Support

| Layer | File(s) | Summary |
| --- | --- | --- |
| Build / Includes | CMakeLists.txt | Adds LLVM/Hexagon detection flow, caches/restores USE_LLVM around TVM config, appends TVM_SOURCE/src to include paths, and adds src/runtime/hexagon_runtime.cc to sources. |
| Runtime FFI | src/runtime/hexagon_runtime.cc | Registers tilelang.hexagon.hmx_kernel_launch packed function (guarded by TILELANG_HEXAGON_ENABLED) that extracts a kernel Function, powers HMX via HexagonHtp RAII, and forwards args to the kernel. |
| Lowering Pass & Memory Info | src/transform/lower_hexagon_intrinsics.cc, tilelang/engine/phase.py | Adds LowerHexagonIntrinsics pass that rewrites hmx_mma_placeholder externs to HexKL form and registers HMX accumulator/VTCM MemoryInfo; wires the pass into OptimizeForTarget. |
| Engine Lowering Flow | tilelang/engine/lower.py | Detects Hexagon early in lower(), skips certain passes for Hexagon, applies Hexagon-specific Phase 1/opt steps, and adds _lower_hexagon_intrinsics helper to invoke the lowering hook. |
| Intrinsics API | tilelang/intrinsics/hexagon/__init__.py | Adds mma, mma_fp16, and vtcm_dma_copy intrinsics, provides HMXBuilder and hmx instance, and emits corresponding extern calls. |
| Architecture & Target Utilities | tilelang/carver/arch/hexagon.py, tilelang/carver/arch/__init__.py, tilelang/utils/target.py | Introduces HexagonArch, memory scopes, HMXTileShape and HMX_TILE_SHAPES, get_hexagon_arch factory, adds hexagon to supported targets, check_hexagon_availability(), and is_hexagon_target(). |
| JIT / Libgen / Adapter Integration | tilelang/jit/adapter/*, tilelang/jit/adapter/wrapper.py, tilelang/jit/kernel.py, tilelang/jit/adapter/utils.py | Adds Hexagon-aware guards to skip host library loading/compilation for cross-compilation (IR-only), maps Hexagon buffers to CPU device for parsing, exposes arch property on TLWrapper, relaxes device-module assertions for Hexagon, and extends kernel-declare matching for LLVM-style declarations. |
| Kernel Cache | tilelang/cache/kernel_cache.py | Handles IR-only targets by excluding binary .so from required files and skipping binary cache save when adapter.libpath is None. |
| Tests & Diagnostics | testing/python/hexagon/diagnose_hmx.py, testing/python/hexagon/test_hmx_mma.py | Adds has_hexagon_codegen(), build_hmx_matmul() kernel factory, diagnostic tests (test_000_environment, test_001_ir_dump, test_002_hmx_lowering_status) and compilation/lowering checks plus host-execution guard tests. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Compiler
    participant Lowering
    participant HexIntrinsics
    participant JIT
    participant FFI
    participant HMX

    User->>Compiler: Compile kernel for Hexagon target
    Compiler->>Compiler: Detect is_hexagon_target()
    Compiler->>Lowering: lower() (Hexagon branch)
    Lowering->>Lowering: Skip incompatible passes
    Lowering->>HexIntrinsics: Invoke LowerHexagonIntrinsics
    HexIntrinsics-->>Lowering: Return lowered IR (HexKL intrinsics)
    Lowering-->>JIT: Emit IR / compile (IR-only path)
    JIT-->>User: Provide IR artifact (no host .so)
    User->>FFI: Call tilelang.hexagon.hmx_kernel_launch
    FFI->>FFI: Extract kernel Function and args
    FFI->>HMX: Enter HexagonHtp RAII (power on HMX)
    HMX->>HMX: Execute HMX MMA instructions
    HMX-->>FFI: Return results
    FFI-->>User: Deliver kernel output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • LeiWang1999
  • Gongen-Ali

Poem

🐰 I hopped to Hexagon's bright glen,
HMX tiles and VTCM then,
Intrinsics lowered, kernels spun,
Cross-compile paths now neatly done,
A carrot-powered kernel run!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.10% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[feat] Add Hexagon HMX backend support' directly and clearly summarizes the main change: introducing Hexagon HMX backend support to TileLang. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1293 by implementing Hexagon backend with HMX support, including hardware lowering, memory architecture, runtime wrapper, and kernel cache enhancements for cross-compiled Hexagon code. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in-scope for Hexagon HMX backend support: architecture definitions, intrinsics, lowering passes, JIT adapter updates, cache handling, and diagnostic tests. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CMakeLists.txt`:
- Around line 157-186: The build currently defines TILELANG_HEXAGON_ENABLED
whenever USE_LLVM is truthy; update the CMake logic (around the llvm-config
discovery that sets LLVM_CONFIG_PATH/USE_LLVM and the later block that sets
TILELANG_HEXAGON_ENABLED) to actually verify the LLVM toolchain has Hexagon:
after finding llvm-config (LLVM_CONFIG_PATH) run llvm-config --targets-built (or
invoke it via execute_process) and check the output contains "Hexagon"; only
then set the TILELANG_HEXAGON_ENABLED definition and the "TileLang Build with
LLVM" status message. If llvm-config is not found or it does not report Hexagon,
change the earlier non-fatal warning into a fatal error (or at minimum do not
enable TILELANG_HEXAGON_ENABLED) so hexagon_runtime.cc and related transforms
are not advertised/compiled incorrectly; refer to the LLVM_CONFIG_PATH/USE_LLVM
and TILELANG_HEXAGON_ENABLED symbols to locate where to add the execute_process
and conditional enablement.

In `@src/runtime/hexagon_runtime.cc`:
- Around line 16-23: Ensure you validate args.size() before accessing args[0] or
constructing ffi::PackedArgs to avoid UB when called with zero arguments: in the
lambda that reads args and creates kernel and kernel_args, first check that
args.size() >= 1, and if not set an appropriate runtime error/return (e.g.,
populate rv with an error or throw a tvm/ffi runtime error) instead of
proceeding; specifically modify the block that uses args[0], ffi::Function
kernel, and ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1) to
perform the size check and early error return.

In `@src/transform/lower_hexagon_intrinsics.cc`:
- Around line 28-35: The rewrite for the "hmx_mma_placeholder" intrinsic
currently indexes call->args[3] without checking arity; update the branch that
handles func_name->value == "hmx_mma_placeholder" in lower_hexagon_intrinsics.cc
to first validate call->args.size() >= 4 and only perform the Array<PrimExpr>
construction and return Evaluate(...) when that check passes; if the check
fails, preserve and return the original Call node (or emit a targeted
diagnostic/log) instead of indexing out of bounds to avoid hard runtime/compile
errors.

In `@testing/python/hexagon/diagnose_hmx.py`:
- Around line 139-141: The failure message in
testing/python/hexagon/diagnose_hmx.py incorrectly references the pass name
LowerHMXIntrinsics; change the string literal that currently says "Check
LowerHMXIntrinsics implementation." to reference the correct backend pass name
"LowerHexagonIntrinsics" so triage points to the right implementation (update
the message around the HMX placeholder/no intrinsic error text in
diagnose_hmx.py).
- Around line 9-15: The try/except in diagnose_hmx.py currently swallows all
exceptions when probing LLVM/Hexagon target (the block that calls
tvm.runtime.enabled("llvm") and tvm.target.Target(...)); replace the broad
except Exception with a narrow catch for the expected TVM probe error (e.g.,
tvm.error.TVMError or the TVM-specific probe exception available in your TVM
version, falling back to RuntimeError only if necessary) and re-raise any other
exceptions so real failures are not silenced; apply the same change to the other
probe sites referenced (the blocks around lines with the gated tests at the
other two probe locations) and ensure error handling logs or returns False only
for the known probe failure type while allowing unexpected exceptions to
propagate.

In `@tilelang/cache/kernel_cache.py`:
- Around line 457-460: The save path allows IR-only entries by checking
_is_ir_only (via kernel.adapter.libpath is None) and omitting kernel_lib_path
from missing_files, but load-time required files (method _get_required_files)
still unconditionally demands kernel_lib_path; update _get_required_files to
detect the same IR-only condition (check kernel.adapter.libpath or reuse the
_is_ir_only logic) and exclude self.kernel_lib_path from the required list when
IR-only so save and load semantics match; ensure the change references
_is_ir_only, kernel.adapter.libpath, _get_missing_complete_cache_files,
_get_required_files, and kernel_lib_path so both save and load use the same
criterion.

In `@tilelang/carver/arch/__init__.py`:
- Line 52: The symbol is_hexagon_target was added to __all__ but never defined
or imported in this module, so imports fail; fix by importing or defining
is_hexagon_target in this module and ensuring it is present in the module
globals before exporting (e.g., add a proper import statement that brings
is_hexagon_target into tilelang.carver.arch or define the function here), then
keep it listed in __all__ so from tilelang.carver.arch import is_hexagon_target
and star-imports succeed.

In `@tilelang/intrinsics/hexagon/__init__.py`:
- Around line 94-101: The export list __all__ references
register_hexagon_memory_info which is not defined or imported, causing
import-time failures; fix by either importing the symbol into this module (e.g.,
add an import that provides register_hexagon_memory_info) or remove it from the
__all__ list so only existing names (hmx, HMXBuilder, mma, mma_fp16,
vtcm_dma_copy) are exported; update the module’s top-level imports or the
__all__ array accordingly and ensure any chosen import refers to the correct
source that defines register_hexagon_memory_info.
- Around line 53-67: The vtcm_dma_copy function currently constructs a
tir.Evaluate node but never emits it into the active TIR builder; change it to
call T.evaluate(...) (like the mma() helper does) so the extern DMA call is
actually inserted into the TIR; ensure tilelang.language is imported as T and
replace the tir.Evaluate(...) invocation inside vtcm_dma_copy with
T.evaluate(...) while keeping the tir.call_extern arguments (hexagon_dma_copy,
src.access_ptr("r"), dst.access_ptr("w")) unchanged.

In `@tilelang/jit/adapter/cython/adapter.py`:
- Around line 134-142: The Hexagon guard (is_hexagon_target(self.target)) runs
after the code that compiles/loads the generated library; move this check to run
before any compile/load steps so cross-compilation never attempts to load
host-incompatible artifacts—i.e., in the method containing the compile/load
logic, call is_hexagon_target(self.target) at the very start, return early and
set self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the
JIT object can be inspected without performing host library loading.

In `@tilelang/jit/adapter/utils.py`:
- Around line 88-105: match_declare_kernel_cpu currently ignores the requested
symbol and always returns the first int32_t/define; detect when the caller
passed a function name (create_call_func passes "function_name(" into the
annotation param), extract the function name (e.g., annotation.split("(")[0] if
"(" present), then build the C and LLVM search patterns to match that exact
function (C: use a regex like r"\b<int_return>?\s+{re.escape(func_name)}\b" or
simply search for the function name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.

In `@tilelang/jit/adapter/wrapper.py`:
- Around line 988-992: The Hexagon target is being routed to TLCPUSourceWrapper
which expects C prototypes, but Hexagon emits LLVM IR; update the target routing
so Hexagon is parsed as IR instead of C: change the branch that checks
is_hexagon_target(self.target) (in the wrapper selection near
TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper (e.g., a new or
existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect LLVM IR and
parse declarations accordingly; also update create_call_func() so when handling
IR it extracts the function signature by splitting/locating the opening brace
'{' (or using an IR-specific parser) rather than split(";")[0], ensuring
argument extraction reads parameters not local IR instructions.

In `@tilelang/jit/kernel.py`:
- Around line 206-216: The current call path may attempt to call
self.torch_function when it is None; update the kernel invocation in the method
containing self.torch_function to raise a clear RuntimeError for uninitialized
kernels on all targets: keep the existing Hexagon-specific RuntimeError (using
is_hexagon_target(self.target)) and then add a generic RuntimeError if
self.torch_function is still None that explains the kernel is uninitialized and
cannot be executed on the host (mentioning to use the Hexagon SDK
Simulator/HexagonLauncher only in the Hexagon message). This ensures callers
receive a stable, descriptive error instead of a 'NoneType' object is not
callable' when invoking self.torch_function(*args, **kwds).

In `@tilelang/utils/target.py`:
- Around line 189-191: The normalization currently overwrites any Hexagon target
string (variable return_var) with a fixed "llvm -mtriple=hexagon
-mcpu=hexagonv73"; instead, update the logic in tilelang.utils.target (the block
handling Hexagon shorthands) to parse the existing return_var string and only
inject missing flags: if "-mtriple" is absent append "-mtriple=hexagon", if
"-mcpu" is absent append a default "-mcpu=hexagonv73", and preserve any existing
"-mcpu", "-mattr", or other flags already present in return_var; ensure you
handle cases where return_var is exactly "hexagon" (replace with "llvm" plus the
necessary flags) versus when it already starts with "llvm" (modify by appending
missing flags) so no explicit user flags are overwritten.
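
The flag-preserving normalization described in the last item could look roughly like the sketch below. `normalize_hexagon_target` is a hypothetical helper written for this example, and it assumes the caller has already decided that the string names a Hexagon target.

```python
DEFAULT_HEXAGON_CPU = "hexagonv73"  # default named in the review comment


def normalize_hexagon_target(target: str) -> str:
    """Inject only the missing Hexagon flags, keeping user-supplied ones."""
    tokens = target.split()
    if not tokens:
        raise ValueError("empty target string")
    # The bare shorthand "hexagon" maps onto the LLVM target kind.
    if tokens[0] == "hexagon":
        tokens[0] = "llvm"
    if not any(t.startswith("-mtriple") for t in tokens):
        tokens.append("-mtriple=hexagon")
    if not any(t.startswith("-mcpu") for t in tokens):
        tokens.append(f"-mcpu={DEFAULT_HEXAGON_CPU}")
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_hexagon_target("hexagon"))
    print(normalize_hexagon_target("llvm -mtriple=hexagon -mcpu=hexagonv68"))
```

Appending missing flags instead of rebuilding the string means that an explicit `-mcpu` or `-mattr` from the user survives unchanged.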

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05bd7958-0e49-4313-b7e6-8667f85194cb

📥 Commits

Reviewing files that changed from the base of the PR and between 48998d2 and fd34124.

📒 Files selected for processing (18)
  • CMakeLists.txt
  • src/runtime/hexagon_runtime.cc
  • src/transform/lower_hexagon_intrinsics.cc
  • testing/python/hexagon/diagnose_hmx.py
  • testing/python/hexagon/test_hmx_mma.py
  • tilelang/cache/kernel_cache.py
  • tilelang/carver/arch/__init__.py
  • tilelang/carver/arch/hexagon.py
  • tilelang/engine/lower.py
  • tilelang/engine/phase.py
  • tilelang/intrinsics/hexagon/__init__.py
  • tilelang/jit/adapter/base.py
  • tilelang/jit/adapter/cython/adapter.py
  • tilelang/jit/adapter/libgen.py
  • tilelang/jit/adapter/utils.py
  • tilelang/jit/adapter/wrapper.py
  • tilelang/jit/kernel.py
  • tilelang/utils/target.py

Comment thread CMakeLists.txt
Comment on lines +157 to +186
if(USE_LLVM STREQUAL "ON")
find_program(LLVM_CONFIG_PATH NAMES llvm-config-18 llvm-config-17 llvm-config)
if(LLVM_CONFIG_PATH)
set(USE_LLVM ${LLVM_CONFIG_PATH} CACHE STRING "Path to llvm-config" FORCE)
else()
message(WARNING "USE_LLVM is ON but llvm-config was not found in PATH.")
endif()
endif()

set(_TILELANG_USE_LLVM_SAVED ${USE_LLVM})

# Load TVM sources and base configs
include(cmake/load_tvm.cmake)

if(EXISTS ${TVM_SOURCE}/cmake/config.cmake)
include(${TVM_SOURCE}/cmake/config.cmake)
else()
message(FATAL_ERROR "Nor tvm provided or submodule checkout-ed.")
endif()

# Restore USE_LLVM — TVM's config.cmake may have reset it
if(_TILELANG_USE_LLVM_SAVED AND NOT _TILELANG_USE_LLVM_SAVED STREQUAL "OFF")
set(USE_LLVM ${_TILELANG_USE_LLVM_SAVED} CACHE STRING "LLVM config path" FORCE)
message(STATUS "TileLang: restored USE_LLVM=${USE_LLVM}")
endif()
unset(_TILELANG_USE_LLVM_SAVED)

if(USE_LLVM AND NOT USE_LLVM STREQUAL "OFF")
message(STATUS "TileLang Build with LLVM: ${USE_LLVM}")
add_definitions(-DTILELANG_HEXAGON_ENABLED)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

for bin in llvm-config-18 llvm-config-17 llvm-config; do
  if command -v "$bin" >/dev/null 2>&1; then
    "$bin" --version
    "$bin" --targets-built
    exit 0
  fi
done

echo "llvm-config not found in PATH" >&2
exit 1

Repository: tile-ai/tilelang

Length of output: 90


🏁 Script executed:

# First, locate and examine the CMakeLists.txt file
find . -name "CMakeLists.txt" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 180


🏁 Script executed:

# Once found, examine the relevant lines (157-186)
sed -n '157,186p' CMakeLists.txt

Repository: tile-ai/tilelang

Length of output: 1124


🏁 Script executed:

# Search for TILELANG_HEXAGON_ENABLED usage in the codebase
rg -r "TILELANG_HEXAGON_ENABLED" --type cmake --type cpp --type h

Repository: tile-ai/tilelang

Length of output: 457


🏁 Script executed:

# Check for any other LLVM/Hexagon related configurations
rg -i "hexagon" --type cmake --type cpp --type h -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 5477


🏁 Script executed:

# Look for any cmake configuration files related to LLVM setup
find . -path "*/cmake/*.cmake" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 145


Hexagon support is enabled without verifying LLVM has the Hexagon backend.

When USE_LLVM is set to any truthy value other than "OFF", TILELANG_HEXAGON_ENABLED is unconditionally defined. This happens even if:

  • llvm-config was not found (only a warning is issued on line 162)
  • The discovered LLVM lacks the Hexagon backend

Since hexagon_runtime.cc and Hexagon-specific transforms depend on this definition, the build can advertise Hexagon support and fail later in less obvious places.

Query llvm-config --targets-built after locating it, and only enable Hexagon support if the output contains Hexagon. Additionally, consider making the missing llvm-config a fatal error since Hexagon requires it.
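
The actual fix belongs in CMake, but the check itself can be sketched in Python to show the logic: find an `llvm-config` binary, run it with `--targets-built`, and look for `Hexagon` in the space-separated output. The function names here are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def targets_include_hexagon(targets_built: str) -> bool:
    """Return True if `llvm-config --targets-built` output lists Hexagon."""
    return "Hexagon" in targets_built.split()


def find_llvm_config() -> Optional[str]:
    """Mirror the search order used in the CMake snippet above."""
    for name in ("llvm-config-18", "llvm-config-17", "llvm-config"):
        path = shutil.which(name)
        if path:
            return path
    return None


def llvm_has_hexagon() -> bool:
    """Enable Hexagon only when llvm-config exists and reports the backend."""
    path = find_llvm_config()
    if path is None:
        return False
    result = subprocess.run(
        [path, "--targets-built"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0 and targets_include_hexagon(result.stdout)


if __name__ == "__main__":
    print(llvm_has_hexagon())
```

Matching whole tokens rather than substrings avoids a false positive if some other target name ever contains the word.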

Comment thread src/runtime/hexagon_runtime.cc
Comment on lines +16 to +23
[](ffi::PackedArgs args, ffi::Any *rv) {
// args[0] is the kernel Function; remaining args are forwarded to it.
// AnyView supports .cast<T>() for type-safe extraction.
ffi::Function kernel = args[0].cast<ffi::Function>();

// PackedArgs(const AnyView* data, int32_t size) — slice past the first
// arg. args.data() returns const AnyView*, args.size() returns int32_t.
ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate args.size() before reading args[0] and slicing the tail.

This packed function is globally callable from Python/C++. An empty call will access args[0] and build PackedArgs(..., -1) before the HMX guard runs, so the failure mode is much harsher than a normal runtime error.

Proposed fix
       "tilelang.hexagon.hmx_kernel_launch",
       [](ffi::PackedArgs args, ffi::Any *rv) {
+        ICHECK_GE(args.size(), 1)
+            << "tilelang.hexagon.hmx_kernel_launch expects a kernel function as arg0";
         // args[0] is the kernel Function; remaining args are forwarded to it.
         // AnyView supports .cast<T>() for type-safe extraction.
         ffi::Function kernel = args[0].cast<ffi::Function>();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 [](ffi::PackedArgs args, ffi::Any *rv) {
+  ICHECK_GE(args.size(), 1)
+      << "tilelang.hexagon.hmx_kernel_launch expects a kernel function as arg0";
   // args[0] is the kernel Function; remaining args are forwarded to it.
   // AnyView supports .cast<T>() for type-safe extraction.
   ffi::Function kernel = args[0].cast<ffi::Function>();
   // PackedArgs(const AnyView* data, int32_t size) — slice past the first
   // arg. args.data() returns const AnyView*, args.size() returns int32_t.
   ffi::PackedArgs kernel_args(args.data() + 1, args.size() - 1);

Comment thread src/transform/lower_hexagon_intrinsics.cc
Comment on lines +28 to +35
if (func_name->value == "hmx_mma_placeholder") {
Array<PrimExpr> new_args;
new_args.push_back(StringImm("HexKL_mma_i8acc32"));
new_args.push_back(
call->args[3]); // C_acc (accumulator — first arg to HexKL)
new_args.push_back(call->args[1]); // A_vtcm
new_args.push_back(call->args[2]); // B_vtcm
return Evaluate(

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the placeholder arity before indexing call->args[3].

This pass is globally callable on arbitrary TIR, so a malformed call_extern("hmx_mma_placeholder", ...) with fewer than four arguments will trip TVM's bounds checks and fail the compile with a hard error. Validate the expected operand count before rewriting, then either keep the node unchanged or emit a targeted diagnostic.
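
To make the rewrite and its arity check concrete, here is a small Python model of it. A plain list stands in for `call->args`, with the extern name at index 0 followed by the A, B, and C operands. This is a sketch of the idea only, not the C++ pass, and raising `ValueError` is one of the two options the comment mentions.

```python
from typing import List

PLACEHOLDER = "hmx_mma_placeholder"
HEXKL_MMA = "HexKL_mma_i8acc32"


def lower_hmx_placeholder(args: List[str]) -> List[str]:
    """Rewrite [placeholder, A, B, C] into [HexKL name, C, A, B]."""
    if not args or args[0] != PLACEHOLDER:
        return args  # not the placeholder: leave the call untouched
    if len(args) != 4:
        # Targeted diagnostic instead of an out-of-bounds access.
        raise ValueError(
            f"{PLACEHOLDER} expects 3 operands, got {len(args) - 1}"
        )
    _, a_vtcm, b_vtcm, c_acc = args
    # The accumulator moves to the front, as in the quoted C++ code.
    return [HEXKL_MMA, c_acc, a_vtcm, b_vtcm]


if __name__ == "__main__":
    print(lower_hmx_placeholder([PLACEHOLDER, "A_vtcm", "B_vtcm", "C_acc"]))
```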

Proposed fix
           // Lower HMX MMA placeholder
           if (func_name->value == "hmx_mma_placeholder") {
+            ICHECK_EQ(call->args.size(), 4)
+                << "hmx_mma_placeholder expects exactly 3 operands";
             Array<PrimExpr> new_args;
             new_args.push_back(StringImm("HexKL_mma_i8acc32"));
             new_args.push_back(
                 call->args[3]); // C_acc (accumulator — first arg to HexKL)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 if (func_name->value == "hmx_mma_placeholder") {
+  ICHECK_EQ(call->args.size(), 4)
+      << "hmx_mma_placeholder expects exactly 3 operands";
   Array<PrimExpr> new_args;
   new_args.push_back(StringImm("HexKL_mma_i8acc32"));
   new_args.push_back(
       call->args[3]); // C_acc (accumulator — first arg to HexKL)
   new_args.push_back(call->args[1]); // A_vtcm
   new_args.push_back(call->args[2]); // B_vtcm
   return Evaluate(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/transform/lower_hexagon_intrinsics.cc` around lines 28 - 35, The rewrite
for the "hmx_mma_placeholder" intrinsic currently indexes call->args[3] without
checking arity; update the branch that handles func_name->value ==
"hmx_mma_placeholder" in lower_hexagon_intrinsics.cc to first validate
call->args.size() >= 4 and only perform the Array<PrimExpr> construction and
return Evaluate(...) when that check passes; if the check fails, preserve and
return the original Call node (or emit a targeted diagnostic/log) instead of
indexing out of bounds to avoid hard runtime/compile errors.
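
The validate-then-rewrite order described above can be sketched without TVM. This is a minimal illustration in Python that models the call's arguments as a plain list of strings; `lower_placeholder` is a hypothetical helper, not part of the PR, and the real pass operates on TIR `Call` nodes in C++.

```python
from typing import List, Sequence

PLACEHOLDER = "hmx_mma_placeholder"
LOWERED = "HexKL_mma_i8acc32"


def lower_placeholder(args: Sequence[str]) -> List[str]:
    """Rewrite [name, A, B, C] into [lowered_name, C, A, B]; leave other calls untouched."""
    if not args or args[0] != PLACEHOLDER:
        return list(args)
    if len(args) != 4:
        # Malformed placeholder: report it instead of indexing out of bounds.
        raise ValueError(
            f"{PLACEHOLDER} expects 3 operands after the symbol name, "
            f"got {len(args) - 1}")
    _, a_vtcm, b_vtcm, c_acc = args
    # The accumulator moves to the front, matching the order used in the pass.
    return [LOWERED, c_acc, a_vtcm, b_vtcm]
```

The arity check runs before any positional access, so a short argument list produces a targeted message rather than a bounds failure.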

Comment on lines +9 to +15
try:
    if not tvm.runtime.enabled("llvm"):
        return False
    tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
    return True
except Exception:
    return False

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | head -120

Repository: tile-ai/tilelang

Length of output: 5134


🏁 Script executed:

cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '110,150p'

Repository: tile-ai/tilelang

Length of output: 1775


🏁 Script executed:

# Search for LowerHMXIntrinsics and LowerHexagonIntrinsics in the codebase
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -i

Repository: tile-ai/tilelang

Length of output: 793


🏁 Script executed:

# Check the lower.py file to see what exceptions might be expected
cat -n tilelang/engine/lower.py | head -60

Repository: tile-ai/tilelang

Length of output: 2467


🏁 Script executed:

# Check what exceptions TVM Target raises by looking for its usage and error handling
rg "tvm.target.Target" -A 3 -B 1 | head -80

Repository: tile-ai/tilelang

Length of output: 5048


🏁 Script executed:

# Look for TVM exception types and how Target errors are typically caught
rg "except.*Error|except.*Exception" testing/python/hexagon/diagnose_hmx.py -B 3

Repository: tile-ai/tilelang

Length of output: 358


🏁 Script executed:

# Check if there are TVM-specific exception types in the codebase
rg "from tvm|import.*tvm" testing/python/hexagon/diagnose_hmx.py | head -10

Repository: tile-ai/tilelang

Length of output: 92


🏁 Script executed:

# Check TVM exception handling patterns in the codebase
rg "tvm.error|TVMError|from tvm.*import|except.*as e" -A 2 | head -100

Repository: tile-ai/tilelang

Length of output: 5401


🏁 Script executed:

# Check the diagnostic test more carefully - what's the context around line 141?
cat -n testing/python/hexagon/diagnose_hmx.py | sed -n '135,145p'

Repository: tile-ai/tilelang

Length of output: 561


🏁 Script executed:

# Verify the actual pass name references
rg "LowerHMXIntrinsics|LowerHexagonIntrinsics" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2746


🏁 Script executed:

# Check if there are similar skip-gate functions elsewhere in the codebase with better exception handling
rg "pytest.mark.skipif" -B 2 -A 2 | head -60

Repository: tile-ai/tilelang

Length of output: 4792


Narrow the skip-gate exception handling to prevent silent skips.

Line 14 catches every exception and returns False, which can hide real regressions and silently skip both gated tests (Lines 64 and 110). Catch only expected probe failures and re-raise unexpected exceptions.

Suggested fix
 def has_hexagon_codegen():
     try:
         if not tvm.runtime.enabled("llvm"):
             return False
         tvm.target.Target("llvm -mtriple=hexagon -mcpu=hexagonv73")
         return True
-    except Exception:
-        return False
+    except Exception as err:
+        msg = str(err).lower()
+        # Expected probe failures: missing Hexagon/LLVM target support.
+        if "hexagon" in msg or "llvm" in msg or "target" in msg:
+            return False
+        raise

Please verify and, if available in your TVM version, prefer a concrete TVM exception type (e.g., TVM-specific error class) over message matching.

Also applies to: 64-65, 110-111

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 14-14: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testing/python/hexagon/diagnose_hmx.py` around lines 9 - 15, The try/except
in diagnose_hmx.py currently swallows all exceptions when probing LLVM/Hexagon
target (the block that calls tvm.runtime.enabled("llvm") and
tvm.target.Target(...)); replace the broad except Exception with a narrow catch
for the expected TVM probe error (e.g., tvm.error.TVMError or the TVM-specific
probe exception available in your TVM version, falling back to RuntimeError only
if necessary) and re-raise any other exceptions so real failures are not
silenced; apply the same change to the other probe sites referenced (the blocks
around lines with the gated tests at the other two probe locations) and ensure
error handling logs or returns False only for the known probe failure type while
allowing unexpected exceptions to propagate.
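
As a minimal sketch of the narrowed skip gate, assuming the expected probe failure can be expressed as one concrete exception type: `ProbeUnavailable` below is a hypothetical stand-in for whatever error class the installed TVM version raises, and `has_backend` takes the probe as a callable so the example runs without TVM.

```python
from typing import Callable


class ProbeUnavailable(Exception):
    """Hypothetical stand-in for the concrete 'target not supported' error."""


def has_backend(probe: Callable[[], None]) -> bool:
    """Return False only for the expected probe failure; re-raise anything else."""
    try:
        probe()
        return True
    except ProbeUnavailable:
        # Expected: the toolchain was built without this backend.
        return False


def missing_backend() -> None:
    raise ProbeUnavailable("hexagon target not registered")


def buggy_probe() -> None:
    raise TypeError("unexpected regression in the probe itself")
```

With this shape, `has_backend(missing_backend)` skips the gated tests, while `has_backend(buggy_probe)` lets the `TypeError` surface instead of turning it into a silent skip.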

Comment thread testing/python/hexagon/diagnose_hmx.py
Comment on lines +134 to +142
from tilelang.utils.target import is_hexagon_target

if is_hexagon_target(self.target):
    # For Hexagon, we are cross-compiling.
    # We cannot load symbols or execute this on the host machine.
    # Returning early allows the JIT object to exist so we can
    # inspect kernel.kernel_source.
    self._compiled_func = None
    return

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return before compiling/loading Hexagon artifacts.

By the time this guard runs, Lines 130-132 have already compiled and loaded the generated library. That's the host-incompatible step this branch is supposed to skip for cross-compiled Hexagon kernels.

Suggested fix
         self.wrapper.assign_host_module(host_mod)
         self.wrapper.assign_device_module(device_mod)
         self.host_kernel_source = self.wrapper.wrap(self.get_kernel_source(kernel_only=True))
 
+        from tilelang.utils.target import is_hexagon_target
+
+        if is_hexagon_target(self.target):
+            # For Hexagon, we are cross-compiling.
+            # We cannot load symbols or execute this on the host machine.
+            # Returning early allows the JIT object to exist so we can
+            # inspect kernel.kernel_source.
+            self._compiled_func = None
+            return
+
         self.lib_generator.update_lib_code(self.host_kernel_source)
         self.lib_generator.compile_lib()
         self.lib = self.lib_generator.load_lib()
-
-        from tilelang.utils.target import is_hexagon_target
-
-        if is_hexagon_target(self.target):
-            # For Hexagon, we are cross-compiling.
-            # We cannot load symbols or execute this on the host machine.
-            # Returning early allows the JIT object to exist so we can
-            # inspect kernel.kernel_source.
-            self._compiled_func = None
-            return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/adapter/cython/adapter.py` around lines 134 - 142, The Hexagon
guard (is_hexagon_target(self.target)) runs after the code that compiles/loads
the generated library; move this check to run before any compile/load steps so
cross-compilation never attempts to load host-incompatible artifacts—i.e., in
the method containing the compile/load logic, call
is_hexagon_target(self.target) at the very start, return early and set
self._compiled_func = None (leaving kernel.kernel_source etc. intact) so the JIT
object can be inspected without performing host library loading.
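
The ordering issue can be illustrated with a small stand-in class. `FakeAdapter` and its step names are hypothetical and only record which stages ran; they are not the adapter's real API.

```python
class FakeAdapter:
    """Hypothetical stand-in for the adapter; records which build steps ran."""

    def __init__(self, is_cross_target: bool):
        self.is_cross_target = is_cross_target
        self.steps: list[str] = []
        self.compiled_func = None

    def build(self) -> None:
        self.steps.append("wrap_source")
        if self.is_cross_target:
            # Cross-compiled output cannot be loaded on the host; stop here
            # so the generated source stays inspectable.
            return
        self.steps.append("compile_lib")
        self.steps.append("load_lib")
        self.compiled_func = lambda *args: "ran on host"
```

Placing the guard right after source wrapping means a cross-compiled target never reaches the compile or load steps, which is the behavior the comment asks for.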

Comment on lines 88 to 105
 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    pattern = r"int32_t\s+\w+"
+    # C-style signature
+    pattern_c = r"int32_t\s+\w+"
+    # LLVM-style signature
+    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"

     for line in source.split("\n"):
-        if annotation in line:
-            matched = re.findall(pattern, line)
-            if len(matched) >= 1:
-                return source.index(matched[0] + "(")
+        # C pattern
+        matched = re.findall(pattern_c, line)
+        if matched:
+            return source.index(matched[0])
+        # LLVM pattern
+        matched = re.findall(pattern_llvm, line)
+        if matched:
+            # Match the start of the 'define'
+            return source.index(matched[0])

     raise ValueError("No global kernel found in the source code")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Match the requested symbol instead of the first CPU declaration.

annotation is ignored here now, so every lookup returns the first int32_t/define in the file. TLCPUSourceWrapper.create_call_func() passes function_name + "(" into this helper, so multi-function CPU/LLVM sources will bind later wrappers to the wrong declaration.

Suggested fix
 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    # C-style signature
-    pattern_c = r"int32_t\s+\w+"
-    # LLVM-style signature
-    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+    func_name = annotation[:-1] if annotation.endswith("(") else None
+    if func_name and func_name != "int32_t":
+        pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"
+        pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"
+    else:
+        pattern_c = r"\bint32_t\s+\w+\b"
+        pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"
 
     for line in source.split("\n"):
         # C pattern
         matched = re.findall(pattern_c, line)
📝 Committable suggestion

Suggested change
 def match_declare_kernel_cpu(source: str, annotation: str = "int32_t") -> int:
-    # C-style signature
-    pattern_c = r"int32_t\s+\w+"
-    # LLVM-style signature
-    pattern_llvm = r"define\s+.*@(?!llvm\.)\w+"
+    func_name = annotation[:-1] if annotation.endswith("(") else None
+    if func_name and func_name != "int32_t":
+        pattern_c = rf"\bint32_t\s+{re.escape(func_name)}\b"
+        pattern_llvm = rf"\bdefine\b[^\n@]*@{re.escape(func_name)}\b"
+    else:
+        pattern_c = r"\bint32_t\s+\w+\b"
+        pattern_llvm = r"\bdefine\b[^\n@]*@(?!llvm\.)\w+\b"

     for line in source.split("\n"):
         # C pattern
         matched = re.findall(pattern_c, line)
         if matched:
             return source.index(matched[0])
         # LLVM pattern
         matched = re.findall(pattern_llvm, line)
         if matched:
             # Match the start of the 'define'
             return source.index(matched[0])

     raise ValueError("No global kernel found in the source code")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/adapter/utils.py` around lines 88 - 105,
match_declare_kernel_cpu currently ignores the requested symbol and always
returns the first int32_t/define; detect when the caller passed a function name
(create_call_func passes "function_name(" into the annotation param), extract
the function name (e.g., annotation.split("(")[0] if "(" present), then build
the C and LLVM search patterns to match that exact function (C: use a regex like
r"\b<int_return>?\s+{re.escape(func_name)}\b" or simply search for the function
name with a preceding return/type token, LLVM:
r"define\s+.*@{re.escape(func_name)}\b"), use re.search on each line and return
source.index(match.group(0)) for the found match; if annotation is not a
function name keep the existing behavior but make patterns use word boundaries
so you don't pick substrings. Ensure you update match_declare_kernel_cpu to
reference the extracted func_name and to return the correct match start.
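
A minimal sketch of symbol-specific lookup, assuming the caller can pass the bare function name: `find_kernel_decl` is a hypothetical helper, not the PR's function. It searches the whole source once and returns the match offset directly, rather than re-locating the matched text with `source.index`.

```python
import re


def find_kernel_decl(source: str, func_name: str) -> int:
    """Return the offset of the declaration of `func_name` in C or LLVM IR text."""
    name = re.escape(func_name)
    # C-style: `int32_t <name>(`; LLVM-style: `define ... @<name>(`.
    pattern = re.compile(
        rf"\bint32_t\s+{name}\s*\(|\bdefine\b[^\n@]*@{name}\s*\(")
    match = pattern.search(source)
    if match is None:
        raise ValueError(f"No declaration found for {func_name!r}")
    return match.start()


# Two-function sources in each style, used only for illustration.
C_SOURCE = "int32_t first(int32_t* a);\nint32_t second(int32_t* a);\n"
IR_SOURCE = "define i32 @first(ptr %a) {\n}\ndefine i32 @second(ptr %a) {\n}\n"
```

Requiring the opening parenthesis after the name keeps a lookup for `main` from binding to `main_kernel`, and using `match.start()` avoids landing on an earlier identical substring.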

Comment on lines +988 to +992
 elif is_cpu_target(self.target) or is_hexagon_target(self.target):
     wrapper_class = TLCPUSourceWrapper
 else:
-    raise ValueError(f"Unsupported platform: {self.arch.platform}")
+    arch_name = self.arch.platform if self.arch else "unknown"
+    raise ValueError(f"Unsupported platform: {arch_name}")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the wrapper.py file and the specific lines mentioned
find . -type f -name "wrapper.py" | grep -E "jit/adapter"

Repository: tile-ai/tilelang

Length of output: 176


🏁 Script executed:

# Get the file size and content around lines 988-992
wc -l tilelang/jit/adapter/wrapper.py
sed -n '980,1000p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1006


🏁 Script executed:

# Now let's find the create_call_func method in TLCPUSourceWrapper
grep -n "create_call_func" tilelang/jit/adapter/wrapper.py | head -20

Repository: tile-ai/tilelang

Length of output: 200


🏁 Script executed:

# Search for TLCPUSourceWrapper class definition and create_call_func implementation
grep -n "class TLCPUSourceWrapper" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 90


🏁 Script executed:

# Search for match_declare_kernel_cpu in utils.py
grep -n "match_declare_kernel_cpu" tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 142


🏁 Script executed:

# Read the create_call_func method implementation
sed -n '790,850p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 3011


🏁 Script executed:

# Read the match_declare_kernel_cpu function
sed -n '88,120p' tilelang/jit/adapter/utils.py

Repository: tile-ai/tilelang

Length of output: 982


🏁 Script executed:

# Check what type of source code is passed to TLCPUSourceWrapper
# Look at the context where wrapper_class is instantiated
sed -n '975,1010p' tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 1442


🏁 Script executed:

# Search for Hexagon-related code generation and what source format it produces
rg "hexagon" -i tilelang/jit/adapter/ -A 3 -B 3

Repository: tile-ai/tilelang

Length of output: 5418


🏁 Script executed:

# Look for where the source code is generated and what format it takes for Hexagon
rg "is_hexagon_target" -i tilelang/jit/ -A 5 -B 5

Repository: tile-ai/tilelang

Length of output: 6891


🏁 Script executed:

# Check if there's any LLVM IR being passed for Hexagon
grep -r "llvm" tilelang/jit/adapter/ -i

Repository: tile-ai/tilelang

Length of output: 327


🏁 Script executed:

# Let's trace back - look at what generates c_source that's passed to wrap()
grep -n "def wrap" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 181


🏁 Script executed:

# See where wrap() is called and what source is passed
rg "\.wrap\(" tilelang/jit/ -B 5

Repository: tile-ai/tilelang

Length of output: 1749


🏁 Script executed:

# Check what get_kernel_source returns and what format Hexagon uses
grep -n "get_kernel_source" tilelang/jit/adapter/ -r -A 3

Repository: tile-ai/tilelang

Length of output: 2051


🏁 Script executed:

# Look for Hexagon-specific code generation
find tilelang -path "*hexagon*" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 132


🏁 Script executed:

# Check what kind of source TVM Hexagon backend produces
grep -n "llvm" tilelang/jit/adapter/wrapper.py -i

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Look at what happens in the Cython adapter for get_kernel_source
grep -n "def get_kernel_source" tilelang/jit/adapter/ -r -A 15

Repository: tile-ai/tilelang

Length of output: 5063


🏁 Script executed:

# Check what code format TVM's Hexagon target produces
grep -n "hexagon" tilelang/jit/adapter/tvm_ffi.py -i -B 3 -A 3

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Look at the cython adapter to understand the flow for Hexagon
sed -n '100,135p' tilelang/jit/adapter/cython/adapter.py

Repository: tile-ai/tilelang

Length of output: 1623


🏁 Script executed:

# Check TVM target creation for Hexagon
grep -r "Target\|target" tilelang/jit/adapter/cython/adapter.py | grep -i hexagon

Repository: tile-ai/tilelang

Length of output: 240


🏁 Script executed:

# Look for what TVM codegen produces - check if there's any indication of LLVM vs C
grep -n "device_kernel_source" tilelang/jit/adapter/cython/adapter.py | head -10

Repository: tile-ai/tilelang

Length of output: 656


🏁 Script executed:

# Let's check if Hexagon can produce LLVM IR by examining TVM integration
grep -n "ir_module\|IRModule" tilelang/jit/adapter/cython/adapter.py -B 2 -A 2 | head -40

Repository: tile-ai/tilelang

Length of output: 1633


🏁 Script executed:

# Check what format TVM produces for different targets
# Look for codegen or backend specifications
find tilelang -name "*.py" | xargs grep -l "hexagon" | head -5

Repository: tile-ai/tilelang

Length of output: 219


🏁 Script executed:

# Check the hexagon arch file to understand what source it processes
sed -n '1,50p' tilelang/carver/arch/hexagon.py

Repository: tile-ai/tilelang

Length of output: 2022


🏁 Script executed:

# Look for test cases or examples that use Hexagon to see what source format is expected
find . -path "*/test*" -name "*.py" | xargs grep -l "hexagon" 2>/dev/null | head -3

Repository: tile-ai/tilelang

Length of output: 142


🏁 Script executed:

# Check what code is generated by looking at a simple example or test
# Look for inspect_source usage with Hexagon
grep -r "inspect_source\|get_device_source" tilelang/jit/ -B 3 -A 3 | head -40

Repository: tile-ai/tilelang

Length of output: 2068


🏁 Script executed:

# The key insight: Let's check if there's any LLVM IR handling elsewhere in CPU wrapper
grep -n "LLVM\|llvm\|define" tilelang/jit/adapter/wrapper.py

Repository: tile-ai/tilelang

Length of output: 91


🏁 Script executed:

# Check test cases to see what format Hexagon source is expected to be
sed -n '1,100p' ./testing/python/hexagon/test_hmx_mma.py

Repository: tile-ai/tilelang

Length of output: 3388


🏁 Script executed:

# Check what TVM produces for Hexagon - look at lower.py
grep -n "hexagon\|llvm" tilelang/engine/lower.py -i -B 2 -A 2 | head -50

Repository: tile-ai/tilelang

Length of output: 1992


🏁 Script executed:

# The critical test: what does inspect_source() return for a Hexagon target?
# Let's check if there's documentation or examples
grep -r "inspect_source" tilelang/ --include="*.py" -B 1 -A 1 | grep -i "hexagon\|llvm" -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Let's check what TVM's Hexagon target backend is set to
grep -n "hexagon\|llvm" tilelang/utils/target.py -i -B 2 -A 2

Repository: tile-ai/tilelang

Length of output: 2237


🏁 Script executed:

# Confirm that split(";") will fail on LLVM IR
python3 << 'EOF'
# Example LLVM function
llvm_func = '''define i32 @kernel(i32* %arg0, i32 %arg1) {
  %1 = load i32, i32* %arg0
  %2 = add i32 %1, %arg1
  ret i32 %2
}'''

# What happens with split(";")[0]
declaration = llvm_func.split(";")[0]
print("LLVM function:")
print(llvm_func)
print("\nResult of split(';')[0]:")
print(declaration)
print("\nThis includes the entire function body, not just the declaration!")

# Example C function
c_func = '''int32_t kernel(int32_t* arg0, int32_t arg1);
{
  // body
}'''

declaration_c = c_func.split(";")[0]
print("\n\nC function:")
print(c_func)
print("\nResult of split(';')[0]:")
print(declaration_c)
print("\nThis correctly stops at the declaration!")
EOF

Repository: tile-ai/tilelang

Length of output: 577


TLCPUSourceWrapper parses C prototypes, not LLVM IR.

This branch routes Hexagon (which generates LLVM IR, not C code) through TLCPUSourceWrapper. However, create_call_func() still extracts declarations with split(";")[0], which fails for LLVM IR: a define has no terminating semicolon, and its body is delimited by braces (a ";" in LLVM IR only starts a comment). The parser therefore includes the entire function body in the "declaration", and argument extraction then matches locals instead of parameters, producing malformed function calls. Update the CPU wrapper to parse LLVM IR before routing Hexagon here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/adapter/wrapper.py` around lines 988 - 992, The Hexagon target
is being routed to TLCPUSourceWrapper which expects C prototypes, but Hexagon
emits LLVM IR; update the target routing so Hexagon is parsed as IR instead of
C: change the branch that checks is_hexagon_target(self.target) (in the wrapper
selection near TLCPUSourceWrapper) to route Hexagon to an IR-aware wrapper
(e.g., a new or existing TLCPUIRWrapper) or enhance TLCPUSourceWrapper to detect
LLVM IR and parse declarations accordingly; also update create_call_func() so
when handling IR it extracts the function signature by splitting/locating the
opening brace '{' (or using an IR-specific parser) rather than split(";")[0],
ensuring argument extraction reads parameters not local IR instructions.
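
One way to extract an IR signature by locating the opening brace is sketched below. `llvm_signature` and `llvm_param_names` are hypothetical helpers for illustration only; they assume a simple `define` header on one line and do not handle every attribute form that LLVM IR allows.

```python
import re


def llvm_signature(ir_text: str, func_name: str) -> str:
    """Return the `define ... @name(...)` header, stopping before the body's `{`."""
    pattern = re.compile(
        rf"^define\b[^\n@]*@{re.escape(func_name)}\s*\([^{{]*", re.MULTILINE)
    match = pattern.search(ir_text)
    if match is None:
        raise ValueError(f"No LLVM definition found for {func_name!r}")
    return match.group(0).strip()


def llvm_param_names(signature: str) -> list[str]:
    """Collect `%name` tokens from the parameter list of a signature."""
    params = signature[signature.index("(") + 1 : signature.rindex(")")]
    return re.findall(r"%([\w.]+)", params)


IR = """define i32 @kernel(ptr %arg0, i32 %arg1) {
  %1 = load i32, ptr %arg0
  %2 = add i32 %1, %arg1
  ret i32 %2
}"""
```

Here `IR.split(";")[0]` returns the whole function because the text contains no semicolon, whereas the brace-bounded header exposes only the parameters `arg0` and `arg1`.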

Comment thread tilelang/jit/kernel.py
Comment on lines +206 to 216
if self.torch_function is None:
    from tilelang.utils.target import is_hexagon_target

    if is_hexagon_target(self.target):
        raise RuntimeError(
            "Hexagon kernels cannot be executed directly on the host machine. "
            "To run this kernel, please use the Hexagon SDK Simulator or "
            "the HexagonLauncher on a supported Qualcomm device."
        )

return self.torch_function(*args, **kwds)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raise for all uninitialized kernels, not just Hexagon.

On non-Hexagon targets this still falls through to self.torch_function(*args, **kwds), which raises a raw TypeError ('NoneType' object is not callable). Add the generic RuntimeError after the Hexagon-specific branch so callers always get a stable error here.

Proposed fix
         if self.torch_function is None:
             from tilelang.utils.target import is_hexagon_target
 
             if is_hexagon_target(self.target):
                 raise RuntimeError(
                     "Hexagon kernels cannot be executed directly on the host machine. "
                     "To run this kernel, please use the Hexagon SDK Simulator or "
                     "the HexagonLauncher on a supported Qualcomm device."
                 )
+            raise RuntimeError("The compiled function is not initialized.")
 
         return self.torch_function(*args, **kwds)
📝 Committable suggestion

Suggested change
         if self.torch_function is None:
             from tilelang.utils.target import is_hexagon_target

             if is_hexagon_target(self.target):
                 raise RuntimeError(
                     "Hexagon kernels cannot be executed directly on the host machine. "
                     "To run this kernel, please use the Hexagon SDK Simulator or "
                     "the HexagonLauncher on a supported Qualcomm device."
                 )
+            raise RuntimeError("The compiled function is not initialized.")

         return self.torch_function(*args, **kwds)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/jit/kernel.py` around lines 206 - 216, The current call path may
attempt to call self.torch_function when it is None; update the kernel
invocation in the method containing self.torch_function to raise a clear
RuntimeError for uninitialized kernels on all targets: keep the existing
Hexagon-specific RuntimeError (using is_hexagon_target(self.target)) and then
add a generic RuntimeError if self.torch_function is still None that explains
the kernel is uninitialized and cannot be executed on the host (mentioning to
use the Hexagon SDK Simulator/HexagonLauncher only in the Hexagon message). This
ensures callers receive a stable, descriptive error instead of a raw TypeError
('NoneType' object is not callable) when invoking self.torch_function(*args, **kwds).

Comment thread tilelang/utils/target.py
Comment on lines +189 to +191
# Handle Backend-Specific Normalization (Shorthands)
if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
    return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve explicit -mcpu and other flags when normalizing Hexagon strings.

This currently rewrites any string that contains "hexagon" but no -mtriple to a fixed llvm -mtriple=hexagon -mcpu=hexagonv73. Inputs like "llvm -mcpu=hexagonv75" lose the requested CPU/features and silently downgrade codegen.

Suggested fix
     # Handle Backend-Specific Normalization (Shorthands)
     if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
-        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        normalized = return_var.strip()
+        if normalized == "hexagon":
+            return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        elif normalized.startswith("llvm"):
+            return_var = f"{normalized} -mtriple=hexagon"
+        else:
+            return_var = f"llvm -mtriple=hexagon {normalized}"
📝 Committable suggestion

Suggested change
     # Handle Backend-Specific Normalization (Shorthands)
     if isinstance(return_var, str) and "hexagon" in return_var.lower() and "-mtriple" not in return_var:
-        return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        normalized = return_var.strip()
+        if normalized == "hexagon":
+            return_var = "llvm -mtriple=hexagon -mcpu=hexagonv73"
+        elif normalized.startswith("llvm"):
+            return_var = f"{normalized} -mtriple=hexagon"
+        else:
+            return_var = f"llvm -mtriple=hexagon {normalized}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tilelang/utils/target.py` around lines 189 - 191, The normalization currently
overwrites any Hexagon target string (variable return_var) with a fixed "llvm
-mtriple=hexagon -mcpu=hexagonv73"; instead, update the logic in
tilelang.utils.target (the block handling Hexagon shorthands) to parse the
existing return_var string and only inject missing flags: if "-mtriple" is
absent append "-mtriple=hexagon", if "-mcpu" is absent append a default
"-mcpu=hexagonv73", and preserve any existing "-mcpu", "-mattr", or other flags
already present in return_var; ensure you handle cases where return_var is
exactly "hexagon" (replace with "llvm" plus the necessary flags) versus when it
already starts with "llvm" (modify by appending missing flags) so no explicit
user flags are overwritten.
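
The "inject only missing flags" behavior could look roughly like the sketch below. `normalize_hexagon_target` is a hypothetical helper that works on the target string alone; whether TVM accepts every resulting string is not verified here, and the default CPU is simply the value the PR currently hard-codes.

```python
DEFAULT_CPU = "hexagonv73"  # default taken from the PR's current shorthand


def normalize_hexagon_target(target: str) -> str:
    """Add only the flags that are missing; never overwrite explicit ones."""
    tokens = target.split()
    if "hexagon" not in target.lower() or any(
            t.startswith("-mtriple") for t in tokens):
        return target  # not a Hexagon shorthand, or already fully specified
    # Drop a bare "hexagon" keyword and any leading "llvm" kind; both are re-added below.
    flags = [t for t in tokens if t.lower() not in ("hexagon", "llvm")]
    result = ["llvm", "-mtriple=hexagon"]
    if not any(f.startswith("-mcpu") for f in flags):
        result.append(f"-mcpu={DEFAULT_CPU}")
    return " ".join(result + flags)
```

Tokenizing first also covers an input such as "hexagon -mcpu=hexagonv75", which the string-concatenation variant in the suggested fix would turn into "llvm -mtriple=hexagon hexagon -mcpu=hexagonv75".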

@LeiWang1999

@Calaweh Thanks for your contributions! We're currently decoupling the different backends; once that work is wrapped up, I will take a look at this PR.

Development

Successfully merging this pull request may close these issues.

[Feature Request] Hexagon Backend with HMX Supports

2 participants