[Refactor][Build] Separate CMakeLists into different backends by SiriusNEO · Pull Request #2114 · tile-ai/tilelang

SiriusNEO · 2026-04-28T12:20:22Z

Summary by CodeRabbit

Chores
- Reorganized build system to delegate backend selection and configuration to backend-specific CMake files (CUDA, ROCm, Metal) instead of centralized logic, improving modularity and maintainability.

github-actions · 2026-04-28T12:20:32Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-04-28T12:20:36Z

Walkthrough

Refactors CMake build configuration by moving backend-specific logic (CUDA, ROCm, Metal) from the top-level CMakeLists.txt into dedicated backend subdirectories. Replaces hardcoded CUDA/HIP conditionals with generalized backend-supplied variables for stub linking, RPATH configuration, and patchelf operations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Top-level Build Refactoring (`CMakeLists.txt`) | Removes centralized CUDA/ROCm/Metal backend logic (~225 lines), replaces with delegation to backend-local CMake includes. Generalizes stub linking to use `TILELANG_ACTIVE_BACKEND_STUB_LINK`/`TILELANG_ACTIVE_BACKEND_STUB_TARGETS` variables, RPATH adjustment to `TILELANG_ACTIVE_BACKEND_RPATH_EXTRA`, and patchelf removal to configurable `TILELANG_ACTIVE_BACKEND_PATCHELF_REMOVE` list. |
| CUDA Backend Configuration (`src/backend/cuda/CMakeLists.txt`) | New backend CMake file that configures CUDA toolkit, builds three stub libraries (`cuda_stub`, `cudart_stub`, `nvrtc_stub`) when `TILELANG_USE_CUDA_STUBS` is enabled, forces TVM library selections to stubs, registers CUDA runtime/codegen sources, and configures UNIX RPATH plus patchelf SONAME removal for wheel portability. |
| ROCm Backend Configuration (`src/backend/rocm/CMakeLists.txt`) | New backend CMake file that initializes ROCm/HIP tooling, builds two stub libraries (`hip_stub`, `hiprtc_stub`) when `TILELANG_USE_HIP_STUBS` is enabled, overrides TVM's cached ROCm library selection to use stubs, registers HIP sources and include directories, and sets backend stub/link variables. |
| Metal Backend Configuration (`src/backend/metal/CMakeLists.txt`) | New backend CMake file that exits early when `USE_METAL` is not set, forces codegen-only mode on non-Apple platforms, globs Metal runtime source (`src/target/rt_mod_metal.cc`), and disables backtrace-related CI failures. |
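
The delegation described in this summary could look roughly like the fragment below. This is a minimal sketch, not the PR's actual code: the `foreach` loop over backend directories, the `tilelang` target name, and the use of `INSTALL_RPATH` are assumptions made for illustration. Only the `TILELANG_ACTIVE_BACKEND_*` variable names and the `src/backend/<name>/CMakeLists.txt` paths come from the summary.

```cmake
# Sketch of a top-level CMakeLists.txt fragment (assumed shape, not the real file).
# Each backend file is expected to return() early when its USE_* option is off,
# as the summary describes for Metal.
foreach(backend cuda rocm metal)
  include(${CMAKE_CURRENT_SOURCE_DIR}/src/backend/${backend}/CMakeLists.txt)
endforeach()

# Consume whatever the active backend supplied.
# `tilelang` is a hypothetical target name used only for this example.
if(TILELANG_ACTIVE_BACKEND_STUB_LINK)
  target_link_libraries(tilelang PRIVATE ${TILELANG_ACTIVE_BACKEND_STUB_LINK})
endif()

if(UNIX AND NOT APPLE AND TILELANG_ACTIVE_BACKEND_RPATH_EXTRA)
  set_property(TARGET tilelang APPEND PROPERTY
               INSTALL_RPATH ${TILELANG_ACTIVE_BACKEND_RPATH_EXTRA})
endif()
```

Because `include()` runs the backend file in the caller's scope, variables set there are visible to the top-level file without `PARENT_SCOPE`, and a `return()` inside the included file only leaves that file.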

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[Codegen] Metal codegen on Linux #1857: Implements the same Metal backend CMake non-Apple codegen-only mode guard.
[Dist] Add lazy-loading stubs for CUDART + NVRTC (CUDA 11/12/13 compatible wheels) #1821: Directly related CUDA stub build and TVM library wiring configuration changes.
feat: add ROCm/HIP stub libraries for lazy loading (mirrors CUDA stubs) #1867: Directly overlaps with ROCm HIP stub build and backend stub linking variable implementation.

Suggested reviewers

LeiWang1999

Poem

🐰 Backends once lived in chaos deep,
Mixed with the main CMakeLists sheet,
Now each has its home, organized and neat,
CUDA, ROCm, Metal—each one's complete! 🎉
Build configuration hops to the right place,
Refactoring brings order to this codebase.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main refactoring objective: separating backend-specific build configuration from the top-level CMakeLists.txt into dedicated backend-local files (CUDA, ROCm, Metal), which aligns with the core changes across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMakeLists.txt`:
- Around line 227-231: The backend include block currently lets per-backend
CMakeLists set scalar variables like TILELANG_ACTIVE_BACKEND_* which overwrite
each other; change the contract to accumulate state: in each backend CMakeLists
(cuda/rocm/metal) replace set(...) of the TILELANG_ACTIVE_BACKEND_* variables
with list(APPEND TILELANG_ACTIVE_BACKEND_<X> ...) so each backend appends its
values, then in the top-level include site iterate the collected lists (for
example using foreach or list(GET)/string(JOIN) as needed) to
consume/emit/link/install all entries instead of reading a single scalar; apply
the same change pattern for the other two include blocks mentioned (the blocks
around the 336-345 and 360-383 regions) so RPATH/patchelf and stub lists are
aggregated rather than overwritten.
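
The consuming side of this suggestion could be sketched as follows. It assumes the backend files have already appended to `TILELANG_ACTIVE_BACKEND_STUB_TARGETS`; the install destination `lib` is a hypothetical value chosen for the example, not taken from the PR.

```cmake
# Sketch of the top-level consumer after all backend files have been included.
# Iterating the list handles zero, one, or several active backends alike.
foreach(stub IN LISTS TILELANG_ACTIVE_BACKEND_STUB_TARGETS)
  if(TARGET ${stub})
    # Destination is hypothetical; the real project may install elsewhere.
    install(TARGETS ${stub} LIBRARY DESTINATION lib)
  else()
    message(WARNING "Backend listed stub target '${stub}' but did not define it")
  endif()
endforeach()
```

The `if(TARGET ...)` guard is a defensive choice: it reports a backend that names a stub without creating it, rather than failing later at generate time.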

In `@src/backend/cuda/CMakeLists.txt`:
- Around line 124-127: The RPATH variable TILELANG_ACTIVE_BACKEND_RPATH_EXTRA is
being set under if(UNIX) (which includes macOS) but is only used for non-Apple
UNIX in the top-level append, so change the guard around setting
TILELANG_ACTIVE_BACKEND_RPATH_EXTRA to only run on Linux by making it if(UNIX
AND NOT APPLE); update the condition that currently wraps the set(...) that
references CUDAToolkit_VERSION_MAJOR so the variable is only created on
non-Apple UNIX systems.
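
A minimal sketch of the narrower guard suggested here. The RPATH value is a placeholder invented for illustration; the comment only says that the real value references `CUDAToolkit_VERSION_MAJOR`.

```cmake
# UNIX is also true on macOS, so exclude APPLE explicitly.
if(UNIX AND NOT APPLE)
  # Placeholder path, hypothetical: the actual value in the PR is not shown here.
  set(TILELANG_ACTIVE_BACKEND_RPATH_EXTRA
      "$ORIGIN/hypothetical/cuda${CUDAToolkit_VERSION_MAJOR}/lib")
endif()
```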

In `@src/backend/metal/CMakeLists.txt`:
- Around line 6-16: The CMake logic incorrectly disables codegen by calling
set(USE_METAL OFF) on non-Apple platforms; remove the set(USE_METAL OFF) line
and instead prevent only the Metal runtime wiring from being built on non-Apple
(e.g., skip adding rt_mod_metal.cc into TILE_LANG_METAL_SRCS or gate its
inclusion behind an APPLE check). Update the block around USE_METAL and the
list(APPEND TILE_LANG_SRCS ${TILE_LANG_METAL_SRCS}) so codegen
(tvm_callback_metal_compile) remains enabled while the platform-specific runtime
source src/target/rt_mod_metal.cc is excluded on non-Apple.
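
What this comment proposes could be sketched as below. This illustrates the reviewer's suggestion under the assumption that only `src/target/rt_mod_metal.cc` is platform-specific; it is not the code that was merged.

```cmake
# Sketch of src/backend/metal/CMakeLists.txt following the suggestion above.
if(NOT USE_METAL)
  return()  # leaves this included file only
endif()

if(APPLE)
  # Runtime wiring is built only where the Metal runtime exists.
  file(GLOB TILE_LANG_METAL_SRCS src/target/rt_mod_metal.cc)
  list(APPEND TILE_LANG_SRCS ${TILE_LANG_METAL_SRCS})
else()
  # USE_METAL is deliberately left ON so that codegen stays enabled.
  message(STATUS "Metal: non-Apple host, skipping runtime source (codegen only)")
endif()
```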


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8731fa8-cb68-45fe-8fb0-af611ce20e80

📥 Commits

Reviewing files that changed from the base of the PR and between 5d09b5d and dfc7ba4.

📒 Files selected for processing (4)

CMakeLists.txt
src/backend/cuda/CMakeLists.txt
src/backend/metal/CMakeLists.txt
src/backend/rocm/CMakeLists.txt

petersktang · 2026-04-29T03:48:35Z

I encountered a build error.

Environment: Ubuntu 24, Linux 6.17.0-1017-oem, ROCm 7.2.2 (AMD Ryzen™ AI 9 HX 370, GPU 890M, no NVIDIA GPU).

Setup: sudo apt install cuda-drivers; conda install cuda-toolkit==13.1.1 (python=3.14.3)
CMake output: The CUDA compiler identification is NVIDIA 13.1.115 with host compiler GNU 14.3.0
Build command: USE_CUDA=ON USE_ROCM=ON USE_LLVM=ON TILELANG_USE_CUDA_STUBS=ON pip install -e .


      CMake Error in CMakeLists.txt:
        IMPORTED_LOCATION not set for imported target "CUDA::cuda_driver"
        configuration "Release".
      CMake Error in CMakeLists.txt:
        IMPORTED_LOCATION not set for imported target "CUDA::cuda_driver"
        configuration "Release".
      CMake Error in CMakeLists.txt:
        IMPORTED_LOCATION not set for imported target "CUDA::nvml" configuration
        "Release".
      CMake Error in CMakeLists.txt:
        IMPORTED_LOCATION not set for imported target "CUDA::nvml" configuration
        "Release". 
  ld.lld: error: undefined symbol: cuDeviceGetName
  >>> referenced by cuda_device_api.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_device_api.cc.o:(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::ffi::Any*))
  
  ld.lld: error: undefined symbol: cuGetErrorName
  >>> referenced by cuda_device_api.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_device_api.cc.o:(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::ffi::Any*))
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAPrepGlobalBarrier::operator()(tvm::ffi::PackedArgs const&, tvm::ffi::Any*) const)
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))
  >>> referenced 7 more times
  
  ld.lld: error: undefined symbol: cuMemsetD32_v2
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAPrepGlobalBarrier::operator()(tvm::ffi::PackedArgs const&, tvm::ffi::Any*) const)
  
  ld.lld: error: undefined symbol: cuModuleLoadData
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&))
  
  ld.lld: error: undefined symbol: cuModuleGetGlobal_v2
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))
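
CMake raises `IMPORTED_LOCATION not set` when an imported target such as `CUDA::cuda_driver` is linked but has no library file recorded for the active configuration. One way to inspect this is a small standalone probe project. The sketch below assumes the targets come from CMake's `FindCUDAToolkit` module, which the log does not confirm; the project name is arbitrary.

```cmake
# Standalone probe: save as CMakeLists.txt in an empty directory and run
#   cmake -S . -B build
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit is available from 3.17
project(cuda_target_probe LANGUAGES CXX)

# Not REQUIRED, so the probe still prints something when the toolkit is missing.
find_package(CUDAToolkit)

foreach(tgt CUDA::cuda_driver CUDA::nvml)
  if(TARGET ${tgt})
    # get_target_property stores "<var>-NOTFOUND" when the property is unset.
    get_target_property(loc ${tgt} IMPORTED_LOCATION)
    message(STATUS "${tgt} IMPORTED_LOCATION: ${loc}")
  else()
    message(STATUS "${tgt} is not defined")
  endif()
endforeach()
```

A `NOTFOUND` value or a missing target would indicate that the driver or NVML library was not located on this machine, which is a separate question from the undefined `cu*` symbols reported by the linker.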

SiriusNEO · 2026-04-29T10:05:16Z

Hi @petersktang, I haven't tested the case where all backend options are enabled. Why might this situation occur? Under normal reasoning, we would generally expect only one backend to be active at a time.

petersktang · 2026-04-29T10:25:06Z

> Hi @petersktang, I haven't tested the case where all backend options are enabled. Why might this situation occur? Under normal reasoning, we would generally expect only one backend to be active at a time.

I use this capability to cross-generate and cross-compile code for both CUDA and ROCm from the same build, currently with a locally modified copy of TileLang. With the CMakeLists now separated per backend, this becomes even more interesting: a low-end computer can join the workflow, generating code that is then deployed to a number of remote high-end machines and GPUs. TileLang code can also be developed and tested first on the low-end machine, with whatever CPU/GPU is available, before being pushed to the high-end CPU/GPU cluster. There can be many separate clusters, located globally, on nationwide edge servers, or on a fleet of AI-enabled drones, each with different hardware builds and generations.

petersktang · 2026-04-30T15:19:49Z

Here is the fix in CMakeLists.txt for USE_CUDA=ON, USE_ROCM=ON, USE_LLVM=ON. Find these lines:

target_include_directories(tilelang_objs PRIVATE ${TILE_LANG_INCLUDES})
target_compile_definitions(tilelang_objs PRIVATE TVM_LOG_CUSTOMIZE=1)
if(TILELANG_RELEASE_BUILD)
  target_compile_definitions(tilelang_objs PRIVATE TILELANG_RELEASE_BUILD=1)
endif()

After them (after line 282), add the following:


if(USE_CUDA)
  if(TILELANG_USE_CUDA_STUBS)
    target_link_libraries(tvm PRIVATE cuda_stub cudart_stub nvrtc_stub)
  else()
    # find_package(CUDAToolkit REQUIRED)
    unset(TILELANG_ACTIVE_BACKEND_PATCHELF_REMOVE)
    target_link_libraries(tvm PRIVATE cuda)
  endif()
endif()

SiriusNEO · 2026-05-08T07:13:12Z

Hi @petersktang, it seems that this problem is fixed in latest main branch. Could you please help me verify this?

petersktang · 2026-05-08T14:52:35Z

> Hi @petersktang, it seems that this problem is fixed in latest main branch. Could you please help me verify this?

It seems resolved. However, I wonder whether this block in src/backend/rocm/CMakeLists.txt,

if(TILELANG_USE_HIP_STUBS)
  set(TILELANG_ACTIVE_BACKEND_STUB_LINK hip_stub)
  set(TILELANG_ACTIVE_BACKEND_STUB_TARGETS hip_stub hiprtc_stub)
endif()

should be

if(TILELANG_USE_HIP_STUBS)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_LINK hip_stub)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_TARGETS hip_stub hiprtc_stub)
endif()

and whether this block in src/backend/cuda/CMakeLists.txt,

if(TILELANG_USE_CUDA_STUBS)
  set(TILELANG_ACTIVE_BACKEND_STUB_LINK cuda_stub)
  set(TILELANG_ACTIVE_BACKEND_STUB_TARGETS cuda_stub cudart_stub nvrtc_stub)
endif()

should be

if(TILELANG_USE_CUDA_STUBS)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_LINK cuda_stub)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_TARGETS cuda_stub cudart_stub nvrtc_stub)
endif()
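
The difference between the two forms only shows up when more than one backend is enabled. The script below is a minimal sketch that simulates two backend files writing to the same variable. The variable names `STUB_TARGETS_SET` and `STUB_TARGETS_APPEND` are made up for the demonstration; the stub names are the ones quoted above.

```cmake
# demo.cmake, run with: cmake -P demo.cmake

# Variant 1: set(). The second backend overwrites the first.
set(STUB_TARGETS_SET cuda_stub cudart_stub nvrtc_stub)  # CUDA backend
set(STUB_TARGETS_SET hip_stub hiprtc_stub)              # ROCm backend
message(STATUS "set():        ${STUB_TARGETS_SET}")
# prints: -- set():        hip_stub;hiprtc_stub

# Variant 2: list(APPEND). Each backend adds to the shared list.
set(STUB_TARGETS_APPEND "")
list(APPEND STUB_TARGETS_APPEND cuda_stub cudart_stub nvrtc_stub)  # CUDA backend
list(APPEND STUB_TARGETS_APPEND hip_stub hiprtc_stub)              # ROCm backend
message(STATUS "list(APPEND): ${STUB_TARGETS_APPEND}")
# prints: -- list(APPEND): cuda_stub;cudart_stub;nvrtc_stub;hip_stub;hiprtc_stub
```

With a single backend enabled both variants give the same result, so the choice matters only for combined builds such as USE_CUDA=ON with USE_ROCM=ON. Switching to `list(APPEND ...)` also means the top-level file has to treat `TILELANG_ACTIVE_BACKEND_STUB_LINK` as a list rather than a single name.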

[Refactor][Build] Separate CMakeLists into different backends

dfc7ba4

coderabbitai Bot reviewed Apr 28, 2026


SiriusNEO mentioned this pull request Apr 28, 2026

[RoadMap] Multi-Backend Refactoring/Decoupling #2115

Open

6 tasks

SiriusNEO merged commit b0bec1f into tile-ai:main Apr 28, 2026
12 of 13 checks passed

coderabbitai Bot mentioned this pull request Apr 29, 2026

[Refactor][CodeGen] Refactor CodeGen part for multi-backend decoupling #2121

Merged

coderabbitai Bot mentioned this pull request May 7, 2026

[Refactor] Move backend stubs out of codegen #2164

Merged

SiriusNEO mentioned this pull request May 11, 2026

[Build] Fix cross platform CMake and add messages when enabling backends #2183

Merged

coderabbitai Bot mentioned this pull request May 13, 2026

[BugFix] Vendor HIP headers and build fat CUDA+ROCm linux wheels #2195

Merged

4 tasks
