
[Refactor][Build] Separate CMakeLists into different backends#2114

Merged
SiriusNEO merged 1 commit into
tile-ai:mainfrom
SiriusNEO:chaofan/refactor_cmake_0428
Apr 28, 2026

@SiriusNEO
Collaborator

@SiriusNEO SiriusNEO commented Apr 28, 2026

Summary by CodeRabbit

  • Chores
    • Reorganized build system to delegate backend selection and configuration to backend-specific CMake files (CUDA, ROCm, Metal) instead of centralized logic, improving modularity and maintainability.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Refactors CMake build configuration by moving backend-specific logic (CUDA, ROCm, Metal) from the top-level CMakeLists.txt into dedicated backend subdirectories. Replaces hardcoded CUDA/HIP conditionals with generalized backend-supplied variables for stub linking, RPATH configuration, and patchelf operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Top-level Build Refactoring: CMakeLists.txt | Removes centralized CUDA/ROCm/Metal backend logic (~225 lines), replaces with delegation to backend-local CMake includes. Generalizes stub linking to use TILELANG_ACTIVE_BACKEND_STUB_LINK/TILELANG_ACTIVE_BACKEND_STUB_TARGETS variables, RPATH adjustment to TILELANG_ACTIVE_BACKEND_RPATH_EXTRA, and patchelf removal to configurable TILELANG_ACTIVE_BACKEND_PATCHELF_REMOVE list. |
| CUDA Backend Configuration: src/backend/cuda/CMakeLists.txt | New backend CMake file that configures CUDA toolkit, builds three stub libraries (cuda_stub, cudart_stub, nvrtc_stub) when TILELANG_USE_CUDA_STUBS is enabled, forces TVM library selections to stubs, registers CUDA runtime/codegen sources, and configures UNIX RPATH plus patchelf SONAME removal for wheel portability. |
| ROCm Backend Configuration: src/backend/rocm/CMakeLists.txt | New backend CMake file that initializes ROCm/HIP tooling, builds two stub libraries (hip_stub, hiprtc_stub) when TILELANG_USE_HIP_STUBS is enabled, overrides TVM's cached ROCm library selection to use stubs, registers HIP sources and include directories, and sets backend stub/link variables. |
| Metal Backend Configuration: src/backend/metal/CMakeLists.txt | New backend CMake file that exits early when USE_METAL is not set, forces codegen-only mode on non-Apple platforms, globs Metal runtime source (src/target/rt_mod_metal.cc), and disables backtrace-related CI failures. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • LeiWang1999

Poem

🐰 Backends once lived in chaos deep,
Mixed with the main CMakeLists sheet,
Now each has its home, organized and neat,
CUDA, ROCm, Metal—each one's complete! 🎉
Build configuration hops to the right place,
Refactoring brings order to this codebase.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main refactoring objective: separating backend-specific build configuration from the top-level CMakeLists.txt into dedicated backend-local files (CUDA, ROCm, Metal), which aligns with the core changes across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMakeLists.txt`:
- Around line 227-231: The backend include block currently lets per-backend
CMakeLists set scalar variables like TILELANG_ACTIVE_BACKEND_* which overwrite
each other; change the contract to accumulate state: in each backend CMakeLists
(cuda/rocm/metal) replace set(...) of the TILELANG_ACTIVE_BACKEND_* variables
with list(APPEND TILELANG_ACTIVE_BACKEND_<X> ...) so each backend appends its
values, then in the top-level include site iterate the collected lists (for
example using foreach or list(GET)/string(JOIN) as needed) to
consume/emit/link/install all entries instead of reading a single scalar; apply
the same change pattern for the other two include blocks mentioned (the blocks
around the 336-345 and 360-383 regions) so RPATH/patchelf and stub lists are
aggregated rather than overwritten.
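
The accumulate-then-consume contract described in this comment could look roughly like the sketch below. It is only an illustration: the loop over backend names, the reset of the lists, and the `message()` call standing in for the real link/install steps are assumptions, not the project's actual top-level logic.

```cmake
# Top-level CMakeLists.txt (sketch): reset the shared lists exactly once,
# before any backend file is included.
set(TILELANG_ACTIVE_BACKEND_STUB_TARGETS "")
set(TILELANG_ACTIVE_BACKEND_RPATH_EXTRA "")

# Each backend file is expected to use list(APPEND ...) on the lists above
# instead of set(...), so a later backend cannot erase an earlier one.
# Looping over the three directories is a hypothetical way to include them.
foreach(backend IN ITEMS cuda rocm metal)
  include(${CMAKE_CURRENT_SOURCE_DIR}/src/backend/${backend}/CMakeLists.txt)
endforeach()

# Consume every collected entry rather than reading a single scalar.
foreach(stub_target IN LISTS TILELANG_ACTIVE_BACKEND_STUB_TARGETS)
  if(TARGET ${stub_target})
    # Placeholder for the real per-target work (linking, install rules, ...).
    message(STATUS "Backend stub target registered: ${stub_target}")
  endif()
endforeach()
```

This only works as written because the backend files are pulled in with `include()`, which shares the caller's variable scope. If a backend were added with `add_subdirectory()` instead, `list(APPEND ...)` would modify a copy in the child scope, and the result would have to be exported with `set(... PARENT_SCOPE)`.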

In `@src/backend/cuda/CMakeLists.txt`:
- Around line 124-127: The RPATH variable TILELANG_ACTIVE_BACKEND_RPATH_EXTRA is
being set under if(UNIX) (which includes macOS) but is only used for non-Apple
UNIX in the top-level append, so change the guard around setting
TILELANG_ACTIVE_BACKEND_RPATH_EXTRA to only run on Linux by making it if(UNIX
AND NOT APPLE); update the condition that currently wraps the set(...) that
references CUDAToolkit_VERSION_MAJOR so the variable is only created on
non-Apple UNIX systems.
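
A minimal sketch of the narrower guard suggested here follows. The RPATH value is a hypothetical placeholder, since the thread only says that the real value references `CUDAToolkit_VERSION_MAJOR`; the sketch also assumes `find_package(CUDAToolkit)` has already run earlier in the backend file.

```cmake
# src/backend/cuda/CMakeLists.txt (sketch)
# UNIX is also true on macOS, so exclude Apple explicitly; the variable is
# then only defined on the platforms where the top level consumes it.
if(UNIX AND NOT APPLE)
  # Hypothetical path layout, shown only to illustrate the guard.
  set(TILELANG_ACTIVE_BACKEND_RPATH_EXTRA
      "$ORIGIN/hypothetical/cu${CUDAToolkit_VERSION_MAJOR}/lib")
endif()
```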

In `@src/backend/metal/CMakeLists.txt`:
- Around line 6-16: The CMake logic incorrectly disables codegen by calling
set(USE_METAL OFF) on non-Apple platforms; remove the set(USE_METAL OFF) line
and instead prevent only the Metal runtime wiring from being built on non-Apple
(e.g., skip adding rt_mod_metal.cc into TILE_LANG_METAL_SRCS or gate its
inclusion behind an APPLE check). Update the block around USE_METAL and the
list(APPEND TILE_LANG_SRCS ${TILE_LANG_METAL_SRCS}) so codegen
(tvm_callback_metal_compile) remains enabled while the platform-specific runtime
source src/target/rt_mod_metal.cc is excluded on non-Apple.
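
One way to express the suggested gating is sketched below. It leaves `USE_METAL` untouched and excludes only the runtime source on non-Apple hosts. The use of `PROJECT_SOURCE_DIR` assumes the project root is the repository root, and the codegen sources are omitted because the thread does not name them.

```cmake
# src/backend/metal/CMakeLists.txt (sketch)
if(NOT USE_METAL)
  return()  # Nothing to configure when the Metal backend is disabled.
endif()

set(TILE_LANG_METAL_SRCS "")

# Codegen sources would be appended here unconditionally, so that Metal code
# generation stays available on every platform.

if(APPLE)
  # The runtime wiring depends on the platform, so add it on Apple hosts only.
  list(APPEND TILE_LANG_METAL_SRCS
       ${PROJECT_SOURCE_DIR}/src/target/rt_mod_metal.cc)
endif()

list(APPEND TILE_LANG_SRCS ${TILE_LANG_METAL_SRCS})
```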

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8731fa8-cb68-45fe-8fb0-af611ce20e80

📥 Commits

Reviewing files that changed from the base of the PR and between 5d09b5d and dfc7ba4.

📒 Files selected for processing (4)
  • CMakeLists.txt
  • src/backend/cuda/CMakeLists.txt
  • src/backend/metal/CMakeLists.txt
  • src/backend/rocm/CMakeLists.txt

@SiriusNEO SiriusNEO merged commit b0bec1f into tile-ai:main Apr 28, 2026
12 of 13 checks passed
@petersktang

petersktang commented Apr 29, 2026

I encountered a build error.

On Ubuntu 24, Linux 6.17.0-1017-oem with ROCm 7.2.2 (AMD Ryzen™ AI 9 HX 370, GPU 890M, no Nvidia GPU)

  • sudo apt install cuda-drivers, conda install cuda-toolkit==13.1.1 (python=3.14.3)
  • The CUDA compiler identification is NVIDIA 13.1.115 with host compiler GNU 14.3.0
  • USE_CUDA=ON, USE_ROCM=ON, USE_LLVM=ON, TILELANG_USE_CUDA_STUBS=ON pip install -e .
CMake Error in CMakeLists.txt:
  IMPORTED_LOCATION not set for imported target "CUDA::cuda_driver" configuration "Release".
CMake Error in CMakeLists.txt:
  IMPORTED_LOCATION not set for imported target "CUDA::cuda_driver" configuration "Release".
CMake Error in CMakeLists.txt:
  IMPORTED_LOCATION not set for imported target "CUDA::nvml" configuration "Release".
CMake Error in CMakeLists.txt:
  IMPORTED_LOCATION not set for imported target "CUDA::nvml" configuration "Release".
  ld.lld: error: undefined symbol: cuDeviceGetName
  >>> referenced by cuda_device_api.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_device_api.cc.o:(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::ffi::Any*))
  
  ld.lld: error: undefined symbol: cuGetErrorName
  >>> referenced by cuda_device_api.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_device_api.cc.o:(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::ffi::Any*))
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAPrepGlobalBarrier::operator()(tvm::ffi::PackedArgs const&, tvm::ffi::Any*) const)
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))
  >>> referenced 7 more times
  
  ld.lld: error: undefined symbol: cuMemsetD32_v2
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAPrepGlobalBarrier::operator()(tvm::ffi::PackedArgs const&, tvm::ffi::Any*) const)
  
  ld.lld: error: undefined symbol: cuModuleLoadData
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&))
  
  ld.lld: error: undefined symbol: cuModuleGetGlobal_v2
  >>> referenced by cuda_module.cc
  >>>               tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/cuda/cuda_module.cc.o:(tvm::runtime::CUDAModuleNode::GetGlobal(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long))

@SiriusNEO
Collaborator Author

SiriusNEO commented Apr 29, 2026

Hi @petersktang, I haven't tested the case where all backend options are enabled. Why might this situation occur? Under normal reasoning, we would generally expect only one backend to be active at a time.

@petersktang

petersktang commented Apr 29, 2026

> Hi @petersktang, I haven't tested the case where all backend options are enabled. Why might this situation occur? Under normal reasoning, we would generally expect only one backend to be active at a time.

I use this capability to generate and cross-compile code for both CUDA and ROCm from the same build. So far I have done this with a locally modified copy of TileLang. With CMakeLists now separated per backend, this feature becomes even more interesting: a low-end computer can become part of the workflow, generating code and deploying it to run on a number of remote high-end computers and GPUs. TileLang code can also be developed and tested first on the low-end machine, with whatever CPU/GPU is available, before being pushed to run on a high-end CPU/GPU cluster. There can be many separate clusters located globally, on nationwide edge servers, and on a fleet of AI-enabled drones, each with different hardware builds/generations.

@petersktang

petersktang commented Apr 30, 2026

The fix in CMakeLists.txt for USE_CUDA=ON, USE_ROCM=ON, USE_LLVM=ON is as follows. Locate this block:

target_include_directories(tilelang_objs PRIVATE ${TILE_LANG_INCLUDES})
target_compile_definitions(tilelang_objs PRIVATE TVM_LOG_CUSTOMIZE=1)
if(TILELANG_RELEASE_BUILD)
  target_compile_definitions(tilelang_objs PRIVATE TILELANG_RELEASE_BUILD=1)
endif()

After line 282, add the following:

if(USE_CUDA)
  if(TILELANG_USE_CUDA_STUBS)
    target_link_libraries(tvm PRIVATE cuda_stub cudart_stub nvrtc_stub)
  else()
    # find_package(CUDAToolkit REQUIRED)
    unset(TILELANG_ACTIVE_BACKEND_PATCHELF_REMOVE)
    target_link_libraries(tvm PRIVATE cuda)
  endif()
endif()

@SiriusNEO
Collaborator Author

Hi @petersktang, it seems that this problem is fixed in latest main branch. Could you please help me verify this?

@petersktang

> Hi @petersktang, it seems that this problem is fixed in latest main branch. Could you please help me verify this?

It seems resolved. However, I have a question about src/backend/rocm/CMakeLists.txt, namely whether

if(TILELANG_USE_HIP_STUBS)
  set(TILELANG_ACTIVE_BACKEND_STUB_LINK hip_stub)
  set(TILELANG_ACTIVE_BACKEND_STUB_TARGETS hip_stub hiprtc_stub)
endif()

should be

if(TILELANG_USE_HIP_STUBS)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_LINK hip_stub)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_TARGETS hip_stub hiprtc_stub)
endif()

and src/backend/cuda/CMakeLists.txt, whether

if(TILELANG_USE_CUDA_STUBS)
  set(TILELANG_ACTIVE_BACKEND_STUB_LINK cuda_stub)
  set(TILELANG_ACTIVE_BACKEND_STUB_TARGETS cuda_stub cudart_stub nvrtc_stub)
endif()

should be

if(TILELANG_USE_CUDA_STUBS)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_LINK cuda_stub)
  list(APPEND TILELANG_ACTIVE_BACKEND_STUB_TARGETS cuda_stub cudart_stub nvrtc_stub)
endif()
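
The difference between the two forms can be checked in isolation with a small script. This is a sketch only: the variable names are hypothetical stand-ins for `TILELANG_ACTIVE_BACKEND_STUB_TARGETS`, and it assumes the CUDA backend file is processed before the ROCm one.

```cmake
# Run with: cmake -P set_vs_append.cmake

# set(): the second backend replaces whatever the first one stored.
set(STUB_TARGETS_WITH_SET cuda_stub cudart_stub nvrtc_stub)  # "CUDA" backend
set(STUB_TARGETS_WITH_SET hip_stub hiprtc_stub)              # "ROCm" backend

# list(APPEND): the second backend adds to what the first one stored.
set(STUB_TARGETS_WITH_APPEND "")
list(APPEND STUB_TARGETS_WITH_APPEND cuda_stub cudart_stub nvrtc_stub)
list(APPEND STUB_TARGETS_WITH_APPEND hip_stub hiprtc_stub)

message(STATUS "set:    ${STUB_TARGETS_WITH_SET}")
message(STATUS "append: ${STUB_TARGETS_WITH_APPEND}")
# -- set:    hip_stub;hiprtc_stub
# -- append: cuda_stub;cudart_stub;nvrtc_stub;hip_stub;hiprtc_stub
```

With `set()`, only the stubs of the last backend survive when both `USE_CUDA` and `USE_ROCM` are enabled, which is the overwrite behavior the bot review warned about.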
