[Fix] Allow autotuning kernels with scalar value parameters #2136
yurekami wants to merge 1 commit into tile-ai:main
Conversation
Any kernel signature that includes a scalar value parameter, e.g.
def kernel(A: T.Tensor((N,), T.float32), s: T.float32):
...
cannot currently be autotuned. The autotuner asks the profiler to
generate inputs for every parameter via `Profiler._get_inputs`, which
calls `get_tensor_supply(...)(param)` for each. For the scalar `s`
that path landed in `tilelang/utils/tensor.py:get_tensor`, which
unconditionally raised on empty-shape `KernelParam`s with a misleading
"likely you are trying to generate a random tensor with a dynamic
symbolic shape" message - even though the actual cause is a scalar
value parameter, not a dynamic shape.
Teach `get_tensor` to recognize scalar params and return a Python
scalar of the matching dtype family (False for bool, 0.0 for floats,
0 for ints) so the autotuner can invoke the kernel during benchmarking.
Users that need a specific scalar value can still override per-kernel
via `supply_prog`.
The scalar fast path runs before `get_current_device()`, so this also
makes input generation work on CPU-only hosts when the kernel only
takes scalar params.
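A minimal sketch of the behavior described above, with dtype modeled as a plain string so the example stays self-contained (the real code inspects torch dtypes on tilelang's `KernelParam`; names here follow the PR text and may differ from the actual classes):

```python
from dataclasses import dataclass, field


@dataclass
class KernelParam:
    """Stand-in for tilelang's KernelParam, purely for illustration."""
    dtype: str                                   # e.g. "float32", "int32", "bool"
    shape: tuple = field(default_factory=tuple)  # empty shape => scalar value param

    def is_boolean(self) -> bool:
        return self.dtype == "bool"


def scalar_default(param: KernelParam):
    """Deterministic default for a scalar param, by dtype family."""
    if param.is_boolean():
        return False
    return 0.0 if param.dtype.startswith(("float", "bfloat")) else 0


def get_input(param: KernelParam):
    # Scalar fast path: decided from the shape alone, before any device query,
    # which is why it also works on CPU-only hosts.
    if not param.shape:
        return scalar_default(param)
    raise NotImplementedError("tensor path needs a device / RNG supply")
```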
Adds a CUDA-free unit test, parameterised over every `TensorSupplyType` and the common dtype families, to guard against regression.
Closes: tile-ai#2081
🧹 Nitpick comments (2)
tilelang/utils/tensor.py (2)
Line 46: ⚡ Quick win
Update the return type annotation to reflect the scalar path.
`get_tensor` now returns `bool | float | int` for scalar params, but the annotation still declares `-> torch.Tensor`. Static analysis tools and IDE type checkers will flag call sites that receive and use these values.
♻️ Suggested annotation fix
- def get_tensor(param: KernelParam) -> torch.Tensor:
+ def get_tensor(param: KernelParam) -> "torch.Tensor | bool | float | int":
Lines 55-58: 💤 Low value
`hasattr` guards are unnecessary: simplify the scalar check.
`KernelParam` is a dataclass that always has a `shape` field and an `is_boolean()` method (used unconditionally by the tensor path at lines 75 and 99). The double `hasattr` calls add noise without guarding against any real runtime scenario.
♻️ Suggested simplification
-        if hasattr(param, "shape") and not param.shape:
-            if hasattr(param, "is_boolean") and param.is_boolean():
+        if not param.shape:
+            if param.is_boolean():
                 return False
             return 0.0 if dtype.is_floating_point else 0
📒 Files selected for processing (2)
- testing/python/utils/test_tensor_supply_scalar.py
- tilelang/utils/tensor.py
@regression-perf
Hi @yurekami, thanks for your contribution! However, I think we should have a discussion here to fully figure out the problem. The key point is that, if a scalar parameter is given, the TileLang compiler has no knowledge of its usage in the kernel. What if it acts as something like an index or a threshold? As a result, its specific value may affect the workload or even the legality of the program (e.g. the number of elements to be processed is relevant to the threshold, or a wrong index results in illegal memory access). Therefore, my idea is to explicitly reject generating default values for scalar params (see #2084), and if users want to tune a kernel with a scalar param, require them to provide the value of the scalar param via a customized supply program. Do you agree with this solution? Discussion or better ideas are also welcome!
I think the user is responsible for providing default values for the scalar value parameters during autotuning. Even if it serves as an index or a threshold, the user should have the right to specify the most probable path for subsequent autotuning. For example, if an `a: int32` is used in the denominator, the user could specify `a` as 1 during autotuning, leaving the `a == 0` branch untouched, since 0 is rarely encountered.
While #2081 shows the solution, it is not elegant. Parameters need to be specified at both `@tilelang.autotune` (next to the kernel definition) AND `set_autotune_inputs` (next to the kernel usage). Often, they are miles apart from each other, making the code harder to maintain.
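To illustrate the override path under discussion, here is a hypothetical supply callable that pins the scalar to 1 while generating tensors for array params. The `supply_prog` concept comes from this PR; the parameter layout assumed below (a list of `(name, shape)` pairs) is invented for the sketch, and the exact signature tilelang expects may differ:

```python
import random


def my_supply(params):
    """Hypothetical supply program; `params` is assumed to be a list of
    (name, shape) pairs purely for illustration."""
    inputs = []
    for name, shape in params:
        if not shape:             # scalar value parameter, e.g. a: int32
            inputs.append(1)      # pin a == 1 so the a == 0 branch stays cold
        else:
            # stand-in for random tensor generation on the array path
            inputs.append([random.random() for _ in range(shape[0])])
    return inputs
```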
`tilelang.autotune` is not necessarily used as a decorator. Also, I remember there's a param called `supply_fn` in `tilelang.autotune` (not sure). Glad to hear if you have a better solution :D
Summary
Fixes #2081. Any kernel signature that includes a scalar value parameter, e.g.

def kernel(A: T.Tensor((N,), T.float32), s: T.float32):
    ...

cannot currently be autotuned.
Root cause
The autotuner asks the profiler to generate inputs for every parameter via `Profiler._get_inputs`, which calls `get_tensor_supply(...)(param)` for each one. For the scalars, that path landed in `tilelang/utils/tensor.py:get_tensor`, which unconditionally raised on empty-shape `KernelParam`s. The error message was also misleading: the actual cause is a scalar value parameter, not a dynamic shape (which is detected separately by the `tir.Var` check below).
Fix
Teach `get_tensor` to recognize scalar params and return a Python scalar of the matching dtype family (`False` for bool, `0.0` for floats, `0` for ints) so the autotuner can invoke the kernel during benchmarking. Users that need a specific scalar value can still override per-kernel via `supply_prog`.
The scalar fast path runs before `get_current_device()`, so this also makes input generation work on CPU-only hosts when the kernel only takes scalar params.
Test plan
- `testing/python/utils/test_tensor_supply_scalar.py`: a CUDA-free unit test parameterised over every `TensorSupplyType` and the common dtype families (float32/16/64, bfloat16, int8/32/64, uint8, bool). Locks in that scalar params yield Python scalars of the right type for every supply variant.
- `ruff check` + `ruff format --check` clean on both files.
- (… `do_bench` end-to-end).
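A trimmed, dependency-free sketch of the invariant the new test locks in (string dtypes here; the real test iterates torch dtypes and every `TensorSupplyType` variant):

```python
def scalar_default(dtype: str):
    # Deterministic defaults mirroring the fix: False / 0.0 / 0 by family.
    if dtype == "bool":
        return False
    return 0.0 if dtype.startswith(("float", "bfloat")) else 0


EXPECTED = {
    "float32": 0.0, "float16": 0.0, "float64": 0.0, "bfloat16": 0.0,
    "int8": 0, "int32": 0, "int64": 0, "uint8": 0, "bool": False,
}

# Check both the value and the exact Python type for every dtype family.
for dtype, want in EXPECTED.items():
    got = scalar_default(dtype)
    assert got == want and type(got) is type(want), dtype
```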
Tradeoffs / notes
Scalar defaults are `0`/`0.0`/`False`. This is a deliberate, deterministic choice: randomized scalars would surprise users whose kernels branch on the value. Anyone needing specific scalars should use `supply_prog`, which already takes precedence over `_get_inputs`.
Closes: #2081