
[Fix] Allow autotuning kernels with scalar value parameters #2136

Open

yurekami wants to merge 1 commit into tile-ai:main from yurekami:fix-autotune-scalar-value-params

Conversation

@yurekami (Contributor) commented May 3, 2026

Summary

Fixes #2081. Any kernel signature that includes a scalar value parameter, e.g.

import tilelang
import tilelang.language as T

@tilelang.autotune(configs=[{"threads": 128}, {"threads": 256}], ...)
@tilelang.jit
def test_fun(N=4096, BLOCK_N=512, threads=128):
    @T.prim_func
    def kernel(A: T.Tensor((N,), T.float32), s: T.float32):  # <- scalar value param
        ...
    return kernel

test_fun()(A, 0.1)  # crashes during autotune

cannot currently be autotuned.

Root cause

The autotuner asks the profiler to generate inputs for every parameter via Profiler._get_inputs, which calls get_tensor_supply(...)(param) for each one. For the scalar s, that path lands in tilelang/utils/tensor.py:get_tensor, which unconditionally raises on empty-shape KernelParam objects:

if hasattr(param, "shape") and not param.shape:
    raise ValueError(
        f"TensorType must have a shape, but got {type(param)}, "
        "likely you are trying to generate a random tensor with a dynamic symbolic shape."
    )

The error message was also misleading — the actual cause is a scalar value parameter, not a dynamic shape (which is detected separately by the tir.Var check below).
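
For context, the dynamic-shape path that message alludes to is a separate check further down in get_tensor. A rough paraphrase of it (an assumption about its exact form; this PR only states that a tir.Var check handles dynamic shapes):

from tvm import tir

# Paraphrase, not the exact source: dynamic symbolic dims are tir.Var nodes,
# and a random tensor cannot be generated for an unresolved symbolic shape.
if any(isinstance(dim, tir.Var) for dim in param.shape):
    raise ValueError("cannot generate a random tensor for a dynamic symbolic shape")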

Fix

Teach get_tensor to recognize scalar params and return a Python scalar of the matching dtype family (False for bool, 0.0 for floats, 0 for ints) so the autotuner can invoke the kernel during benchmarking. Users who need a specific scalar value can still override per-kernel via supply_prog.

The scalar fast path runs before get_current_device(), so this also makes input generation work on CPU-only hosts when the kernel only takes scalar params.
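
For reference, the scalar fast path added in this PR looks like the following (reconstructed from the reviewer's diff later in this thread; dtype here is the torch dtype derived from param earlier in get_tensor):

# Early exit inside get_tensor(param: KernelParam), before any device lookup:
if hasattr(param, "shape") and not param.shape:        # empty shape => scalar value param
    if hasattr(param, "is_boolean") and param.is_boolean():
        return False                                   # deterministic bool default
    return 0.0 if dtype.is_floating_point else 0       # deterministic float / int default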

Test plan

  • Added testing/python/utils/test_tensor_supply_scalar.py, a CUDA-free unit test parameterised over every TensorSupplyType and the common dtype families (float32/16/64, bfloat16, int8/32/64, uint8, bool). Locks in that scalar params yield Python scalars of the right type for every supply variant (a sketch of its shape follows this list).
  • ruff check + ruff format --check clean on both files.
  • Maintainer to confirm autotune now succeeds on the repro from [BUG] Autotuning + kernel value parameters = failure #2081 (I do not have a CUDA box to run do_bench end-to-end).
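
A minimal sketch of the shape of that test (illustrative only: the import path, the enum iteration, and the make_scalar_param helper are assumptions, not the actual contents of test_tensor_supply_scalar.py):

import pytest
from tilelang.utils.tensor import TensorSupplyType, get_tensor_supply

@pytest.mark.parametrize("supply_type", list(TensorSupplyType))
@pytest.mark.parametrize("dtype", ["float32", "float16", "float64", "bfloat16", "int32", "bool"])
def test_scalar_param_yields_python_scalar(supply_type, dtype):
    # make_scalar_param is a hypothetical helper that builds a KernelParam
    # with an empty shape and the given dtype.
    param = make_scalar_param(dtype)
    value = get_tensor_supply(supply_type)(param)
    if dtype == "bool":
        assert value is False
    elif dtype in ("float32", "float16", "float64", "bfloat16"):
        assert value == 0.0
    else:
        assert value == 0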

Tradeoffs / notes

  • The default scalar value is 0 / 0.0 / False. This is a deliberate, deterministic choice: randomized scalars would surprise users whose kernels branch on the value. Anyone needing specific scalars should use supply_prog, which already takes precedence over _get_inputs (a hypothetical usage sketch follows these notes).
  • The previous error path is removed because it was unreachable for legitimate uses (scalar params now succeed) and its message was actively misleading. The dynamic-shape error path remains untouched.
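
A hypothetical usage sketch for that escape hatch (the exact supply_prog signature is not shown in this thread, so the callback shape below is an assumption):

import torch

def my_supply(params):
    # Return one concrete argument per kernel parameter, pinning the scalar
    # to a representative value instead of the default 0.0.
    A = torch.randn(4096, dtype=torch.float32, device="cuda")
    return [A, 0.5]

# Passed wherever the autotuner accepts a custom supply program; it then
# takes precedence over Profiler._get_inputs during benchmarking.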

Closes: #2081

Summary by CodeRabbit

  • Bug Fixes

    • Scalar (zero-dimensional) parameters are now properly handled, returning appropriate Python scalar values instead of errors. This enables the autotuner to work with scalar arguments.
  • Tests

    • Added regression tests validating scalar parameter support across multiple data types and tensor supply modes.

github-actions (Bot) commented May 3, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai (Bot) commented May 3, 2026

📝 Walkthrough

The pull request adds scalar parameter support to get_tensor_supply() by returning Python scalar values for empty-shaped KernelParam objects instead of raising an error, enabling autotuner invocation with scalar arguments. Comprehensive regression tests validate the change across all TensorSupplyType modes and dtype cases.

Changes

Scalar Parameter Support

  • Core Implementation (tilelang/utils/tensor.py): get_tensor() now detects empty-shaped parameters and returns dtype-appropriate Python scalars (False for boolean, 0.0 for float, 0 for others) instead of raising ValueError.
  • Regression Tests (testing/python/utils/test_tensor_supply_scalar.py): Parametrized test cases validate scalar return values across all TensorSupplyType modes and dtype strings; an additional CPU-only test verifies the scalar fast path does not trigger CUDA dependencies.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A scalar hops into the tuner's embrace,
No longer lost in tensor space,
With Python values, swift and true,
Autotuning dreams come into view!
*—Cottontail, Chief of Code Review 🎯✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title '[Fix] Allow autotuning kernels with scalar value parameters' clearly and accurately describes the primary change: enabling autotuning for kernels with scalar value parameters.
  • Linked Issues Check (✅ Passed): The pull request fully addresses issue #2081 by modifying get_tensor() to recognize and handle scalar parameters, returning appropriate Python scalars instead of raising an error.
  • Out of Scope Changes Check (✅ Passed): All changes are directly scoped to fixing scalar parameter handling in autotuning. The modifications to tensor.py and the new test module are focused on the stated objective.


coderabbitai (Bot) left a comment

🧹 Nitpick comments (2)
tilelang/utils/tensor.py (2)

46-46: ⚡ Quick win

Update the return type annotation to reflect the scalar path.

get_tensor now returns bool | float | int for scalar params, but the annotation still declares -> torch.Tensor. Static analysis tools and IDE type checkers will flag call sites that receive and use these values.

♻️ Suggested annotation fix
-    def get_tensor(param: KernelParam) -> torch.Tensor:
+    def get_tensor(param: KernelParam) -> "torch.Tensor | bool | float | int":

55-58: 💤 Low value

hasattr guards are unnecessary — simplify the scalar check.

KernelParam is a dataclass that always has a shape field and an is_boolean() method (used unconditionally by the tensor path at lines 75, 99). The double hasattr calls add noise without guarding against any real runtime scenario.

♻️ Suggested simplification
-        if hasattr(param, "shape") and not param.shape:
-            if hasattr(param, "is_boolean") and param.is_boolean():
+        if not param.shape:
+            if param.is_boolean():
                 return False
             return 0.0 if dtype.is_floating_point else 0


📥 Commits

Reviewing files that changed from the base of the PR and between 2eec5f0 and ad0eb18.

📒 Files selected for processing (2)
  • testing/python/utils/test_tensor_supply_scalar.py
  • tilelang/utils/tensor.py

@LeiWang1999 (Member) commented:

@regression-perf

LeiWang1999 requested a review from Rachmanino on May 6, 2026 03:39
@Rachmanino (Collaborator) commented:

Hi @yurekami, thanks for your contribution! However, I think we should have a discussion here to fully figure out the problem.

The key point is that, if a scalar parameter is given, the TileLang compiler has no knowledge of its usage in the kernel. What if it acts as something like an index or a threshold? Its specific value may affect the workload or even the legality of the program (e.g. the number of elements to be processed depends on the threshold, or a wrong index results in illegal memory access). Therefore, my idea is to explicitly reject generating default values for scalar params (see #2084). And if users want to tune a kernel with a scalar param, they are required to provide the value of the scalar param via a customized supply program.

Do you agree with this solution? Also welcome for discussion or better ideas!

@Triang-jyed-driung commented:

I think the user is responsible for providing default values for the scalar value parameters during autotuning. Even if it serves as an index or a threshold, the user should have the right to specify the most probable path for subsequent autotuning. For example, if an a: int32 is used in the denominator, the user could specify a as 1 during autotuning, leaving the a == 0 branch untouched, since 0 is rarely encountered.

While #2081 shows a solution, it is not elegant: parameters need to be specified both at @tilelang.autotune (next to the kernel definition) AND at set_autotune_inputs (next to the kernel usage). Often they are miles apart from each other, making the code harder to maintain.

Rachmanino (Collaborator) commented May 7, 2026 via email

