[Enhance] Reject default scalar params and support `do_not_specialize` for autotune by Rachmanino · Pull Request #2084 · tile-ai/tilelang

Rachmanino · 2026-04-22T13:28:38Z

#2081

Summary by CodeRabbit

New Features
- Added a do_not_specialize option to exclude specified inputs from autotune cache-key specialization, enabling stable cache reuse when those inputs change.
- Autotune now validates required scalar inputs at runtime and surfaces a clear error if they are not explicitly supplied.
Tests
- Added tests ensuring scalar autotune inputs must be explicitly supplied and work correctly when provided.
- Added CUDA tests validating do_not_specialize preserves cache reuse across varied call patterns.

github-actions · 2026-04-22T13:28:48Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-04-22T13:28:50Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 302c38c3-4aaf-48d4-9fb1-678028f3fb7b

📥 Commits

Reviewing files that changed from the base of the PR and between 0219ee2 and da46def.

📒 Files selected for processing (1)

tilelang/autotuner/tuner.py

🚧 Files skipped from review as they are similar to previous changes (1)

tilelang/autotuner/tuner.py

📝 Walkthrough

Walkthrough

Defers validation of required scalar autotune inputs until autotune execution, adds a do_not_specialize option to exclude named arguments from cache-key computation, and adds CUDA tests validating explicit scalar-input supply and do_not_specialize cache-reuse behavior.

Changes

Autotuner core logic

Layer / File(s)	Summary
Validation addition `tilelang/autotuner/tuner.py`	Adds `AutoTuner._validate_input_supply_requirements(...)` to detect non-output `prim_func` params that require concrete scalar inputs and raise `ValueError` when `profile_args.supply_prog` is not provided.
Deferred validation & run ordering `tilelang/autotuner/tuner.py`	Defers scalar-input validation by setting `autotuner._prim_func_for_validation` in `AutoTuneImpl.__call__` and invoking validation inside `AutoTuner.run(...)` after the early-return path for already-supplied tunable args.
Cache-key / specialization control `tilelang/autotuner/tuner.py`	Adds `do_not_specialize` field to `AutoTuneImpl` and exposes `do_not_specialize` param on `autotune(...)`; cache-key normalization filters out argument names listed in `do_not_specialize` using signature binding + defaults.
Cleanup / wiring `tilelang/autotuner/tuner.py`	Removes redundant tuner assignments by reusing the created `autotuner` instance across branches and wires `do_not_specialize` through decorator construction.

Autotune tests (scalar inputs & do_not_specialize)

Layer / File(s)	Summary
New tests: scalar-input requirement `testing/python/autotune/test_tilelang_autotune_scalar_inputs.py`	Adds `add_scalar` autotuned kernel and two tests: one asserts a `ValueError` when scalar autotune inputs are not supplied; the other uses `set_autotune_inputs(tune_a, tune_s)` with CUDA tensors/values to run the kernel and verify results.
New tests: do_not_specialize behavior `testing/python/autotune/test_tilelang_autotune_do_not_specialize.py`	Adds `matmul_do_not_spec` autotuned kernel annotated with `do_not_specialize=["N","K"]`, helpers and autouse fixture to clear tuner cache, and multiple tests asserting cache reuse when only `do_not_specialize` params change and cache growth when non-exempt params change.
Test wiring / helpers `testing/python/autotune/...`	Introduces helper functions (`get_configs()`, `_call_and_count_new_entries`, cache-clear fixture) and uses `set_autotune_inputs(...)` in tests to supply concrete tuning inputs.
Entry / test harness `testing/python/autotune/test_tilelang_autotune_scalar_inputs.py`	Adds `__main__` entry calling `tilelang.testing.main()` for standalone execution.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant AutoTuneImpl
  participant JIT_TIR as JIT/TIR
  participant AutoTuner

  Caller->>AutoTuneImpl: invoke kernel(*args, **kwargs)
  AutoTuneImpl->>JIT_TIR: resolve prim_func and signature
  JIT_TIR-->>AutoTuneImpl: prim_func, signature
  AutoTuneImpl->>AutoTuneImpl: compute cache key (apply do_not_specialize filter)
  AutoTuneImpl->>AutoTuner: lookup cache for key
  alt cache miss
    AutoTuneImpl->>JIT_TIR: obtain prim_func for validation
    AutoTuneImpl->>AutoTuner: prepare profiling args / supply_prog
    AutoTuner->>AutoTuner: validate scalar-input requirements (may raise ValueError if needed)
    AutoTuner->>AutoTuner: run autotuning and select best config
    AutoTuner-->>AutoTuneImpl: compiled kernel
  end
  AutoTuneImpl-->>Caller: execute compiled kernel

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

tile-ai/tilelang#1906: Modifies tilelang/autotuner/tuner.py and touches cache-key computation / eager-mode compile flows; closely related to the do_not_specialize and normalization changes in this PR.

Poem

🐇 I hopped through kernels, scalars in my paw,
Told inputs, "Show up" — no spooky gap to claw.
Keys trimmed softly, caches keep their tune,
Tuners hum at dusk and patch the noon.
A little hop, and tests return at noon.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly summarizes the main changes: rejecting default scalar parameters and adding `do_not_specialize` support for autotune, which align with the substantial code changes across three files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@testing/python/autotune/test_tilelang_autotune_scalar_inputs.py`:
- Around line 29-39: The test
test_autotune_scalar_inputs_with_set_autotune_inputs allocates CUDA tensors via
set_autotune_inputs and local tensors (add_scalar), so add the repository's
standard CUDA guard to the test (e.g., decorate
test_autotune_scalar_inputs_with_set_autotune_inputs with the project's
`@requires_cuda` or a pytest.skipif(not torch.cuda.is_available()) equivalent) and
import the guard if necessary; this ensures the test is skipped on CPU-only
runners while leaving set_autotune_inputs and add_scalar unchanged.

In `@tilelang/autotuner/tuner.py`:
- Around line 295-300: Validate out_idx values before converting negative
indices: for the single-int and iterable branches, check each original out_idx
is within the valid result-index range (use the same bounds logic as the
existing result-index legalization helper) instead of blindly adding
len(prim_func.params); if any index is out-of-range raise the same validation
error used elsewhere (not the scalar-input error). Ensure you reference out_idx,
prim_func.params and produce result_idx only after passing this bounds check so
values like -len(params)-1 are rejected rather than normalized to an invalid
index.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15fc8b56-685d-43fc-b586-648a130b5e9f

📥 Commits

Reviewing files that changed from the base of the PR and between 55cf9c1 and ac2350e.

📒 Files selected for processing (2)

testing/python/autotune/test_tilelang_autotune_scalar_inputs.py
tilelang/autotuner/tuner.py

coderabbitai · 2026-04-22T13:33:49Z

+def test_autotune_scalar_inputs_with_set_autotune_inputs():
+    tune_a = torch.randn((4096,), device="cuda", dtype=torch.float32)
+    tune_s = 0.1
+    with set_autotune_inputs(tune_a, tune_s):
+        kernel = add_scalar()
+
+    a = torch.randn((4096,), device="cuda", dtype=torch.float32)
+    before = a.clone()
+    kernel(a, tune_s)
+
+    torch.testing.assert_close(a, before + tune_s, rtol=1e-4, atol=1e-4)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Inspect how existing tests guard CUDA-only allocations. rg -n -C2 'pytest\.mark\.skipif|torch\.cuda\.is_available|requires_cuda|device="cuda"' testing

Repository: tile-ai/tilelang

Length of output: 50373

🏁 Script executed:

#!/bin/bash # Examine the specific test file mentioned in the review cat -n testing/python/autotune/test_tilelang_autotune_scalar_inputs.py

Repository: tile-ai/tilelang

Length of output: 1704

🏁 Script executed:

#!/bin/bash # Check how other tests in the autotune directory are guarded ls -la testing/python/autotune/*.py | head -20 rg -n '@pytest.mark.skipif|@tilelang.testing.requires_cuda|import pytest|import torch' testing/python/autotune/ -A 2 | head -100

Repository: tile-ai/tilelang

Length of output: 3362

🏁 Script executed:

#!/bin/bash # Check if there's a conftest or similar setup in autotune directory that might guard all tests ls -la testing/python/autotune/ fd conftest testing/python/autotune/

Repository: tile-ai/tilelang

Length of output: 692

🏁 Script executed:

#!/bin/bash # Check what tilelang.testing.requires_cuda is rg -A5 'def requires_cuda|requires_cuda =' tilelang/testing --type python | head -30

Repository: tile-ai/tilelang

Length of output: 90

🏁 Script executed:

#!/antml:bash # Also check for imports and definition fd -e py testing | xargs grep -l 'requires_cuda' | head -5 find . -path ./node_modules -prune -o -name '*.py' -type f -exec grep -l 'requires_cuda.*=' {} \; | head -5

Repository: tile-ai/tilelang

Length of output: 90

🏁 Script executed:

#!/bin/bash # Look for tilelang.testing module definition find . -name "testing.py" -o -name "testing" -type d | grep tilelang ls -la tilelang/testing* 2>/dev/null || echo "No direct tilelang/testing files"

Repository: tile-ai/tilelang

Length of output: 322

🏁 Script executed:

#!/bin/bash # Check the tilelang/testing/__init__.py file cat tilelang/testing/__init__.py

Repository: tile-ai/tilelang

Length of output: 3930

Add CUDA guard to test requiring GPU.

The test allocates CUDA tensors on lines 30 and 35, so it will fail on CPU-only runners during setup. Use the repository's standard decorator:

Proposed fix

+@tilelang.testing.requires_cuda def test_autotune_scalar_inputs_with_set_autotune_inputs(): tune_a = torch.randn((4096,), device="cuda", dtype=torch.float32)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_autotune_scalar_inputs_with_set_autotune_inputs():

tune_a = torch.randn((4096,), device="cuda", dtype=torch.float32)

tune_s = 0.1

with set_autotune_inputs(tune_a, tune_s):

kernel = add_scalar()

a = torch.randn((4096,), device="cuda", dtype=torch.float32)

before = a.clone()

kernel(a, tune_s)

torch.testing.assert_close(a, before + tune_s, rtol=1e-4, atol=1e-4)

`@tilelang.testing.requires_cuda`

def test_autotune_scalar_inputs_with_set_autotune_inputs():

tune_a = torch.randn((4096,), device="cuda", dtype=torch.float32)

tune_s = 0.1

with set_autotune_inputs(tune_a, tune_s):

kernel = add_scalar()

a = torch.randn((4096,), device="cuda", dtype=torch.float32)

before = a.clone()

kernel(a, tune_s)

torch.testing.assert_close(a, before + tune_s, rtol=1e-4, atol=1e-4)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@testing/python/autotune/test_tilelang_autotune_scalar_inputs.py` around lines 29 - 39, The test test_autotune_scalar_inputs_with_set_autotune_inputs allocates CUDA tensors via set_autotune_inputs and local tensors (add_scalar), so add the repository's standard CUDA guard to the test (e.g., decorate test_autotune_scalar_inputs_with_set_autotune_inputs with the project's `@requires_cuda` or a pytest.skipif(not torch.cuda.is_available()) equivalent) and import the guard if necessary; this ensures the test is skipped on CPU-only runners while leaving set_autotune_inputs and add_scalar unchanged.

coderabbitai · 2026-04-22T13:33:50Z

+        if out_idx is None:
+            result_idx = []
+        elif isinstance(out_idx, int):
+            result_idx = [len(prim_func.params) + out_idx if out_idx < 0 else out_idx]
+        else:
+            result_idx = [len(prim_func.params) + idx if idx < 0 else idx for idx in out_idx]


⚠️ Potential issue | 🟡 Minor

Validate out_idx before normalizing negative indices.

Line 298 and Line 300 currently accept out-of-range values such as -len(params) - 1, normalizing them to invalid indices. Mirror the existing result-index legalization so this validator does not silently accept bad output indices or raise a misleading scalar-input error.

🐛 Proposed fix

- if out_idx is None: - result_idx = [] - elif isinstance(out_idx, int): - result_idx = [len(prim_func.params) + out_idx if out_idx < 0 else out_idx] - else: - result_idx = [len(prim_func.params) + idx if idx < 0 else idx for idx in out_idx] + param_count = len(prim_func.params) + if out_idx is None: + result_idx = set() + elif isinstance(out_idx, int): + if out_idx >= param_count or out_idx < -param_count: + raise ValueError(f"out_idx should be between {-param_count} and {param_count - 1}") + result_idx = {param_count + out_idx if out_idx < 0 else out_idx} + elif isinstance(out_idx, list): + result_idx = set() + for idx in out_idx: + if idx >= param_count or idx < -param_count: + raise ValueError(f"out_idx should be between {-param_count} and {param_count - 1}") + result_idx.add(param_count + idx if idx < 0 else idx) + else: + raise ValueError("out_idx should be a list of integers")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tilelang/autotuner/tuner.py` around lines 295 - 300, Validate out_idx values before converting negative indices: for the single-int and iterable branches, check each original out_idx is within the valid result-index range (use the same bounds logic as the existing result-index legalization helper) instead of blindly adding len(prim_func.params); if any index is out-of-range raise the same validation error used elsewhere (not the scalar-input error). Ensure you reference out_idx, prim_func.params and produce result_idx only after passing this bounds check so values like -len(params)-1 are rejected rather than normalized to an invalid index.

Triang-jyed-driung · 2026-04-22T14:55:54Z

Like I said in #2081, I don't want a simple rejection for these kernels. I want a way to manually specify whether this kernel should be retuned if some scalar parameter changes, while for other parameters, I would like to specify their default values during autotuning.

Rachmanino · 2026-04-22T15:09:21Z

Like I said in #2081, I don't want a simple rejection for these kernels. I want a way to manually specify whether this kernel should be retuned if some scalar parameter changes, while for other parameters, I would like to specify their default values during autotuning.

check my response in your issue, please.

LeiWang1999 · 2026-04-23T03:23:22Z

@Triang-jyed-driung Thanks for your suggestion. For this issue, we should introduce a parameter like a do_not_specify argument in autotune, but this PR also makes sense to me. However, we should fix the CI first. cc @Rachmanino

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@testing/python/autotune/test_tilelang_autotune_do_not_specialize.py`:
- Around line 117-136: The test reuses tensors a (512x256) and b (256x256) but
then calls matmul_do_not_spec(512,512,512,...) which requires A:(M,K)=512x512
and B:(N,K)=512x512; fix by creating new tensors for the second call with shapes
matching M,N,K (e.g. a2 = torch.randn(512,512, dtype=torch.float16,
device="cuda") and b2 = torch.randn(512,512, dtype=torch.float16,
device="cuda")) and use with set_autotune_inputs([a2, b2]) before the
positional-args invocation in test_do_not_specialize_kwargs_and_args so
matmul_do_not_spec receives correctly-shaped inputs and the tuner cache check is
valid.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba065952-fbef-4280-b737-73686bd0933d

📥 Commits

Reviewing files that changed from the base of the PR and between ac2350e and cb617c6.

📒 Files selected for processing (2)

testing/python/autotune/test_tilelang_autotune_do_not_specialize.py
tilelang/autotuner/tuner.py

coderabbitai · 2026-04-28T14:34:15Z

+@tilelang.testing.requires_cuda
+def test_do_not_specialize_kwargs_and_args():
+    """do_not_specialize should work whether params are passed as args or kwargs."""
+    a = torch.randn(512, 256, dtype=torch.float16, device="cuda")
+    b = torch.randn(256, 256, dtype=torch.float16, device="cuda")
+
+    # First call: all kwargs
+    with set_autotune_inputs([a, b]):
+        matmul_do_not_spec(M=512, N=256, K=256, threads=64)
+    prev = len(matmul_do_not_spec._tuner_cache)
+
+    # Second call: positional args, N and K differ but are in do_not_specialize
+    with set_autotune_inputs([a, b]):
+        matmul_do_not_spec(512, 512, 512, threads=64)
+    new = len(matmul_do_not_spec._tuner_cache)
+
+    assert new == prev, (
+        f"do_not_specialize failed with positional args: "
+        f"cache grew from {prev} to {new} (expected no new entries)"
+    )


⚠️ Potential issue | 🟠 Major

Tensor shapes don't match the kernel parameters in the second call.

The test reuses tensors a (512×256) and b (256×256) for the second call with M=512, N=512, K=512, but the kernel expects:

A: shape (M, K) = (512, 512)

B: shape (N, K) = (512, 512)

This shape mismatch will either cause runtime errors or silent incorrect results.

🐛 Proposed fix: create matching tensors for the second call

`@tilelang.testing.requires_cuda` def test_do_not_specialize_kwargs_and_args(): """do_not_specialize should work whether params are passed as args or kwargs.""" - a = torch.randn(512, 256, dtype=torch.float16, device="cuda") - b = torch.randn(256, 256, dtype=torch.float16, device="cuda") + a1 = torch.randn(512, 256, dtype=torch.float16, device="cuda") + b1 = torch.randn(256, 256, dtype=torch.float16, device="cuda") # First call: all kwargs - with set_autotune_inputs([a, b]): + with set_autotune_inputs([a1, b1]): matmul_do_not_spec(M=512, N=256, K=256, threads=64) prev = len(matmul_do_not_spec._tuner_cache) # Second call: positional args, N and K differ but are in do_not_specialize - with set_autotune_inputs([a, b]): + a2 = torch.randn(512, 512, dtype=torch.float16, device="cuda") + b2 = torch.randn(512, 512, dtype=torch.float16, device="cuda") + with set_autotune_inputs([a2, b2]): matmul_do_not_spec(512, 512, 512, threads=64) new = len(matmul_do_not_spec._tuner_cache)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@tilelang.testing.requires_cuda

def test_do_not_specialize_kwargs_and_args():

"""do_not_specialize should work whether params are passed as args or kwargs."""

a = torch.randn(512, 256, dtype=torch.float16, device="cuda")

b = torch.randn(256, 256, dtype=torch.float16, device="cuda")

# First call: all kwargs

with set_autotune_inputs([a, b]):

matmul_do_not_spec(M=512, N=256, K=256, threads=64)

prev = len(matmul_do_not_spec._tuner_cache)

# Second call: positional args, N and K differ but are in do_not_specialize

with set_autotune_inputs([a, b]):

matmul_do_not_spec(512, 512, 512, threads=64)

new = len(matmul_do_not_spec._tuner_cache)

assert new == prev, (

f"do_not_specialize failed with positional args: "

f"cache grew from {prev} to {new} (expected no new entries)"

)

`@tilelang.testing.requires_cuda`

def test_do_not_specialize_kwargs_and_args():

"""do_not_specialize should work whether params are passed as args or kwargs."""

a1 = torch.randn(512, 256, dtype=torch.float16, device="cuda")

b1 = torch.randn(256, 256, dtype=torch.float16, device="cuda")

# First call: all kwargs

with set_autotune_inputs([a1, b1]):

matmul_do_not_spec(M=512, N=256, K=256, threads=64)

prev = len(matmul_do_not_spec._tuner_cache)

# Second call: positional args, N and K differ but are in do_not_specialize

a2 = torch.randn(512, 512, dtype=torch.float16, device="cuda")

b2 = torch.randn(512, 512, dtype=torch.float16, device="cuda")

with set_autotune_inputs([a2, b2]):

matmul_do_not_spec(512, 512, 512, threads=64)

new = len(matmul_do_not_spec._tuner_cache)

assert new == prev, (

f"do_not_specialize failed with positional args: "

f"cache grew from {prev} to {new} (expected no new entries)"

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@testing/python/autotune/test_tilelang_autotune_do_not_specialize.py` around lines 117 - 136, The test reuses tensors a (512x256) and b (256x256) but then calls matmul_do_not_spec(512,512,512,...) which requires A:(M,K)=512x512 and B:(N,K)=512x512; fix by creating new tensors for the second call with shapes matching M,N,K (e.g. a2 = torch.randn(512,512, dtype=torch.float16, device="cuda") and b2 = torch.randn(512,512, dtype=torch.float16, device="cuda")) and use with set_autotune_inputs([a2, b2]) before the positional-args invocation in test_do_not_specialize_kwargs_and_args so matmul_do_not_spec receives correctly-shaped inputs and the tuner cache check is valid.

@autotune

…tion - Move _validate_input_supply_requirements from __call__ into run(), after the "tunable parameters already provided" check. This prevents spurious errors when all tunable params are supplied by the caller and the autotuner skips profiling entirely (fixes test_example_mha_fwd_varlen). - Add `do_not_specialize` parameter to @autotune decorator. Parameters listed here are excluded from the autotune cache key, so changing their values reuses the previously tuned configuration instead of triggering re-autotuning. Usage: @autotune(configs=get_configs(), do_not_specialize=["batch_size", "UQ"]) @tilelang.jit def kernel(batch_size, UQ, ...): ... Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (1)

testing/python/autotune/test_tilelang_autotune_do_not_specialize.py (1)

113-123: ⚠️ Potential issue | 🟠 Major

Fix tensor shape mismatch in positional-args test.

Line 122 reuses a/b with shapes (512,256) and (256,256) for a call passing M=512, N=512, K=512. That input pair is incompatible with the declared kernel tensors (M,K) and (N,K), so this can fail at runtime or make the cache assertion unreliable.

🐛 Proposed fix

 `@tilelang.testing.requires_cuda`
 def test_do_not_specialize_kwargs_and_args():
     """do_not_specialize should work whether params are passed as args or kwargs."""
-    a = torch.randn(512, 256, dtype=torch.float16, device="cuda")
-    b = torch.randn(256, 256, dtype=torch.float16, device="cuda")
+    a1 = torch.randn(512, 256, dtype=torch.float16, device="cuda")
+    b1 = torch.randn(256, 256, dtype=torch.float16, device="cuda")
 
     # First call: all kwargs
-    with set_autotune_inputs([a, b]):
+    with set_autotune_inputs([a1, b1]):
         matmul_do_not_spec(M=512, N=256, K=256, threads=64)
     prev = len(matmul_do_not_spec._tuner_cache)
 
     # Second call: positional args, N and K differ but are in do_not_specialize
-    with set_autotune_inputs([a, b]):
+    a2 = torch.randn(512, 512, dtype=torch.float16, device="cuda")
+    b2 = torch.randn(512, 512, dtype=torch.float16, device="cuda")
+    with set_autotune_inputs([a2, b2]):
         matmul_do_not_spec(512, 512, 512, threads=64)
     new = len(matmul_do_not_spec._tuner_cache)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@testing/python/autotune/test_tilelang_autotune_do_not_specialize.py` around
lines 113 - 123, The second autotune invocation reuses tensors a (512x256) and b
(256x256) but calls matmul_do_not_spec(512,512,512,...), causing a shape
mismatch with the kernel's (M,K) and (N,K) expectations; fix by supplying inputs
whose shapes match M=512,N=512,K=512 — e.g., create new tensors (call them a2
and b2) with shapes (512,512) and (512,512) and use with
set_autotune_inputs([a2, b2]) for the positional-args call to matmul_do_not_spec
so the cache assertion remains valid.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@testing/python/autotune/test_tilelang_autotune_do_not_specialize.py`:
- Around line 113-123: The second autotune invocation reuses tensors a (512x256)
and b (256x256) but calls matmul_do_not_spec(512,512,512,...), causing a shape
mismatch with the kernel's (M,K) and (N,K) expectations; fix by supplying inputs
whose shapes match M=512,N=512,K=512 — e.g., create new tensors (call them a2
and b2) with shapes (512,512) and (512,512) and use with
set_autotune_inputs([a2, b2]) for the positional-args call to matmul_do_not_spec
so the cache assertion remains valid.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93bc55f7-1591-4cc6-aa17-6734d5634cd0

📥 Commits

Reviewing files that changed from the base of the PR and between cb617c6 and 0219ee2.

📒 Files selected for processing (2)

testing/python/autotune/test_tilelang_autotune_do_not_specialize.py
tilelang/autotuner/tuner.py

🚧 Files skipped from review as they are similar to previous changes (1)

tilelang/autotuner/tuner.py

…ree-fix-autotune-scalar

Reject autotune default scalar inputs

ac2350e

Rachmanino mentioned this pull request Apr 22, 2026

[BUG] Autotuning + kernel value parameters = failure #2081

Open

2 tasks

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

LeiWang1999 previously approved these changes Apr 22, 2026

View reviewed changes

Rachmanino dismissed LeiWang1999’s stale review via cb617c6 April 28, 2026 14:21

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Rachmanino force-pushed the worktree-fix-autotune-scalar branch from cb617c6 to 0219ee2 Compare April 28, 2026 14:36

Rachmanino changed the title ~~[Enhance] Explicitly reject providing default scalar params for autotune~~ [Enhance] Reject default scalar params and support do_not_specialize for autotune Apr 28, 2026

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Rachmanino mentioned this pull request May 6, 2026

[Fix] Allow autotuning kernels with scalar value parameters #2136

Open

3 tasks

Merge branch 'main' of https://github.com/tile-ai/tilelang into workt…

da46def

…ree-fix-autotune-scalar

Conversation

Rachmanino commented Apr 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Triang-jyed-driung commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Rachmanino commented Apr 22, 2026

Uh oh!

LeiWang1999 commented Apr 23, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Rachmanino commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

Triang-jyed-driung commented Apr 22, 2026 •

edited

Loading