Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions tileops/manifest/reduction.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -636,7 +636,7 @@ ArgminFwdOp:
AllFwdOp:
ref_api: "torch.all"
family: reduction
status: spec-only # dim=None not yet implemented; update when impl lands
status: implemented
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While flipping the status to implemented is correct given the implementation provided, there is a discrepancy between the manifest and the Python implementation in tileops/ops/reduction/all_op.py. The manifest defines the default for dim as null (None), which matches torch.all behavior (full reduction). However, the implementation in all_op.py (line 65) defaults to -1. This inconsistency should be resolved to ensure the op behaves as specified when used via the manifest-driven API or when instantiated without arguments.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified — AllFwdOp.__init__ (tileops/ops/reduction/all_op.py:65) defaults dim=-1, manifest says null. The code-only prereq PR will flip the default to None (same change planned for _ReduceOpBase, _WelfordReduceOp, AnyFwdOp, CountNonzeroFwdOp).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Marking AllFwdOp as implemented is premature due to several issues that should be addressed to ensure consistency and correctness:

  1. Kernel Bug: In tileops/kernels/reduction/logical_reduce.py, the non-tiled kernel _logical_reduce_kernel (lines 114-123) only performs row masking if _needs_pad (column padding) is true. If the hidden dimension N is aligned (e.g., 256) but the number of rows M is not a multiple of block_m, the T.copy at line 123 will perform an out-of-bounds read.
  2. Default Value Discrepancy: The manifest defines the default dim as null (full reduction) at line 647, whereas the Python implementation in all_op.py defaults to -1 (last-dim reduction) at line 65. This inconsistency can lead to unexpected behavior in manifest-driven tools or benchmarks.
  3. Supported Dtypes: The manifest signature (line 643) only lists float16 | bfloat16 | float32 | bool, but the implementation in all_op.py (lines 47-48) and the kernel logic explicitly support int32, int64, and complex types via pre-conversion to float32.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item-by-item verdict:

  1. Kernel out-of-bounds on aligned-N path — verified false positive. Ran compute-sanitizer --tool memcheck on AllFwdOp with M=129, N=256, block_m=8 (the smallest tile config; would expose any tail-row OOB) plus 15 other shapes (M ∈ {129, 257, 513, 1025} × N ∈ {128, 256, 512, 1024}). Zero errors, output canary intact, numerical results match torch.all. TileLang's T.copy(x[a, b], shared) lowering inserts implicit row-bounds guards from the source tensor's static extent T.Tensor[(M, N), dtype], so the manual if pid_m*block_m + i < M is redundant. The "else" branch is safe. No kernel fix needed.

  2. Default dim=-1 ↔ manifest null (real) — will be fixed in the code-only prereq PR.

  3. Missing int32 / int64 / complex in signature dtype list (real, manifest-only) — the impl does promote those via _pre_kernel. The signature dtype list will be expanded in the follow-up manifest-only PR (outside the status-flip carve-out).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogicalReduceKernel still has an unmasked tail-row read/write path in _logical_reduce_kernel: when N is aligned, the else branch does T.copy(x[pid_m * block_m, 0], shared_buf) without checking pid_m * block_m + i < M, so the last block can read past the end of x and write past the end of out for shapes like M=129, N=512 -> keep these entries spec-only or mask every load/store before promoting them.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified false positive after empirical check. Ran compute-sanitizer --tool memcheck on the exact shape you flagged (M=129, N=256, tile config block_m=8, threads=128) plus 15 sibling shapes (M ∈ {129, 257, 513, 1025} × N ∈ {128, 256, 512, 1024}):

  • 0 memcheck errors across all configs.
  • 8 KiB input canary (zero-initialized region adjacent to x) intact after every run.
  • Numerical results match torch.all / torch.any / torch.count_nonzero on each shape.

Reason: TileLang lowers T.copy(x[pid_m * block_m, 0], shared_buf) against the source tensor's statically-declared extent T.Tensor[(M, N), dtype]. The lowered loads carry implicit row-bounds predicates derived from M, so the explicit if pid_m*block_m + i < M is unnecessary — the codegen already inserts equivalent guards. The DSL reads as "raw CUDA" but the safety wrapper sits one level lower.

If you have a counter-example shape that triggers the OOB I'd happily re-run sanitizer on it, but on the configs the kernel actually compiles to for these ops, there's no real-world OOB to fix.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllFwdOp still defaults dim to -1 in __init__ and inherits _empty_dim_policy = "reject", but the manifest entry advertises dim: null and treats dim=[] as a no-op, so manifest-driven callers that omit dim or pass an empty sequence do not get the promoted semantics -> change the op default and empty-dim handling to match the manifest before marking this implemented.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed on the default-dim side — AllFwdOp.__init__ (tileops/ops/reduction/all_op.py:65) defaults dim=-1, manifest says null. Code-only prereq PR will fix it on All/Any/CountNonzero plus the two base classes.

On _empty_dim_policy = "reject" vs the manifest's "treat dim=[] as no-op" shape_rules — PyTorch's torch.all(x, dim=[]) actually errors, so "reject" is the right semantics. The manifest shape_rules text is the side that's wrong (matches CountNonzero's "full" policy rather than All/Any's "reject"). That's a manifest-only fix and will be bundled into the manifest follow-up (outside the status-flip carve-out).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These entries still do not conform to the PyTorch API: the shared reduce path rejects scalar tensors (x.ndim == 0) with ValueError("Input tensor must be at least 1D"), while torch.all, torch.any, and torch.count_nonzero accept 0-D tensors. status: implemented is a hard conformance claim, so support scalar inputs and add coverage, or keep these entries spec-only / explicitly narrow the manifest contract.


signature:
inputs:
Expand Down Expand Up @@ -682,7 +682,7 @@ AllFwdOp:
AnyFwdOp:
ref_api: "torch.any"
family: reduction
status: spec-only # dim=None not yet implemented; update when impl lands
status: implemented
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to AllFwdOp, there is a discrepancy between the manifest default for dim (null) and the implementation in tileops/ops/reduction/any_op.py (line 65), which defaults to -1. The implementation should be updated to default to None to align with the manifest and torch.any semantics.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — same root cause as AllFwdOp. AnyFwdOp.__init__ (tileops/ops/reduction/any_op.py:65) defaults dim=-1. Code-only prereq PR fixes it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Marking AnyFwdOp as implemented is premature for the same reasons as AllFwdOp:

  1. Kernel Bug: The non-tiled kernel in logical_reduce.py lacks row masking when N is aligned but M is not a multiple of block_m, leading to potential out-of-bounds reads in T.copy.
  2. Default Value Discrepancy: The manifest default for dim is null (line 693), while the Python default is -1 (line 65 of any_op.py).
  3. Supported Dtypes: The manifest signature (line 689) should be updated to include int32, int64, and complex types supported by the implementation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same verdicts as the AllFwdOp block:

  1. Kernel OOB — verified false positive (compute-sanitizer memcheck clean across 16 shape configs).
  2. Default dim — real; code-only prereq PR.
  3. Dtype list — real, manifest-only follow-up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnyFwdOp still defaults dim to -1 in __init__ and inherits _empty_dim_policy = "reject", but the manifest entry advertises dim: null and treats dim=[] as a no-op, so manifest-driven callers that omit dim or pass an empty sequence do not get the promoted semantics -> change the op default and empty-dim handling to match the manifest before marking this implemented.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed same as the AllFwdOp thread — AnyFwdOp.__init__ (tileops/ops/reduction/any_op.py:65) defaults dim=-1. Code-only prereq PR will flip it. The _empty_dim_policy = "reject" ↔ manifest shape_rules disagreement also matches AllFwdOp's analysis — PyTorch errors on dim=[] so "reject" is correct; manifest shape_rules need a follow-up fix.


signature:
inputs:
Expand Down Expand Up @@ -728,7 +728,7 @@ AnyFwdOp:
CountNonzeroFwdOp:
ref_api: "torch.count_nonzero"
family: reduction
status: spec-only # dim=None not yet implemented; update when impl lands
status: implemented
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are a few inconsistencies in the manifest for CountNonzeroFwdOp that should be addressed now that it is marked as implemented:

  1. Default Value Discrepancy: The manifest defaults dim to null (line 739), but the implementation in tileops/ops/reduction/count_nonzero.py (line 72) defaults to -1.
  2. Shape Rule Inconsistency: The shape_rules (line 742) use and len(dim) > 0 for the list/tuple check, which causes dim=[] to be treated as a full reduction (returning a scalar). In contrast, AllFwdOp and AnyFwdOp treat dim=[] as a no-op (returning the input shape). Given that torch.count_nonzero does not support dim=[], it should either be forbidden or follow the no-op convention of the other logical reductions.
  3. Missing Dtype: The inputs signature (line 735) is missing bool, which is supported by both PyTorch and the provided implementation in count_nonzero.py.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item-by-item verdict:

  1. Default dim mismatch (real)CountNonzeroFwdOp.__init__ (tileops/ops/reduction/count_nonzero.py:72) defaults dim=-1. Will be fixed in the code-only prereq PR.
  2. shape_rules empty-list inconsistency (real, manifest-only) — CountNonzero uses _empty_dim_policy="full", while All/Any inherit "reject". The shape_rules text for CountNonzero / All / Any does need to be unified with the runtime policy and aligned to PyTorch (torch.count_nonzero rejects dim=[]). Outside the status-flip carve-out — separate manifest-only PR.
  3. Missing bool dtype (real, manifest-only) — signature dtype list will be expanded to include bool (and int32/int64/complex for All/Any/CountNonzero where the impl supports them via _pre_kernel fp32 promotion) in the same manifest-only follow-up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Marking CountNonzeroFwdOp as implemented is premature due to the following issues:

  1. Kernel Bug: Similar to the other logical reduction ops, the non-tiled kernel in logical_reduce.py is missing row masking for aligned N, causing unsafe T.copy operations when M % block_m != 0.
  2. Default Value Discrepancy: The manifest default for dim is null (line 739), while the Python default is -1 (line 72 of count_nonzero.py).
  3. Supported Dtypes: The manifest signature (line 735) is missing bool, int32, int64, and complex types, all of which are supported by the implementation and are common use cases for count_nonzero.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same verdicts as the AllFwdOp / AnyFwdOp blocks for CountNonzeroFwdOp:

  1. Kernel OOB — verified false positive (compute-sanitizer memcheck clean).
  2. Default dim — real; code-only prereq PR.
  3. Dtype list (bool, int32, int64, complex) — real, manifest-only follow-up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CountNonzeroFwdOp still defaults dim to -1 in __init__, while the manifest entry advertises dim: null; workload-driven and benchmark code that omit dim will reduce the last axis instead of a full-tensor reduction -> switch the op default to None or leave the entry spec-only.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — CountNonzeroFwdOp.__init__ (tileops/ops/reduction/count_nonzero.py:72) defaults dim=-1, manifest says null. Code-only prereq PR will fix it (along with All/Any and the two base classes).


signature:
inputs:
Expand Down
Loading