Commits
41 commits
cc3f5c2
[Refactor] refactor gemm_sp following 5d729eee
botbw Apr 16, 2026
0bce4d0
[Doc] update doc
botbw Apr 16, 2026
eb55efe
[Refactor] remove gemm_sp CUTLASS templates
botbw Apr 16, 2026
a8c2351
Merge branch 'main' of https://github.com/tile-ai/tilelang into refac…
LeiWang1999 Apr 17, 2026
de32b3b
[cute] add mma sp tempaltes for sm80
botbw Apr 17, 2026
5d9329d
[templates] use templates in codegen
botbw Apr 19, 2026
948eac6
Add WGMMA_SP templates
botbw Apr 27, 2026
cca3c67
Pass layout_map to py lowering
botbw Apr 27, 2026
d28ba8d
Fix type
botbw Apr 27, 2026
2260ece
Add wgmma.sp checking
botbw Apr 28, 2026
70acbbc
Implement wgmma_sp_ss
botbw May 8, 2026
117b855
Add wgmma_sp_rs
botbw May 8, 2026
717fdcb
Add sparse selector
botbw May 8, 2026
760fa33
Fix layout and param pass
botbw May 9, 2026
c6afc18
Fix transpose metadata
botbw May 9, 2026
7d54fbc
Add mma.sp fp8
botbw May 9, 2026
e3e1af5
Fix integer rounding
botbw May 9, 2026
441cc3e
Fix metadata layout
botbw May 10, 2026
edcfc0f
Update compress and test cases
botbw May 10, 2026
d4c8445
Remove debug print statements from sparse compress and wgmma emitter
botbw May 10, 2026
2c106cf
Clean up gemm_sp test file
botbw May 10, 2026
c874289
Remove unused tvm import from gemm_sp_wgmma
botbw May 10, 2026
4012553
Update examples, tests, and docs for new compress() API
botbw May 10, 2026
3312279
Refactor sparse constants into sparse_config.py and clean up examples
botbw May 10, 2026
040fd15
Compute e_factor and e_replicate_factor instead of hardcoding tables
botbw May 10, 2026
ff6a302
Use DataType keys in SPARSE_PARAMS
botbw May 10, 2026
e91ee42
Move sparse_config.py from utils/ to intrinsics/
botbw May 11, 2026
6870ae3
Rename sparse_config.py to sparse_params.py
botbw May 11, 2026
8326178
Fix sparse.py breaking TIR type hints by removing future annotations
botbw May 11, 2026
fd8ca5b
Fix compress utils
botbw May 11, 2026
1c90464
Update example
botbw May 11, 2026
771254e
Add a compress benchmark
botbw May 11, 2026
cd52cc5
Remove print
botbw May 11, 2026
bd11d73
Merge branch 'main' into refactor_gemm_sp
botbw May 12, 2026
9796a76
Remove unused MetaType
botbw May 12, 2026
e2f2792
Polish
botbw May 12, 2026
d55434c
Refactor gemm_sp op
botbw May 12, 2026
69eb8ce
Update doc && remove unused file
botbw May 12, 2026
c10a262
Fix bug spotted by coderabbit
botbw May 12, 2026
0f3f7e0
Fix
botbw May 12, 2026
7eecf89
Add note
botbw May 13, 2026
2 changes: 1 addition & 1 deletion benchmark/matmul/benchmark_matmul_sp.py
@@ -219,7 +219,7 @@ def main(
T.copy(B[k * block_K, bx * block_N], B_shared)
# Perform a partial matrix multiplication:
# C_local += A_shared @ B_shared
T.gemm_sp_v2(
T.gemm_sp(
A_shared,
E_shared,
B_shared,
31 changes: 4 additions & 27 deletions docs/deeplearning_operators/matmul_sparse.md
@@ -49,16 +49,9 @@ Here, `A_sparse` contains all the non-zero elements of `A`, while `E` stores the

> NOTE: When using the CUTLASS compressor, there is no naive positional correspondence between `A_sparse`/`A` and `E` (i.e., the 4-element group at [n, k] does not correspond to the 4-bit metadata chunk at [n, k] if you view the metadata as an int4 tensor).
The metadata is reordered internally to optimize memory access patterns (e.g., for ldsm instructions and vectorized loads).
For more information, see **A note on `gemm_sp` and `gemm_sp_v2`**.

## `T.gemm_sp` with CUTLASS's compressor

:::{warning}

It is strongly recommended to use T.gemm_sp_v2 due to its greater flexibility and faster compilation time.

:::

A 2:4 sparse GEMM kernel is similar to its dense counterpart, except that it also requires handling the associated metadata.

Check the comments in the kernel code below for the required modifications.
@@ -125,13 +118,11 @@ def matmul_sp_sm80(
return main
```

Under the hood, `gemm_sp` invokes templates adapted from `CUTLASS`, and a compatible metadata layout must be specified using `T.annotate_layout`.

## `T.gemm_sp_v2` with a custom compressor
When using the `CUTLASS` compressor, the metadata layout must be made known to `TileLang` via `T.annotate_layout`.

To migrate to `gemm_sp_v2`, simply replace occurrences of `gemm_sp`.
## `T.gemm_sp` with a custom compressor

Unlike `gemm_sp`, `gemm_sp_v2` can operate without `T.annotate_layout`, and it also supports user-defined layouts and compressors.
`T.gemm_sp` lowers directly to PTX, removing the need for a fixed metadata layout. It can operate without `T.annotate_layout`, and supports user-defined layouts and compressors.

The metadata is stored in a `(u)int8`/`(u)int16`/`(u)int32` tensor, where **each 4-bit chunk represents two 2-bit indices** of non-zero elements within four consecutive elements. Here, we start with an `int16` example, which is the **default dtype** for `bf16` and `fp16` on Ampere GPUs.
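To make the 4-bit chunk layout concrete, here is a minimal PyTorch-style sketch (the helper `pack_metadata_int16` is hypothetical, not part of the TileLang API) that packs the kept-element positions of a 2:4-sparse row into `int16` metadata words. It assumes the low nibble holds the first group of four elements; the chunk order within a word is a layout choice and must match what your kernel and decoder expect.

```python
import torch

def pack_metadata_int16(keep_idx: torch.Tensor) -> torch.Tensor:
    """keep_idx: (..., num_groups, 2) positions (0-3) of the two kept
    elements in each group of four consecutive K elements."""
    lo = keep_idx[..., 0].to(torch.int32)               # first 2-bit index
    hi = keep_idx[..., 1].to(torch.int32)               # second 2-bit index
    nibble = lo | (hi << 2)                             # one 4-bit chunk per group
    nibble = nibble.reshape(*nibble.shape[:-1], -1, 4)  # 4 chunks -> 1 int16 word
    shifts = torch.tensor([0, 4, 8, 12], dtype=torch.int32)
    return (nibble << shifts).sum(dim=-1).to(torch.int16)

# One row, 4 groups: keep positions (0, 2), (1, 3), (0, 1), (2, 3).
idx = torch.tensor([[[0, 2], [1, 3], [0, 1], [2, 3]]])
w = pack_metadata_int16(idx)
print(hex(int(w) & 0xFFFF))  # 0xe4d8 -> nibbles 0x8, 0xD, 0x4, 0xE (low to high)
```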

@@ -172,7 +163,7 @@ def decode_metadata(meta: torch.Tensor) -> torch.Tensor:

The compressor can be implemented at either the `PyTorch`/`NumPy` level or the kernel level.

For example, `PyTorch` provides an Ampere compressor [here](https://github.com/pytorch/pytorch/blob/267d0197bfca0232488d51dd1ff735d619adc2cf/torch/sparse/_semi_structured_conversions.py#L47-L179). Note that in this implementation, a [permutation](https://github.com/pytorch/pytorch/blob/267d0197bfca0232488d51dd1ff735d619adc2cf/torch/sparse/_semi_structured_conversions.py#L173-L175) is applied to match CUTLASS’s metadata layout. If you do not annotate a metadata layout when using `gemm_sp_v2`, your compressor should replicate the same behavior as the PyTorch example—but without using the `_calculate_meta_reordering_scatter_offsets` function.
For example, `PyTorch` provides an Ampere compressor [here](https://github.com/pytorch/pytorch/blob/267d0197bfca0232488d51dd1ff735d619adc2cf/torch/sparse/_semi_structured_conversions.py#L47-L179). Note that in this implementation, a [permutation](https://github.com/pytorch/pytorch/blob/267d0197bfca0232488d51dd1ff735d619adc2cf/torch/sparse/_semi_structured_conversions.py#L173-L175) is applied to match CUTLASS’s metadata layout. If you do not annotate a metadata layout when using `gemm_sp`, your compressor should replicate the same behavior as the PyTorch example—but without using the `_calculate_meta_reordering_scatter_offsets` function.
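For illustration, below is a minimal sketch of such a plain compressor at the PyTorch level (the helper `compress_2to4` is hypothetical, not a TileLang or PyTorch API): it keeps the two largest-magnitude values in every group of four along K and records their positions, with no CUTLASS-style reordering. Packing the positions into metadata words is then the same nibble packing sketched above.

```python
import torch

def compress_2to4(A: torch.Tensor):
    """Plain 2:4 compression along the last (K) dimension, no reordering."""
    M, K = A.shape
    groups = A.reshape(M, K // 4, 4)
    # Positions of the two largest-magnitude elements per group, kept in order.
    idx = groups.abs().topk(2, dim=-1).indices.sort(dim=-1).values
    A_sparse = torch.gather(groups, -1, idx).reshape(M, K // 2)
    return A_sparse, idx  # pack idx into (u)int8/16/32 words to build E

A = torch.randn(128, 64, dtype=torch.float16)
A_sparse, idx = compress_2to4(A)
print(A_sparse.shape, idx.shape)  # torch.Size([128, 32]) torch.Size([128, 16, 2])
```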

If you want to use a custom metadata layout in your kernel, one approach is to define the layout in `TileLang` and then apply the same layout to both your compressor kernel and the matmul_sp kernel.

@@ -245,17 +236,3 @@ def compress_kernel(M, K, block_M, block_K, dtype, use_cutlass_layout):

return kernel
```

## A note on `gemm_sp` and `gemm_sp_v2`

Initially, `T.gemm_sp` followed the same design as `T.gemm`, lowering to a `CUTLASS` template. This inherently required the metadata to be reordered offline following a predetermined layout.

However, fixing a specific layout introduces several potential issues:

1. Painful debugging experience: Debugging a failed kernel becomes difficult due to the reordered indexing, including permutations and swizzling.

2. Limited flexibility: For example, concatenating two compressed tensors, such as `A_sparse_0` and `A_sparse_1`, into a new `A_sparse` makes sense. However, concatenating their metadata `E_0` and `E_1` may not be valid unless the layout allows it mathematically.

3. Alignment requirements: `CUTLASS` enforces strict alignment checks, and many hyperparameter configurations can lead to compilation errors. (For reference, sm8x was implemented in `CUTLASS 2`.)

`T.gemm_sp_v2` was designed to address these limitations, following the approach of `T.gemm`. It lowers directly to PTX, removing the need for a fixed metadata layout.
2 changes: 1 addition & 1 deletion examples/gemm_sp/example_custom_compress.py
@@ -92,7 +92,7 @@ def gemm_sp_fp16_custom_compress(
T.copy(A_sparse[by * block_M, k * block_K // 2], A_shared)
T.copy(E[by * block_M, k * block_K // e_factor], E_shared)
T.copy(B[k * block_K, bx * block_N], B_shared)
T.gemm_sp_v2(A_shared, E_shared, B_shared, C_local, False, False, policy=policy)
T.gemm_sp(A_shared, E_shared, B_shared, C_local, False, False, policy=policy)

T.copy(C_local, C_shared)
T.copy(C_shared, C[by * block_M, bx * block_N])