Commit 68081f5

Aya-ZIbra authored and facebook-github-bot committed
Add cutlass decode kernel to TritonBench (#4853)
Summary:
Pull Request resolved: #4853

X-link: facebookresearch/FBGEMM#1875

Add cutlass blackwell FMHA decode kernel implementation to TritonBench benchmarking suite.

Reviewed By: sryap

Differential Revision: D80041532
1 parent 5df9f73 commit 68081f5

1 file changed: +1 -1 lines changed

fbgemm_gpu/experimental/gen_ai/src/attention/cuda/cutlass_blackwell_fmha/blackwell_gen_impl.cu

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ at::Tensor dispatch_fmha_gen_fwd(
 
   return DISPATCH_ELEMENT_TYPE(q.scalar_type(), Element, [&] {
     return DISPATCH_KERNEL_TYPE(static_cast<int>(kernel_type), KType, [&] {
-      GenRunner<Element, KType, Shape<_128, _128, _128>, Shape<_1, _1, _1>>
+      GenRunner<Element, KType, Shape<_128, _256, _128>, Shape<_1, _1, _1>>
           runner;
       return runner.fmha_fwd(q, k, v, seqlen_kv, batch_idx);
     });
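
The one-line change above widens the second extent of the tile shape from 128 to 256. The commit does not say what each extent means. The sketch below assumes the second extent is the number of keys processed per KV tile, which is how CUTLASS FMHA examples usually order tile shapes (query, key/value, head dimension). Under that assumption it shows how the tile width changes the number of KV tiles a decode pass walks over. The helper names `kv_tile_count` and `kv_tail_padding` are hypothetical and are not part of FBGEMM or CUTLASS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FBGEMM or CUTLASS API): number of KV tiles
// needed to cover seqlen_kv keys when each tile spans tile_n keys.
// Assumes the second tile-shape extent is the KV tile width.
inline std::int64_t kv_tile_count(std::int64_t seqlen_kv, std::int64_t tile_n) {
  assert(seqlen_kv >= 0 && tile_n > 0);
  return (seqlen_kv + tile_n - 1) / tile_n;  // ceiling division
}

// Hypothetical helper: number of padded (masked) key slots in the last,
// partially filled tile.
inline std::int64_t kv_tail_padding(std::int64_t seqlen_kv, std::int64_t tile_n) {
  return kv_tile_count(seqlen_kv, tile_n) * tile_n - seqlen_kv;
}
```

For a KV length of 1024, `kv_tile_count(1024, 128)` is 8 and `kv_tile_count(1024, 256)` is 4, so the wider tile halves the number of loop iterations in this model. Whether that helps in practice depends on shared memory and register pressure, which this sketch does not model.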
