
Add SDPA fallback for sliding-window attention #161

Open
fayerman-source wants to merge 1 commit into karpathy:master from fayerman-source:upstream/sdpa-mask-cache

Conversation

@fayerman-source

Summary

  • use Flash Attention 3 only on Hopper and fall back to PyTorch SDPA elsewhere
  • precompute and reuse SDPA sliding-window masks instead of rebuilding them every forward pass
  • keep the full-context path on is_causal=True when no explicit sliding-window mask is needed
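The mask-caching and full-context behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function and cache names are hypothetical, and the real change lives inside the model's attention layer.

```python
import torch
import torch.nn.functional as F

# Hypothetical module-level cache, keyed by (seq len, window, device).
_mask_cache = {}

def sliding_window_mask(T, window, device):
    # Reuse a precomputed T x T boolean mask instead of rebuilding it
    # on every forward pass (True = position may be attended to).
    key = (T, window, str(device))
    if key not in _mask_cache:
        i = torch.arange(T, device=device)
        # Allow attention to positions j with j <= i and i - j < window.
        mask = (i[:, None] >= i[None, :]) & (i[:, None] - i[None, :] < window)
        _mask_cache[key] = mask
    return _mask_cache[key]

def attend(q, k, v, window=None):
    T = q.size(-2)
    if window is None or window >= T:
        # Full-context path: no explicit mask needed, keep is_causal=True.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    mask = sliding_window_mask(T, window, q.device)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

When the window covers the whole sequence, the cached mask reduces to an ordinary causal mask, which is why the full-context path can skip it entirely and rely on `is_causal=True`.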

Why

The current non-Hopper path still routes through FA3, which fails on RTX 5070-class hardware. This keeps the Hopper path unchanged, but makes the same model runnable on non-Hopper GPUs and avoids repeated T x T mask construction on the SDPA path.

This was motivated by the non-Hopper discussion in #36 and the SDPA-focused follow-up in #108.
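The backend routing the paragraph above argues for can be sketched as a capability check; the function name here is hypothetical, and the real dispatch in the PR is inline in the model code:

```python
import torch

def pick_attention_backend():
    # Use FA3 only on Hopper (compute capability 9.0); everything else,
    # including RTX 5070-class GPUs, falls back to PyTorch SDPA.
    if torch.cuda.is_available() and torch.cuda.get_device_capability() == (9, 0):
        return "fa3"
    return "sdpa"
```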

Benchmark note

RTX 5070, CUDA SDPA microbenchmark, batch 8, seq 2048, 4-layer test model:

| variant | tok/s | ms/fwd |
| --- | --- | --- |
| cached SDPA mask | 1,248,774 | 13.12 |
| rebuild mask each forward | 1,222,867 | 13.40 |

I also smoke-tested a tiny GPU forward pass on the SDPA path after this change.

@karpathy
Owner

i don't think i'll merge this (too bloating), but i will leave the PR up.

```python
import torch
from kernels import get_kernel

cap = torch.cuda.get_device_capability()
# varunneal's FA3 is Hopper only, use kernels-community on non-Hopper GPUs
repo = "varunneal/flash-attention-3" if cap == (9, 0) else "kernels-community/flash-attn3"
```
Collaborator


For Ada/Ampere, it's still nice to use "kernels-community/flash-attn3". I had some improvements using it on an A100.



3 participants