
feat: FA4 #56

Open
xrsrke wants to merge 2 commits into upstream-2026-10-02 from phuc/fa4

Conversation


@xrsrke xrsrke commented Mar 5, 2026

No description provided.

xrsrke added 2 commits March 5, 2026 18:25
Add FlashAttention4Wrapper in attention.py that handles the tensor layout
conversion between torchtitan's (batch, nheads, seqlen, headdim) and FA4's
(batch, seqlen, nheads, headdim) formats. Uses a lazy import to avoid breaking
users without fa4 installed.
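The commit describes a wrapper that converts between the two tensor layouts and defers the fa4 import until first use. A minimal sketch of that pattern, assuming details not in the PR: the fa4 import path, the attention function's signature, and all helper names here are hypothetical, and the real wrapper is presumably an nn.Module.

```python
class FlashAttention4Wrapper:
    """Sketch: convert torchtitan's (batch, nheads, seqlen, headdim) layout
    to FA4's (batch, seqlen, nheads, headdim), call FA4, and convert back.
    Kept as a plain class (not nn.Module) so the sketch stands alone."""

    def __init__(self) -> None:
        # Lazy import: the fa4 entry point is resolved on first use, so
        # importing this module never fails for users without fa4 installed.
        self._fa4_func = None

    def _resolve_fa4(self):
        if self._fa4_func is None:
            try:
                # Assumed import path; the real fa4 package may differ.
                from flash_attn.cute.interface import flash_attn_func
            except ImportError as exc:
                raise RuntimeError(
                    "attn_type='fa4' requested but FlashAttention 4 "
                    "is not installed"
                ) from exc
            self._fa4_func = flash_attn_func
        return self._fa4_func

    def __call__(self, q, k, v, causal: bool = True):
        fa4 = self._resolve_fa4()
        # (batch, nheads, seqlen, headdim) -> (batch, seqlen, nheads, headdim)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = fa4(q, k, v, causal=causal)
        # Back to torchtitan's (batch, nheads, seqlen, headdim)
        return out.transpose(1, 2)
```

Deferring the import to call time (rather than module top level) is what keeps attention.py importable for every user regardless of whether fa4 is present.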

Wire up fa4 attention type in llama3, llama4, qwen3, and deepseek_v3 models.
Update context parallel validation to properly block unsupported attention
types. Add Qwen3 30B-A3B-fa4 and 30B-A3B-flex-causal benchmark flavors.
Block fa4 attention type with Context Parallel in qwen3 and deepseek_v3
parallelize.py, and fix llama3 args.py to use minimal denylist pattern.
Add docs/fa4.md with usage, benchmark results, and limitations.
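The "minimal denylist pattern" for blocking unsupported attention types under Context Parallel could be sketched as follows. This is an illustration only: the function name, constant name, and error message are assumptions, and the PR's actual denylist contents beyond fa4 are not stated in the commit message.

```python
# Hypothetical denylist of attention types incompatible with Context Parallel.
CP_ATTENTION_DENYLIST = {"fa4"}


def validate_context_parallel(attn_type: str, cp_degree: int) -> None:
    """Reject configs that combine a denylisted attention type with CP.

    A denylist blocks only the known-bad combinations, so newly added
    attention types work with CP by default instead of being silently
    excluded by an allowlist.
    """
    if cp_degree > 1 and attn_type in CP_ATTENTION_DENYLIST:
        raise ValueError(
            f"attention type {attn_type!r} is not supported with "
            f"Context Parallel (cp_degree={cp_degree})"
        )
```

The denylist shape matches the commit's intent: fa4 is blocked only when CP is actually enabled (degree > 1), and every other attention type passes through unchanged.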
@xrsrke xrsrke requested a review from jquesnelle March 6, 2026 20:36
@xrsrke xrsrke marked this pull request as ready for review March 6, 2026 20:36
