Regulate flash_attn_varlen_fp8_pertensor_func according to precision issue #123

Open: apinge wants to merge 2 commits into zejunchen-zejun:dev/perf from apinge:flash_attn_fp8

Conversation

@apinge commented Dec 29, 2025

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@sammysun0711 (Collaborator)

#125 refactors KV quantization to be fused into the mrope rms kernel; keep this PR on hold in case the new approach fixes the Qwen3-MoE accuracy issue.

@ZLkanyo009 force-pushed the dev/perf branch 2 times, most recently from f8668d9 to c29e1c2 (March 13, 2026 08:32)