Apply pa_gluon for aiter_backend #278

Open
apinge wants to merge 2 commits into zejunchen-zejun:Qwen3.5_v0.5.9 from apinge:pa_gluon_mha_vllm

@apinge apinge commented May 15, 2026

Motivation

In this PR, I apply torch.ops.aiter.pa_decode_gluon in aiter_backend. To match the new layout, MHA prefill is switched to flash_attn_varlen_func, and some extra Triton kernels are added.

Some older Triton versions hit compilation issues. In that case, please apply the patch apinge/aiter@3687fb5 to aiter.
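
For readers unfamiliar with the varlen prefill path mentioned above: varlen attention APIs in the flash_attn_varlen_func style generally take all sequences packed along one token dimension, plus a cumulative-offset array (commonly named cu_seqlens) that marks where each sequence starts and ends. The sketch below only illustrates that packing convention in plain Python. It is not the code in this PR, and the helper names build_cu_seqlens and token_ranges are hypothetical.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative offsets with a leading 0; length is batch_size + 1.

    Hypothetical helper for illustration only, not part of this PR.
    """
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(seq_lens)]


def token_ranges(cu_seqlens):
    """Return the (start, end) slice of the packed token dimension per sequence."""
    return [
        (cu_seqlens[i], cu_seqlens[i + 1])
        for i in range(len(cu_seqlens) - 1)
    ]


# Three prefill requests with 3, 5 and 2 tokens, packed into 10 rows.
seq_lens = [3, 5, 2]
cu_seqlens = build_cu_seqlens(seq_lens)
max_seqlen = max(seq_lens)  # varlen kernels usually also need the longest length

for idx, (start, end) in enumerate(token_ranges(cu_seqlens)):
    print(f"sequence {idx}: packed rows [{start}, {end})")
```

In a real call the offsets would be an int32 tensor on the device, and the exact argument names depend on the library version in use.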

Modifications

Accuracy Tests

# FP8 TP2
> python3 sglang/benchmark/gsm8k/bench_sglang.py --port 9080 --enable-thinking --tokenizer-path /models/Qwen3.5-27B-PTPC-compressor --max-new-tokens 4096
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [08:46<00:00,  2.63s/it]
Accuracy: 0.750
Invalid: 0.010
Latency: 526.365 s
Output throughput: 527.392 token/s
# BF16 TP2
> python3 sglang/benchmark/gsm8k/bench_sglang.py --port 9080 --enable-thinking --tokenizer-path /models/Qwen3.5-27B --max-new-tokens 4096
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [03:37<00:00,  1.09s/it]
Accuracy: 0.820
Invalid: 0.005
Latency: 217.747 s
Output throughput: 1326.462 token/s

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@apinge apinge marked this pull request as ready for review May 15, 2026 02:49