[GDN] Fused all preprocessing into one kernel #244

Open
apinge wants to merge 5 commits into zejunchen-zejun:Qwen3.5_v0.5.9 from apinge:fused_gdn_zejun

apinge commented Apr 9, 2026

Motivation

Same change as vllm-project/vllm#38787

Modifications

Accuracy Tests

Qwen3.5-27B-PTPC-compressor

Tested on MI308X

lm_eval --model local-completions \
    --model_args '{"base_url": "http://localhost:9080/v1/completions", "model": "/models/Qwen3.5-27B-PTPC-compressor", "num_concurrent": 256, "max_retries": 10, "max_gen_toks": 2048}' \
    --tasks gsm8k \
    --batch_size auto \
    --num_fewshot 5 \
    --trust_remote_code 2>&1 | tee gsm8k_eval.log

local-completions ({'base_url': 'http://localhost:9080/v1/completions', 'model': '/models/Qwen3.5-27B-PTPC-compressor', 'num_concurrent': 256, 'max_retries': 10, 'max_gen_toks': 2048}), gen_kwargs: ({}), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7597|±  |0.0118|
|     |       |strict-match    |     5|exact_match|↑  |0.8537|±  |0.0097|
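To make the reported standard errors easier to read, the sketch below turns each gsm8k score into an approximate 95% interval. The values are copied from the table above; the normal approximation with z = 1.96 and the helper name `normal_ci` are assumptions made for illustration, not something lm_eval prints.

```python
# Reported gsm8k results as (value, stderr), copied from the table above.
results = {
    "flexible-extract": (0.7597, 0.0118),
    "strict-match": (0.8537, 0.0097),
}


def normal_ci(value, stderr, z=1.96):
    """Approximate 95% interval, assuming a normal sampling distribution."""
    return (value - z * stderr, value + z * stderr)


intervals = {name: normal_ci(v, se) for name, (v, se) in results.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

Comparing these intervals against a baseline run on the same hardware would show whether any score difference exceeds the sampling noise.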

qwen3p5_397B_ptpc

RESULT_FILE: accuracy_test_results/offical_qwen3p5_397B_ptpc_gsm8k_results.json
GSM8K flexible extract score: 0.95

Benchmarking and Profiling

Tested on MI308X

./run_pure_text_bench.sh   
bench model: /models/Qwen3.5-27B-PTPC-compressor/
input tokens: 8000
output tokens: 500
max concurrency: 1
num prompts: 32
dataset-name: random


#Input tokens: 256000
#Output tokens: 16000

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Max request concurrency:                 1         
Successful requests:                     32        
Benchmark duration (s):                  265.83    
Total input tokens:                      256000    
Total input text tokens:                 256000    
Total generated tokens:                  16000     
Total generated tokens (retokenized):    15994     
Request throughput (req/s):              0.12      
Input token throughput (tok/s):          963.03    
Output token throughput (tok/s):         60.19     
Peak output token throughput (tok/s):    71.00     
Peak concurrent requests:                2         
Total token throughput (tok/s):          1023.22   
Concurrency:                             1.00      
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   8305.94   
Median E2E Latency (ms):                 8274.94   
P90 E2E Latency (ms):                    8374.91   
P99 E2E Latency (ms):                    8659.60   
---------------Time to First Token----------------
Mean TTFT (ms):                          1119.79   
Median TTFT (ms):                        1113.78   
P99 TTFT (ms):                           1192.66   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          14.40     
Median TPOT (ms):                        14.36     
P99 TPOT (ms):                           15.06     
---------------Inter-Token Latency----------------
Mean ITL (ms):                           14.40     
Median ITL (ms):                         14.32     
P95 ITL (ms):                            14.49     
P99 ITL (ms):                            15.14     
Max ITL (ms):                            484.70    
==================================================
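As a quick sanity check, the headline numbers in the benchmark output can be re-derived from the raw counts. The sketch below is illustrative only: the inputs are copied from the log above, and the relation mean E2E ≈ mean TTFT + (output tokens - 1) × mean TPOT is an assumption based on the "excl. 1st token" label, not a formula taken from the benchmark script.

```python
# Inputs copied from the benchmark log above.
duration_s = 265.83
num_requests = 32
total_input_tokens = 256000
total_output_tokens = 16000
output_tokens_per_request = 500
mean_ttft_ms = 1119.79
mean_tpot_ms = 14.40

# Values reported in the log, used for comparison.
reported = {
    "request_throughput": 0.12,
    "input_throughput": 963.03,
    "output_throughput": 60.19,
    "total_throughput": 1023.22,
    "mean_e2e_ms": 8305.94,
}

# Re-derive each metric from the raw counts.
derived = {
    "request_throughput": num_requests / duration_s,
    "input_throughput": total_input_tokens / duration_s,
    "output_throughput": total_output_tokens / duration_s,
    "total_throughput": (total_input_tokens + total_output_tokens) / duration_s,
    # Assumption: E2E latency = first-token latency + remaining tokens * TPOT.
    "mean_e2e_ms": mean_ttft_ms + (output_tokens_per_request - 1) * mean_tpot_ms,
}

# Relative error between derived and reported values.
rel_err = {k: abs(derived[k] - reported[k]) / reported[k] for k in reported}

for name in reported:
    print(f"{name}: derived={derived[name]:.2f} reported={reported[name]:.2f}")
```

The derived and reported values agree to within rounding of the logged duration, which suggests the log is internally consistent.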

Checklist

@apinge apinge marked this pull request as ready for review April 9, 2026 09:42
apinge added 5 commits April 13, 2026 11:04
Signed-off-by: apinge <Tong.Qiu2@amd.com>
Signed-off-by: apinge <Tong.Qiu2@amd.com>
Signed-off-by: apinge <Tong.Qiu2@amd.com>
Signed-off-by: apinge <Tong.Qiu2@amd.com>
Signed-off-by: apinge <Tong.Qiu2@amd.com>
