feat: support post-norm EAGLE + add speculative decoding docs by Dogacel · Pull Request #174 · lightseekorg/tokenspeed

Dogacel · 2026-05-17T19:39:17Z

Summary

Add documentation for speculative decoding recipes for running and benchmarking (currently tested Llama 3.1 8B and GPT-oss 20B).
Add mt-bench to the bench script and report mean acceptance length.
Support post-norm architecture speculative decoding models as published with Attention Drift paper.
Warn user if any model weight is not loaded properly for EAGLE-3 speculative decoding models.

Test Plan

Test standard speculative decoding with one of the existing checkpoints (lmsys's llama 3.1 8b is used).
Test speculative decoding with one of the post-norm checkpoints (SpecDrift collection used).

Commands (2 terminals):

Running the model (Replace draft model with lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B to test standard architecture models)

tokenspeed serve nreHieW/Llama-3.1-8B-Instruct --speculative-algorithm EAGLE3 --speculative-draft-model-path Dogacel/specdrift-llama3-8b-eagle3-post-norm --speculative-num-steps 7 --host 0.0.0.0 --dtype bfloat16 --kvstore-size 16 --port 8999

tokenspeed bench serve   --backend openai-chat   --endpoint /v1/chat/completions   --host 127.0.0.1 --port 8999   --dataset-name mtbench   --input-len 2048   --output-len 2048   --num-prompts 80   --max-concurrency 32   --save-result --save-detailed --result-dir results/   --extra-body '{"temperature": 0}'

Results:

============ Serving Benchmark Result ============
Successful requests:                     80
Failed requests:                         0
Maximum request concurrency:             32
Benchmark duration (s):                  17.74
Total input tokens:                      6053
Total generated tokens:                  34407
Request throughput (req/s):              4.51
Output token throughput (tok/s):         1939.40
Peak output token throughput (tok/s):    962.00
Peak concurrent requests:                38.00
Total token throughput (tok/s):          2280.58
Mean accept length (tok/step):           3.61
---------------Time to First Token----------------
Mean TTFT (ms):                          399.01
Median TTFT (ms):                        258.22
P99 TTFT (ms):                           788.44
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          10.76
Median TPOT (ms):                        9.70
P99 TPOT (ms):                           18.83
---------------Inter-token Latency----------------
Mean ITL (ms):                           38.50
Median ITL (ms):                         25.36
P99 ITL (ms):                            153.76
==================================================

Signed-off-by: Doğaç Eldenk <[email protected]>

Dogacel requested a review from a team as a code owner May 17, 2026 19:39

feat: support post-norm EAGLE + add speculative decoding docs

0327176

Signed-off-by: Doğaç Eldenk <[email protected]>

Dogacel force-pushed the post-norm-eagle branch from 2177e75 to 0327176 Compare May 17, 2026 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support post-norm EAGLE + add speculative decoding docs#174

feat: support post-norm EAGLE + add speculative decoding docs#174
Dogacel wants to merge 1 commit into
lightseekorg:mainfrom
Dogacel:post-norm-eagle

Dogacel commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Dogacel commented May 17, 2026

Summary

Test Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant