24 commits
6b696e5
feat(grpc): add TokenSpeed gRPC client and router wiring
key4ng May 8, 2026
25375e5
refactor(grpc): extract OpenAI→sampling-params helpers to a common mo…
key4ng May 9, 2026
76bf572
refactor(grpc): give TokenSpeed its own IR arms (drop SGLang imperson…
key4ng May 9, 2026
e400fab
revert(tokenizer): defer OpenAI tool-wrapper strip + strict:false inj…
key4ng May 9, 2026
8478a63
style(grpc): trim verbose comments PR1 introduced
key4ng May 9, 2026
656f1c2
style(grpc): clean up code comments
key4ng May 11, 2026
c478f1a
fix(grpc): apply model sampling defaults to TokenSpeed requests
key4ng May 11, 2026
e93dca9
style(grpc): tidy lib.rs comments
key4ng May 11, 2026
6f84101
feat(grpc_servicer): add TokenSpeed servicer
key4ng May 8, 2026
6bb18d2
refactor(grpc_servicer): tighten _finish_reason_to_dict
key4ng May 9, 2026
93038d1
docs(grpc_servicer): use --model in tokenspeed entrypoint usage example
key4ng May 12, 2026
a812f5c
fix(grpc_servicer): handle ServerArgs ``_path`` → bare-name renames
key4ng May 12, 2026
8a3e651
feat(grpc_servicer): wrap json_schema as structural_tag for reasoning…
key4ng May 12, 2026
c869ee9
ci(tokenspeed): add CI install + GPU e2e coverage
key4ng May 8, 2026
7b152c0
ci(tokenspeed): drop private-repo auth now that tokenspeed is open-so…
key4ng May 12, 2026
34747d2
ci(tokenspeed): run install inside the official build-env container
key4ng May 12, 2026
7279401
ci(tokenspeed): revert job-level container; k8s runner can't use docker
key4ng May 12, 2026
7e948c6
fix(e2e): pass --model to tokenspeed worker (upstream renamed --model…
key4ng May 12, 2026
24d2f14
test(e2e): swap TestEnableThinking model to Qwen3.5-27B
key4ng May 12, 2026
4f58c5c
fix(e2e): cap Qwen3.5-27B context at 16K for the 1×H100 KV-cache budget
key4ng May 12, 2026
779445a
test(e2e): retarget TestEnableThinking at Qwen/Qwen3-4B
key4ng May 12, 2026
5ab3671
add none reasoning parser
LorrinWWW May 12, 2026
963baa4
ci(release): add dev wheel workflow → GitHub Releases (#1471)
key4ng May 12, 2026
921a3d9
ci(release): publish dev wheels to whl index (#1473)
zhyncs May 12, 2026
13 changes: 13 additions & 0 deletions .github/actions/setup-tokenspeed/action.yml
@@ -0,0 +1,13 @@
name: 'Setup TokenSpeed Backend'
description: 'Create Python venv and install TokenSpeed (engine + kernel + scheduler) from source.'

runs:
  using: 'composite'
  steps:
    - name: Setup Python venv
      shell: bash
      run: bash scripts/ci_setup_python_venv.sh

    - name: Install TokenSpeed
      shell: bash
      run: bash scripts/ci_install_tokenspeed.sh
6 changes: 5 additions & 1 deletion .github/workflows/e2e-gpu-job.yml
@@ -6,7 +6,7 @@ on:
engine:
required: true
type: string
description: "Engine to test: sglang, vllm, or trtllm"
description: "Engine to test: sglang, vllm, trtllm, or tokenspeed"
gpu_tier:
required: true
type: string
@@ -68,6 +68,10 @@ jobs:
if: inputs.engine == 'trtllm'
uses: ./.github/actions/setup-trtllm

- name: Setup TokenSpeed backend
if: inputs.engine == 'tokenspeed'
uses: ./.github/actions/setup-tokenspeed

# Artifact downloads
- name: Download wheel artifact
uses: actions/download-artifact@v8
11 changes: 11 additions & 0 deletions .github/workflows/pr-test-rust.yml
@@ -390,6 +390,7 @@ jobs:
- 'scripts/ci_setup_python_venv.sh'
- 'scripts/ci_install_sglang.sh'
- 'scripts/ci_install_vllm.sh'
- 'scripts/ci_install_tokenspeed.sh'
- 'scripts/ci_install_e2e_deps.sh'
- 'scripts/ci_killall_sglang.sh'
- 'scripts/ci_build_wheel.sh'
@@ -404,6 +405,7 @@ jobs:
- 'e2e_test/router/**'
- 'scripts/ci_install_vllm.sh'
- 'scripts/ci_install_trtllm.sh'
- 'scripts/ci_install_tokenspeed.sh'
agentic:
- 'crates/mcp/**'
- 'crates/data_connector/**'
@@ -445,6 +447,10 @@ jobs:
timeout: 20
- engine: trtllm
timeout: 90
# TokenSpeed builds kernel (CUDA) + scheduler (C++/CMake) from
# source, so first run takes ~30 min; cached runs are faster.
- engine: tokenspeed
timeout: 60
Comment on lines +450 to +453

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether tokenspeed matrix lanes have a secret-based guard
rg -n -C2 'e2e-1gpu-chat:|e2e-2gpu-chat:|engine: tokenspeed|secrets\.' .github/workflows/pr-test-rust.yml
# Expect a guard that skips tokenspeed when required secret is unavailable.

Repository: lightseekorg/smg

Length of output: 890


🏁 Script executed:

sed -n '429,455p' .github/workflows/pr-test-rust.yml

Repository: lightseekorg/smg

Length of output: 980


🏁 Script executed:

sed -n '551,575p' .github/workflows/pr-test-rust.yml

Repository: lightseekorg/smg

Length of output: 886


TokenSpeed matrix lanes are not guarded by secret availability.

The tokenspeed entries at lines 452 and 567 will always schedule regardless of secret availability. The e2e-1gpu-chat job's if condition does not check for secrets.TOKENSPEED_GITHUB_TOKEN, and e2e-2gpu-chat has no if condition at all. These jobs will fail instead of being skipped when the required secret is unavailable.

Add secret-based guards to skip tokenspeed runs when TOKENSPEED_GITHUB_TOKEN is missing:

Proposed fix
   e2e-1gpu-chat:
@@
     if: >-
       always()
       && !cancelled()
       && needs.build-wheel.result == 'success'
+      && (matrix.engine != 'tokenspeed' || secrets.TOKENSPEED_GITHUB_TOKEN != '')
       && (github.event_name != 'pull_request'
           || (needs.detect-changes.result == 'success'
               && (needs.detect-changes.outputs.common == 'true'
                   || needs.detect-changes.outputs.chat-completions == 'true')))
@@
   e2e-2gpu-chat:
@@
+    if: ${{ matrix.engine != 'tokenspeed' || secrets.TOKENSPEED_GITHUB_TOKEN != '' }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/pr-test-rust.yml around lines 450 - 453, The TokenSpeed
matrix lanes are being scheduled even when the required secret is missing;
update the jobs that include the matrix entry "engine: tokenspeed" (reference
the matrix entry and job names e2e-1gpu-chat and e2e-2gpu-chat) to guard those
runs by adding an if condition checking the secret, e.g. add if: ${{
secrets.TOKENSPEED_GITHUB_TOKEN != '' }} (or if: ${{
secrets.TOKENSPEED_GITHUB_TOKEN }}) so the tokenspeed matrix lanes are skipped
when TOKENSPEED_GITHUB_TOKEN is not available.

uses: ./.github/workflows/e2e-gpu-job.yml
with:
engine: ${{ matrix.engine }}
@@ -555,6 +561,11 @@ jobs:
timeout: 20
- engine: trtllm
timeout: 30
# Picks up TestChatCompletionGptOss (gpt-oss-20b, ``@pytest.mark.gpu(2)``)
# on the tokenspeed engine; the 1-GPU job collected the test class but
# pytest skipped it at collection because the runner only had 1 GPU.
- engine: tokenspeed
timeout: 60
coderabbitai[bot] marked this conversation as resolved.
uses: ./.github/workflows/e2e-gpu-job.yml
with:
engine: ${{ matrix.engine }}
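
The workflow comment in this hunk describes collection-time skipping: a test class marked ``@pytest.mark.gpu(2)`` is collected on a 1-GPU runner but skipped because too few GPUs are available, which is why the 2-GPU lane is needed. A minimal sketch of that decision logic follows, for illustration only. `GpuMark`, `required_gpus`, and `skip_reason` are hypothetical names, and the repository's actual conftest may implement this differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GpuMark:
    """Hypothetical stand-in for a ``@pytest.mark.gpu(n)`` marker."""
    count: int


def required_gpus(marks: Iterable[GpuMark], default: int = 1) -> int:
    """Return the largest GPU count requested by any marker, or the default."""
    counts = [m.count for m in marks]
    return max(counts) if counts else default


def skip_reason(required: int, available: int) -> Optional[str]:
    """Return a skip message when the runner has too few GPUs, else None."""
    if required > available:
        return f"needs {required} GPU(s), runner has {available}"
    return None


# A gpu(2) test on a 1-GPU runner is skipped; on a 2-GPU runner it runs.
on_one_gpu = skip_reason(required_gpus([GpuMark(2)]), available=1)
on_two_gpus = skip_reason(required_gpus([GpuMark(2)]), available=2)
```

In a real `conftest.py`, a `pytest_collection_modifyitems` hook could read the marker with `item.get_closest_marker("gpu")` and, when a reason is returned, call `item.add_marker(pytest.mark.skip(reason=...))`. Whether this project does so is an assumption.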