test(qwen3): cover TP=2 in the hf golden gate by FeathBow · Pull Request #322 · openinfer-project/openinfer

FeathBow · 2026-06-10T01:07:18Z

Description

Adds TP=2 coverage to the Qwen3-4B HF golden gate, which previously ran only with device_ordinals: vec![0]. The eager replay suite (bs=1 sequential with the determinism rerun, batched isolation, and both prefix-cached replays) is extracted into run_eager_suite(golden, model_path, devices, label) and rerun under TP=2 against the same golden data and tolerances, so no new golden files are needed. The TP=2 pass lives in the existing #[test], so it runs in the normal local test flow on any machine with >= 2 GPUs and skips automatically on machines with fewer. TP=8 is the follow-on already named in the issue.
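For readers who have not opened the diff, here is a minimal sketch of the skip-or-run decision described above. It is not the code from this PR: plan_tp_pass, run_gate, and TpPass are hypothetical names, the suite is passed in as a closure standing in for run_eager_suite, and the "tp2" label is only inferred from the log prefixes in the gate output. The skip message format is copied from the single-GPU run reported under Test Env.

```rust
/// Outcome of planning a tensor-parallel pass (hypothetical type, not from the PR).
#[derive(Debug, PartialEq)]
pub enum TpPass {
    /// Enough devices are visible: run the suite on these ordinals under this label.
    Run { devices: Vec<usize>, label: String },
    /// Too few devices: skip and report why.
    Skip { reason: String },
}

/// Decide whether a TP pass of width `tp` can run with `visible_devices` GPUs.
pub fn plan_tp_pass(visible_devices: usize, tp: usize) -> TpPass {
    if visible_devices < tp {
        TpPass::Skip {
            reason: format!(
                "skipping hf_golden_gate TP={tp} pass: {visible_devices} CUDA device(s) visible, need {tp}"
            ),
        }
    } else {
        TpPass::Run {
            // Assumption: the pass uses the first `tp` device ordinals.
            devices: (0..tp).collect(),
            label: format!("tp{tp}"),
        }
    }
}

/// Run the single-GPU pass, then the TP=2 pass if the machine allows it.
/// `suite` stands in for run_eager_suite with golden and model_path already bound.
pub fn run_gate<F: FnMut(&[usize], &str)>(visible_devices: usize, mut suite: F) {
    // The single-GPU pass always runs on ordinal 0, as the gate did before.
    suite(&[0], "");
    match plan_tp_pass(visible_devices, 2) {
        TpPass::Run { devices, label } => suite(&devices, &label),
        TpPass::Skip { reason } => eprintln!("{reason}"),
    }
}
```

Keeping the decision in a pure function makes the skip path testable on a machine without GPUs; how the real test counts visible CUDA devices is not shown here.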

Test Env

Dual-GH200 (aarch64, sm_90) node: full gate green, with 7 single-GPU and 4 TP=2 passes in one process. TP=2 sits at the single-GPU noise floor (mean ~0.031, p99 ~0.12-0.13).
Single GPU (x86_64, sm_89): existing passes green and unchanged; the TP=2 pass skips with the message "skipping hf_golden_gate TP=2 pass: 1 CUDA device(s) visible, need 2".

Full gate output on the dual-GH200 (single-GPU + TP=2, one process)

hf_golden_gate [sequential bs=1 eager]: 816 positions, 6528 head deltas — mean 0.0318 p50 0.0242 p99 0.1173 max 0.3124
hf_golden_gate [sequential bs=1 eager]: worst head delta 0.3124 @ seq 7 pos 5 token 68172 (pega -9.9384, HF -10.2508)
hf_golden_gate [batched eager (9, no pad)]: 153 positions, 1224 head deltas — mean 0.0325 p50 0.0240 p99 0.1558 max 0.3124
hf_golden_gate [batched eager (9, no pad)]: worst head delta 0.3124 @ seq 7 pos 5 token 239 (pega -9.0635, HF -9.3758)
hf_golden_gate [sequential bs=1 eager cached replay]: 816 positions, 6528 head deltas — mean 0.0313 p50 0.0253 p99 0.1168 max 0.3124
hf_golden_gate [sequential bs=1 eager cached replay]: worst head delta 0.3124 @ seq 7 pos 5 token 68172 (pega -9.9384, HF -10.2508)
hf_golden_gate [batched eager cached replay (9)]: 153 positions, 1224 head deltas — mean 0.0313 p50 0.0248 p99 0.1204 max 0.3124
hf_golden_gate [batched eager cached replay (9)]: worst head delta 0.3124 @ seq 7 pos 5 token 68172 (pega -9.9384, HF -10.2508)
hf_golden_gate [batched cuda-graph (9 padded)]: 153 positions, 1224 head deltas — mean 0.0325 p50 0.0240 p99 0.1558 max 0.3124
hf_golden_gate [batched cuda-graph (9 padded)]: worst head delta 0.3124 @ seq 7 pos 5 token 239 (pega -9.0635, HF -9.3758)
hf_golden_gate [batched cuda-graph (5 padded)]: 85 positions, 680 head deltas — mean 0.0311 p50 0.0248 p99 0.1271 max 0.1928
hf_golden_gate [batched cuda-graph (5 padded)]: worst head delta 0.1928 @ seq 3 pos 9 token 398 (pega -5.5985, HF -5.4057)
hf_golden_gate [batched cuda-graph cached replay (5)]: 85 positions, 680 head deltas — mean 0.0309 p50 0.0262 p99 0.1260 max 0.1816
hf_golden_gate [batched cuda-graph cached replay (5)]: worst head delta 0.1816 @ seq 4 pos 9 token 47534 (pega -5.2187, HF -5.4003)
hf_golden_gate [tp2 sequential bs=1 eager]: 816 positions, 6528 head deltas — mean 0.0311 p50 0.0245 p99 0.1189 max 0.2741
hf_golden_gate [tp2 sequential bs=1 eager]: worst head delta 0.2741 @ seq 14 pos 7 token 291 (pega -4.8874, HF -4.6133)
hf_golden_gate [tp2 batched eager (9, no pad)]: 153 positions, 1224 head deltas — mean 0.0302 p50 0.0221 p99 0.1280 max 0.3124
hf_golden_gate [tp2 batched eager (9, no pad)]: worst head delta 0.3124 @ seq 7 pos 5 token 68172 (pega -9.9384, HF -10.2508)
hf_golden_gate [tp2 sequential bs=1 eager cached replay]: 816 positions, 6528 head deltas — mean 0.0313 p50 0.0246 p99 0.1175 max 0.2741
hf_golden_gate [tp2 sequential bs=1 eager cached replay]: worst head delta 0.2741 @ seq 14 pos 7 token 291 (pega -4.8874, HF -4.6133)
hf_golden_gate [tp2 batched eager cached replay (9)]: 153 positions, 1224 head deltas — mean 0.0309 p50 0.0250 p99 0.1234 max 0.2034
hf_golden_gate [tp2 batched eager cached replay (9)]: worst head delta 0.2034 @ seq 4 pos 1 token 59941 (pega -2.3543, HF -2.5578)
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 33.68s
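To help read the summary lines above, here is a rough sketch of how such statistics could be computed. In every "worst head delta" line the reported delta equals the absolute difference between the pega and HF values (for example |-9.9384 - (-10.2508)| = 0.3124), and each pass reports exactly 8 head deltas per position (6528 / 816 = 1224 / 153 = 680 / 85 = 8). Everything else here is an assumption: DeltaSummary, summarize, and nearest_rank are hypothetical names, and the nearest-rank percentile rule may differ from what the gate actually uses.

```rust
/// Summary of absolute deltas for one pass (hypothetical type, not from the PR).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DeltaSummary {
    pub count: usize,
    pub mean: f64,
    pub p50: f64,
    pub p99: f64,
    pub max: f64,
}

/// Nearest-rank percentile over an ascending, non-empty slice.
/// Assumption: the real gate may use a different percentile definition.
fn nearest_rank(sorted: &[f64], q: f64) -> f64 {
    let n = sorted.len();
    // rank = ceil(q * n), clamped into 1..=n so q = 0.0 and q = 1.0 are safe.
    let rank = ((q * n as f64).ceil() as usize).clamp(1, n);
    sorted[rank - 1]
}

/// Summarize |ours - hf| over (ours, hf) pairs; returns None for empty input.
pub fn summarize(pairs: &[(f64, f64)]) -> Option<DeltaSummary> {
    if pairs.is_empty() {
        return None;
    }
    let mut deltas: Vec<f64> = pairs.iter().map(|(ours, hf)| (ours - hf).abs()).collect();
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    deltas.sort_by(|a, b| a.total_cmp(b));
    let mean = deltas.iter().sum::<f64>() / deltas.len() as f64;
    Some(DeltaSummary {
        count: deltas.len(),
        mean,
        p50: nearest_rank(&deltas, 0.50),
        p99: nearest_rank(&deltas, 0.99),
        max: deltas[deltas.len() - 1],
    })
}
```

Feeding it the three worst-case pairs quoted in the log, (-9.9384, -10.2508), (-5.5985, -5.4057), and (-5.2187, -5.4003), gives a max of about 0.3124, matching the first of those lines.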

Type of Change

New feature (non-breaking change which adds functionality)

Checklist

My code follows the style guidelines of this project (see docs/conventions/coding-style.md).
I have performed a self-review of my own code.
I have formatted my commits according to Commitizen conventions.
I have run the local test suite and all tests pass (see CLAUDE.md).

xiaguan

LGTM

test(qwen3): cover TP=2 in the hf golden gate (openinfer-project#245)

eb425d1

xiaguan approved these changes Jun 10, 2026

xiaguan merged commit 7c68f77 into openinfer-project:main Jun 10, 2026
1 check passed

xiaguan merged 1 commit into openinfer-project:main from FeathBow:test/245-qwen3-tp2-golden-gate
