Sample the DSpark draft proposal from q, not argmax, for B2 by audreyt · Pull Request #1 · machiabeli/ds4

audreyt · 2026-07-01T00:21:28Z

B2 rejection sampling (Chen et al. 2023 / Leviathan et al. 2023) accepts a draft token x with probability min(1, p(x)/q(x)) and, on rejection, resamples from the normalized residual max(0, p - q). This is only a valid lossless sampler when x is actually drawn from q, the drafter's temperature-scaled distribution. metal_graph_eval_dspark_draft_block always chose the draft token via sample_argmax regardless of DS4_SPEC_TEMP, so the accept/reject math computed p(x)/q(x) for a token that was never sampled from q. This is the issue that both @lobanov's report and the codex review flagged (§7.2 in his write-up on antirez#482).
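For readers less familiar with the accept/reject rule, here is a minimal C sketch of one speculative-sampling step in its textbook form. The function name spec_accept_or_resample and its signature are hypothetical and are not ds4's b2_rejection_sample; the caller is assumed to supply two uniform draws in [0, 1). The acceptance ratio p(x)/q(x) only yields the target distribution if x was drawn from q, which is exactly the precondition this PR restores.

```c
/* Hypothetical sketch of one speculative-sampling step (not ds4's code).
 *   p, q  target and drafter distributions over n tokens (each sums to 1)
 *   x     draft token, assumed to have been drawn from q
 *   u1    uniform in [0, 1) for the accept test
 *   u2    uniform in [0, 1) for resampling after a rejection
 * Returns the emitted token; *accepted is 1 if x was kept, 0 otherwise. */
int spec_accept_or_resample(const float *p, const float *q, int n, int x,
                            float u1, float u2, int *accepted) {
    /* Accept with probability min(1, p(x)/q(x)); a ratio >= 1 always
     * accepts because u1 < 1. q[x] > 0 since x was sampled from q. */
    if (u1 < p[x] / q[x]) {
        *accepted = 1;
        return x;
    }
    *accepted = 0;

    /* Rejected: draw from the residual max(0, p - q), renormalized. */
    float total = 0.0f;
    for (int i = 0; i < n; i++)
        if (p[i] > q[i]) total += p[i] - q[i];

    float target = u2 * total, acc = 0.0f;
    int last = x;
    for (int i = 0; i < n; i++) {
        if (p[i] <= q[i]) continue;  /* zero residual mass */
        acc += p[i] - q[i];
        last = i;
        if (target < acc) return i;
    }
    return last;  /* guards against round-off at the top of the CDF */
}
```

For example, with p = {0.5, 0.3, 0.2}, q = {0.2, 0.3, 0.5} and x = 2, the ratio is 0.4: u1 = 0.3 keeps token 2, while u1 = 0.9 rejects it and the only token with positive residual (token 0) is emitted.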

What's in this PR

- dspark_sample_draft_token: draws a genuine temperature-scaled categorical sample over the full vocab, reusing the existing b2_log_softmax / b2_sample_from_log_probs helpers and sample_rng_f32. No new RNG, no new translation unit, and a minimal diff against your branch.
- Thread draft_temperature and a uint64_t *draft_rng into metal_graph_eval_dspark_draft_block, and use the new sampler instead of sample_argmax when B2 is active. The greedy path (temperature <= 0) is byte-for-byte unchanged: it falls back to sample_argmax exactly as before.
- Resolve DS4_SPEC_TEMP exactly once, at session creation, into a new s->dspark_b2_temp field, which is used at both draft time and accept/reject time instead of separate getenv calls at each site. RNG seeding moves into the same session-creation block, since drafting now happens before the old lazy-seed point would ever run.
- dspark_sample_draft_token and b2_rejection_sample drop `static` so a white-box statistical test can call them directly. They are not part of the public ds4.h surface; tests/ds4_test.c forward-declares the minimal shape it needs.
- New CPU-only test test_dspark_b2_rejection_sampling_unbiased (--dspark-b2-unbiased, no model or GPU needed): it draws a real proposal sample from a synthetic drafter distribution, feeds it through b2_rejection_sample, and checks that the resulting single-token marginal matches the target distribution within tolerance. This is the actual losslessness check, not just a check that "sampling changed."

Test results (run this yourself: `make ds4_test && ./ds4_test --dspark-b2-unbiased`)

ds4-test: dspark-b2-unbiased max_dev_correct=0.0018 max_dev_biased=0.1426 (N=50000)

- max_dev_correct=0.0018: when the proposal is genuinely sampled from q (via dspark_sample_draft_token), the sampler's output distribution matches the target distribution well within statistical noise (the binomial standard error at N=50000 is ~0.002–0.003).
- max_dev_biased=0.1426: the same accept/reject code, fed an argmax'd proposal instead (the old bug), diverges by ~70x that noise floor. This shows the test discriminates the exact bug this PR fixes rather than asserting something arbitrary.
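A self-contained version of the same idea, using made-up distributions rather than the ones in tests/ds4_test.c, shows why the check discriminates. All names here (EXAMPLE_P, EXAMPLE_Q, max_marginal_dev and the helpers) are hypothetical, and splitmix64 stands in for ds4's RNG. For reference, a single bin's binomial standard error is sqrt(p(1 - p)/N), which at N = 50000 peaks at about 0.0022 (p = 0.5).

```c
#include <stdint.h>

#define VOCAB 4

/* Made-up target (p) and drafter (q) distributions for illustration only. */
const double EXAMPLE_P[VOCAB] = {0.1, 0.2, 0.3, 0.4};
const double EXAMPLE_Q[VOCAB] = {0.4, 0.3, 0.2, 0.1};

/* splitmix64-based uniform in [0, 1); a stand-in, not ds4's RNG. */
static double rng_u01(uint64_t *s) {
    uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return (double)(z >> 11) / 9007199254740992.0;  /* top 53 bits */
}

/* Inverse-CDF draw from unnormalized weights; returns -1 if all are <= 0. */
static int sample_weights(const double *w, int n, double u) {
    double total = 0.0, acc = 0.0;
    int last = -1;
    for (int i = 0; i < n; i++) if (w[i] > 0.0) total += w[i];
    for (int i = 0; i < n; i++) {
        if (w[i] <= 0.0) continue;
        acc += w[i];
        last = i;
        if (u * total < acc) return i;
    }
    return last;
}

/* Accept x with prob min(1, p[x]/q[x]); otherwise resample from max(0, p - q). */
static int accept_or_resample(const double *p, const double *q, int x, uint64_t *rng) {
    if (rng_u01(rng) < p[x] / q[x]) return x;
    double resid[VOCAB];
    for (int i = 0; i < VOCAB; i++) resid[i] = p[i] > q[i] ? p[i] - q[i] : 0.0;
    int r = sample_weights(resid, VOCAB, rng_u01(rng));
    return r >= 0 ? r : x;  /* fallback only if round-off empties the residual */
}

/* Max |empirical marginal - p| over n_draws. sample_proposal = 1 draws x ~ q
 * (the fix); 0 always proposes argmax q (the old bug). */
double max_marginal_dev(const double *p, const double *q, int n_draws,
                        int sample_proposal, uint64_t seed) {
    uint64_t rng = seed;
    long counts[VOCAB] = {0};
    int amax = 0;
    for (int i = 1; i < VOCAB; i++) if (q[i] > q[amax]) amax = i;

    for (int t = 0; t < n_draws; t++) {
        int x = sample_proposal ? sample_weights(q, VOCAB, rng_u01(&rng)) : amax;
        counts[accept_or_resample(p, q, x, &rng)]++;
    }

    double worst = 0.0;
    for (int i = 0; i < VOCAB; i++) {
        double d = (double)counts[i] / n_draws - p[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

With these particular p and q, the argmax proposal is always token 0 and the residual max(0, p - q) is zero at token 1, so the biased run can never emit token 1 and its deviation is at least 0.2 for any N. The figures will not match the 0.1426 reported above because the distributions differ.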

Further end-to-end verification

- make / make ds4_test: zero warnings.
- Live --dspark-speculative-block (real model, greedy path): unaffected, worst_argmax_gap=0.000.
- A/B run with two DS4_SPEC_RNG_SEED values at DS4_SPEC_TEMP=0.8, same prompt: the generated text and accept/correction patterns diverge, starting in cycles that were full-accept-no-correction under both seeds. A full accept with no correction commits the drafted tokens verbatim with zero residual randomness, so divergence there can only happen if the draft proposal itself is seed-dependent. Before this fix it was not (pure deterministic argmax).

Scope note

This fixes the B2 proposal-sampling precondition specifically (§7.2). It does not address the separate batch-verify vs. single-decode floating-point divergence (metal_graph_verify_suffix_tops vs. sequential decode) that keeps the greedy DSpark path from being byte-identical to non-speculative decode. That is a different, pre-existing issue (§7.1 in lobanov's report), independent of B2, and still open.
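As a hedged illustration of why a batched and a sequential path can disagree at all: one common cause (assumed here; the report does not pin down the exact mechanism in metal_graph_verify_suffix_tops) is that the two paths accumulate the same terms in a different order, and floating-point addition is not associative. The functions below are hypothetical and assume IEEE-754 binary32 arithmetic without excess precision.

```c
/* Same three addends, two association orders (hypothetical helpers).
 * Assigning each partial sum to a float forces rounding to binary32. */
float sum_left_first(float a, float b, float c) {
    float ab = a + b;  /* (a + b) rounded first */
    return ab + c;
}

float sum_right_first(float a, float b, float c) {
    float bc = b + c;  /* (b + c) rounded first */
    return a + bc;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left_first returns 1.0f while sum_right_first returns 0.0f, because float spacing near 1e8 is 8 and -1e8f + 1.0f rounds back to -1e8f. When two top logits are nearly tied, a last-bit difference of this kind is enough to flip a greedy argmax.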

Speed: this fix is about correctness, not throughput. On my M5 Max 128GB with the Q4K Markov drafter, B2 is still slower than baseline on most workloads after this fix; partial/correction-cycle replay remains the dominant cost, as your PR's own "Key findings" section notes. Repetitive structured output remains the one clear net win.

B2 rejection sampling (Chen et al. 2023 / Leviathan et al. 2023) accepts the draft token with probability min(1, p(x)/q(x)), which is only a valid lossless sampler when x is actually drawn from q, the drafter's temperature-scaled distribution. metal_graph_eval_dspark_draft_block always chose the draft token via sample_argmax regardless of DS4_SPEC_TEMP, so the accept/reject math computed p(x)/q(x) against a distribution x was never sampled from.

- Add dspark_sample_draft_token: genuine temperature-scaled categorical sample over the full vocab (reuses the existing b2_log_softmax / b2_sample_from_log_probs helpers and sample_rng_f32, so no new RNG or translation unit).
- Thread draft_temperature/uint64_t *draft_rng into metal_graph_eval_dspark_draft_block; use the new sampler instead of sample_argmax when B2 is active. Greedy path (temperature <= 0) is byte-for-byte unchanged (falls back to sample_argmax exactly as before).
- Resolve DS4_SPEC_TEMP exactly once, at session creation, into a new s->dspark_b2_temp field, and use that cached value at both draft time and accept/reject time instead of querying getenv separately at each site. RNG seeding moves into the same session-creation block, since drafting now happens before the old lazy-seed point ever ran.
- Drop `static` from dspark_sample_draft_token and b2_rejection_sample so a white-box statistical test can call them directly. Not part of the public ds4.h surface -- tests/ds4_test.c forward-declares the minimal shape it needs, same pattern as other internal-helper tests in this file.
- Add a CPU-only synthetic test (test_dspark_b2_rejection_sampling_unbiased, --dspark-b2-unbiased) proving the sampler's single-token marginal output distribution matches a known target distribution within tolerance when the proposal is genuinely sampled (max_dev_correct=0.0018, N=50000), and diverges sharply when the proposal is naively argmax'd instead (max_dev_biased=0.1426) -- proof the test discriminates the fix from the bug it replaces, not just a passing assertion.

Verified: make/make ds4_test build with zero warnings; make test group (dspark-b2-unbiased) passes with the numbers above; live --dspark-speculative-block (real model, greedy path) is unaffected, worst_argmax_gap=0.000; an A/B run with two DS4_SPEC_RNG_SEED values at DS4_SPEC_TEMP=0.8 shows different generated text and different accept/correction patterns even in cycles that were full-accept-no-correction in both seeds, proving the draft proposal itself is now seed-dependent (it was not, before this fix).

Scope note: this fixes the B2 proposal-sampling precondition specifically. It does not address the separate batch-verify-vs-single-decode floating point divergence (metal_graph_verify_suffix_tops vs sequential decode) that makes the *greedy* DSpark path not byte-identical to non-speculative decode -- that is a different, pre-existing issue, independent of B2.

audreyt mentioned this pull request Jul 1, 2026

DSpark B2 rejection sampling + adaptive block sizing antirez/ds4#482 (Open)

audreyt force-pushed the b2-fix-for-482 branch from 9facd6c to 6a9de88 on July 1, 2026 00:24

audreyt wants to merge 1 commit into machiabeli:work-dspark from audreyt:b2-fix-for-482
