
Perf[Qwen3.5]: eliminate Mamba intermediate state memcpy in MTP target-verify#159

Merged
lightseek-bot merged 1 commit into main from zt/dev/mamba_update_opt on May 19, 2026

tuanzhangCS commented May 15, 2026

Summary

  • Replace the intermediate state buffer + post-verify scatter approach with inline output-state-indices: during MTP target-verify, conv1d and SSM kernels now write states directly to per-draft-token slots, eliminating the large intermediate buffers (intermediate_ssm_state_cache, intermediate_conv_window_cache) and the fused_mamba_state_scatter_with_mask memcpy step after verification.
  • Introduce current_input_indices bookkeeping in SimpleMambaPool to track which cache slot each request should read from at the start of target-verify, with proper reset on extend/retract to stay in sync with the C++ scheduler.
  • Reduce GPU memory footprint by removing per-layer (pool_size, draft_tokens, state_shape) intermediate buffers, replaced by a flat draft-slot region appended to the existing conv/SSM state tensors.
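The slot layout and index bookkeeping described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the PR's implementation: `ToyMambaPool`, `draft_slot`, `reset`, `target_verify`, and `accept` are hypothetical names, each state is a single float rather than a tensor, and the recurrence is a placeholder for the real conv1d/SSM update. Only `current_input_indices` and the idea of a flat draft-slot region appended after the working slots come from the description.

```python
class ToyMambaPool:
    """Toy stand-in for a state pool; NOT the real SimpleMambaPool."""

    def __init__(self, pool_size, draft_tokens):
        self.pool_size = pool_size
        self.draft_tokens = draft_tokens
        # Working slots come first; a flat draft-slot region is appended after them
        # (assumed layout: draft_tokens consecutive slots per request).
        self.states = [0.0] * (pool_size + pool_size * draft_tokens)
        # Slot each request reads from at the start of its next target-verify.
        self.current_input_indices = list(range(pool_size))

    def draft_slot(self, req, token):
        """Index of the draft slot for (request, draft token) in the flat region."""
        return self.pool_size + req * self.draft_tokens + token

    def reset(self, req):
        """On extend/retract, point the request back at its working slot."""
        self.current_input_indices[req] = req

    def target_verify(self, req, draft_inputs):
        """Write one state per draft token directly into its draft slot."""
        # The input state is read into a local before any draft slot is overwritten.
        state = self.states[self.current_input_indices[req]]
        output_indices = []
        for token, x in enumerate(draft_inputs):
            state = 0.5 * state + x  # placeholder recurrence, not the real SSM update
            idx = self.draft_slot(req, token)
            self.states[idx] = state
            output_indices.append(idx)
        return output_indices

    def accept(self, req, output_indices, num_accepted):
        """Accepting a prefix only repoints the input index; no state is copied."""
        if num_accepted > 0:
            self.current_input_indices[req] = output_indices[num_accepted - 1]


pool = ToyMambaPool(pool_size=2, draft_tokens=3)
outs = pool.target_verify(req=1, draft_inputs=[1.0, 2.0, 3.0])
pool.accept(req=1, output_indices=outs, num_accepted=2)
```

In this toy, accepting two of three draft tokens changes only an integer index, which is the step the intermediate-buffer approach would have handled with a post-verify scatter copy.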

Test Plan

perf:
input=1024, output=4096, concurrency=64, num-requests=200

| | before | optimize |
|---|---|---|
| output tokens/s | 4143.97 | 4301.36 |
| nsys | [image] | [image] |

accuracy:
gsm8k: [image]

tuanzhangCS changed the title from "Perf: eliminate Mamba intermediate state memcpy in MTP target-verify" to "Perf[Qwen3.5]: eliminate Mamba intermediate state memcpy in MTP target-verify" on May 15, 2026
fix: fix a bug.

opt mtp

write back working slot for retract requests.

opt: remove memcpy.

opt: mamba snapshot kernel.
tuanzhangCS force-pushed the zt/dev/mamba_update_opt branch from 21f88a0 to 6fceb82 on May 18, 2026 08:27
tuanzhangCS marked this pull request as ready for review on May 18, 2026 09:37
tuanzhangCS requested a review from a team as a code owner on May 18, 2026 09:37

zhyncs commented May 18, 2026

@codex review


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fceb82a9c


Comment on lines +643 to +651
    def _flush_mamba_retract_states(self, forward_op) -> None:
        """Copy draft->working mamba states when retract occurred (no forward scheduled)."""
        if forward_op is not None:
            return
        if self.model_executor.drafter is None:
            return
        if self.model_executor.runtime_states.mamba_pool is None:
            return
        self.model_executor.flush_mamba_draft_to_working_on_retract()

[P2] Gate retract-state flush on actual retract ops

_flush_mamba_retract_states runs whenever forward_op is None, but it never verifies that the current execution plan actually contains a retraction. In drafter+Mamba mode this means idle/no-forward iterations can repeatedly call flush_mamba_draft_to_working_on_retract() using stale previous-batch buffers, performing unintended state copies unrelated to any retract and potentially racing with cache maintenance on the same slots. Please gate this path on a real retract signal (for example, retraction-specific cache ops or a scheduler flag) instead of forward_op is None alone.
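One way to express the suggested gating is sketched below. It is only an illustration of the idea: `ExecutionPlan`, `retracted_request_ids`, and `should_flush_on_retract` are hypothetical names invented for this example, and the repository's scheduler may expose a retract signal in a different form.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionPlan:
    """Hypothetical plan object; field names are illustrative, not the project's API."""
    forward_op: Optional[object] = None
    # Assumed retract signal: ids of requests retracted in this iteration.
    retracted_request_ids: list = field(default_factory=list)


def should_flush_on_retract(plan, has_drafter, has_mamba_pool):
    """Return True only when a retract actually occurred and no forward is scheduled."""
    if plan.forward_op is not None:
        return False  # a forward pass is scheduled, so nothing to flush here
    if not (has_drafter and has_mamba_pool):
        return False  # no drafter or no Mamba pool, so there are no draft states
    # Idle iterations have an empty list and therefore skip the flush.
    return bool(plan.retracted_request_ids)


idle = should_flush_on_retract(ExecutionPlan(), True, True)
retract = should_flush_on_retract(ExecutionPlan(retracted_request_ids=[7]), True, True)
busy = should_flush_on_retract(
    ExecutionPlan(forward_op=object(), retracted_request_ids=[7]), True, True
)
```

With a check of this shape, an idle iteration (`forward_op is None` and nothing retracted) no longer triggers a state copy from stale previous-batch buffers.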


lightseek-bot merged commit 7713bb9 into main on May 19, 2026
81 of 87 checks passed
lightseek-bot deleted the zt/dev/mamba_update_opt branch on May 19, 2026 06:54