Perf[Qwen3.5]: eliminate Mamba intermediate state memcpy in MTP target-verify #159
fix: fix a bug. opt: MTP write-back of the working slot for retract requests. opt: remove memcpy. opt: Mamba snapshot kernel.
21f88a0 to 6fceb82
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6fceb82a9c
def _flush_mamba_retract_states(self, forward_op) -> None:
    """Copy draft->working mamba states when retract occurred (no forward scheduled)."""
    if forward_op is not None:
        return
    if self.model_executor.drafter is None:
        return
    if self.model_executor.runtime_states.mamba_pool is None:
        return
    self.model_executor.flush_mamba_draft_to_working_on_retract()
Gate retract-state flush on actual retract ops
_flush_mamba_retract_states runs whenever forward_op is None, but it never verifies that the current execution plan actually contains a retraction. In drafter+Mamba mode this means idle/no-forward iterations can repeatedly call flush_mamba_draft_to_working_on_retract() using stale previous-batch buffers, performing unintended state copies unrelated to any retract and potentially racing with cache maintenance on the same slots. Please gate this path on a real retract signal (for example, retraction-specific cache ops or a scheduler flag) instead of forward_op is None alone.
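One possible shape for that gate, as a minimal sketch rather than the PR's actual code: `has_retract` is a hypothetical flag the scheduler would have to set when the current plan contains a retraction, and `FakeExecutor` is a stand-in that only counts flush calls so the behavior can be checked in isolation.

```python
from dataclasses import dataclass


@dataclass
class FakeExecutor:
    """Hypothetical stand-in for the model executor; it only counts flushes."""
    has_drafter: bool = True
    has_mamba_pool: bool = True
    flush_calls: int = 0

    def flush_mamba_draft_to_working_on_retract(self) -> None:
        # The real method would copy draft states into working slots.
        self.flush_calls += 1


def flush_mamba_retract_states(executor, forward_op, has_retract: bool) -> bool:
    """Flush only when a retract occurred and no forward is scheduled.

    `has_retract` is an assumed scheduler signal, not an existing field.
    Returns True if the flush ran.
    """
    if forward_op is not None:
        return False
    if not has_retract:
        # Idle iteration with no retraction: leave stale buffers untouched.
        return False
    if not executor.has_drafter or not executor.has_mamba_pool:
        return False
    executor.flush_mamba_draft_to_working_on_retract()
    return True


executor = FakeExecutor()
idle_result = flush_mamba_retract_states(executor, None, has_retract=False)
retract_result = flush_mamba_retract_states(executor, None, has_retract=True)
```

With this gate, an idle iteration (`forward_op is None` and no retraction) returns without touching any state, and only the iteration that actually carries a retraction triggers the copy. How the flag is derived (retraction-specific cache ops, a scheduler field, or something else) is left open here.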
Summary
Test Plan
perf:
input=1024, output=4096, concurrency=64, num-requests=200
accuracy:

gsm8k