[doc] feat: Add Qwen3vl-8B NPU Optimization Practice #5873
Rhetee wants to merge 3 commits into verl-project:main
Conversation
Code Review
This pull request introduces a comprehensive tutorial for optimizing Qwen3vl-8B GRPO training and inference on Ascend NPU platforms, covering performance profiling, operator fusion, and scheduling optimizations. The review feedback identifies several technical inaccuracies in the documentation's code and configuration snippets, including mismatched function names that would cause runtime errors, invalid YAML syntax for dynamic batch size calculations, and incomplete Python function examples lacking necessary variable definitions.
```python
modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_qwen3_npu
modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward
```
The function names referenced in the documentation's code snippet do not match the actual definitions in verl/models/transformers/npu_patch.py. For example, the doc uses rms_norm_forward and apply_rotary_pos_emb_qwen3_npu, while the actual code defines rms_norm_forward_npu and apply_rotary_pos_emb_npu. This will cause a NameError for users who manually reference or inject this logic.
Suggested change:
```diff
-modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward
-modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_qwen3_npu
-modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward
-modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward
+modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward_npu
+modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_npu
+modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward_npu
+modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward_npu
```
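To see why the mismatched name matters: the patch works by rebinding a method on the transformers class at import time, and the right-hand-side name must exist when the assignment runs. The sketch below illustrates the pattern with stand-in classes and function names (`RMSNormStub`, `modeling_stub`, and the stub bodies are hypothetical, not verl's real code):

```python
import types

# stand-in for the transformers module being patched
modeling_stub = types.SimpleNamespace()

class RMSNormStub:
    def forward(self, x):
        return "original"

modeling_stub.Qwen3VLTextRMSNorm = RMSNormStub

def rms_norm_forward_npu(self, x):
    # stand-in body for the NPU-fused replacement
    return "patched"

# the patch: rebind the method on the class; the name on the right-hand
# side must be defined at patch time, otherwise Python raises NameError
modeling_stub.Qwen3VLTextRMSNorm.forward = rms_norm_forward_npu

result = RMSNormStub().forward(None)
```

After the assignment, every instance of the class dispatches to the replacement, which is why a typo in the replacement's name fails immediately with NameError rather than silently running the original.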
```yaml
use_dynamic_bsz: true
ppo_max_token_len_per_gpu: 2 * (max_prompt_len + max_response_len)
log_prob_max_token_len_per_gpu: 4 * (max_prompt_len + max_response_len)
```
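The snippet above writes arithmetic expressions into YAML, but YAML loads `2 * (max_prompt_len + max_response_len)` as a plain string and never evaluates it. One way around this is to compute the values in the launch script and pass the results as overrides. This is a minimal sketch; the override key paths and the example lengths are illustrative assumptions, not verl's exact config schema:

```python
# Example lengths; substitute your actual training settings.
max_prompt_len = 2048
max_response_len = 2048

# Compute the products in Python, since YAML will not do arithmetic.
ppo_max_token_len_per_gpu = 2 * (max_prompt_len + max_response_len)
log_prob_max_token_len_per_gpu = 4 * (max_prompt_len + max_response_len)

# Pass the concrete integers as command-line overrides (key paths assumed).
overrides = [
    "actor_rollout_ref.actor.use_dynamic_bsz=true",
    f"actor_rollout_ref.actor.ppo_max_token_len_per_gpu={ppo_max_token_len_per_gpu}",
    f"actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu={log_prob_max_token_len_per_gpu}",
]
```

The same computation can of course be done with shell arithmetic in the launch script; the point is that the config file itself should receive a concrete integer.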
```python
def forward_native(self, positions, query):
    # cos/sin are computed once on the first layer and cached for reuse
    if forward_context.is_first_layer:
        cos_sin = self.cos_sin_cache[positions]
        cos, sin = cos_sin.chunk(2, dim=-1)
        if self.mrope_interleaved:
            cos = apply_interleaved_rope(cos, self.mrope_section)
            sin = apply_interleaved_rope(sin, self.mrope_section)
        cos = cos.repeat(1, 2)
        sin = sin.repeat(1, 2)
        self.cos = cos.unsqueeze(0).unsqueeze(-2).contiguous()
        self.sin = sin.unsqueeze(0).unsqueeze(-2).contiguous()
        forward_context.is_first_layer = False

    num_tokens = query.shape[0]
    query_shape = query.shape
    query = query.view(num_tokens, -1, self.head_size)
    query_rot = query[..., :self.rotary_dim]
    query_pass = query[..., self.rotary_dim:]
    query_rot = query_rot.unsqueeze(0)
    query_rot = torch_npu.npu_rotary_mul(query_rot, self.cos, self.sin, "half").squeeze(0)
    query = torch.cat((query_rot, query_pass), dim=-1).reshape(query_shape)
```
What does this PR do?
This PR updates the Qwen3vl-8B NPU Optimization Practice; developers can refer to this doc for guidance.
Checklist Before Starting
- Title format: `[{modules}] {type}: {description}` (this will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If the PR is breaking, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`

Test
API and Usage Example
# Add code snippet or script demonstrating how to use this

Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- Request CI via the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group.)
- If the PR touches the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.