[doc] feat: Add Qwen3vl-8B NPU Optimization Practice #5873

Draft
Rhetee wants to merge 3 commits into verl-project:main from Rhetee:main

Conversation

@Rhetee (Contributor) commented Apr 3, 2026

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

This PR updates the Qwen3vl-8B NPU Optimization Practice documentation; developers can refer to this doc for guidance.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive tutorial for optimizing Qwen3vl-8B GRPO training and inference on Ascend NPU platforms, covering performance profiling, operator fusion, and scheduling optimizations. The review feedback identifies several technical inaccuracies in the documentation's code and configuration snippets, including mismatched function names that would cause runtime errors, invalid YAML syntax for dynamic batch size calculations, and incomplete Python function examples lacking necessary variable definitions.

Comment on lines +117 to +120
modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_qwen3_npu
modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward
Severity: high

The function names referenced in the documentation's code snippet are inconsistent with the actual definitions in verl/models/transformers/npu_patch.py. For example, the documentation uses rms_norm_forward and apply_rotary_pos_emb_qwen3_npu, while the actual code defines rms_norm_forward_npu and apply_rotary_pos_emb_npu. This will cause a NameError when users manually reference or inject this logic.

Suggested change
modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_qwen3_npu
modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward
modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward
modeling_qwen3_vl_moe.Qwen3VLMoeTextRMSNorm.forward = rms_norm_forward_npu
modeling_qwen3_vl_moe.apply_rotary_pos_emb = apply_rotary_pos_emb_npu
modeling_qwen3_vl.Qwen3VLTextRMSNorm.forward = rms_norm_forward_npu
modeling_qwen3_vl.Qwen3VLTextMLP.forward = silu_forward_npu
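To illustrate the pattern the snippet relies on, here is a minimal, self-contained sketch of module-level monkey-patching. The class and function names below (TextRMSNorm, rms_norm_forward_npu) are illustrative stand-ins, not the real verl or transformers definitions; the point is only that the replacement must match a name that actually exists.

```python
# Minimal sketch of the monkey-patching pattern: replace a method on a class
# with an optimized implementation at import time. Names are stand-ins.

class TextRMSNorm:
    def forward(self, x):
        # reference (unpatched) implementation
        return [v * 1.0 for v in x]

def rms_norm_forward_npu(self, x):
    # stand-in for a fused NPU kernel; must keep the original signature
    return [v * 2.0 for v in x]

# Patch: every existing and future instance now dispatches to the new version.
TextRMSNorm.forward = rms_norm_forward_npu

norm = TextRMSNorm()
print(norm.forward([1.0, 2.0]))  # [2.0, 4.0]
```

Because the patch is assignment by name, a typo in the replacement function's name fails at import time with a NameError, which is exactly the failure mode the review points out.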

Comment on lines +148 to +150
use_dynamic_bsz: true
ppo_max_token_len_per_gpu: 2 * (max_prompt_len + max_response_len)
log_prob_max_token_len_per_gpu: 4 * (max_prompt_len + max_response_len)

Severity: high

YAML configuration files do not support arithmetic expressions directly in values (such as 2 * (...)) unless a specific parser is used (e.g., a custom OmegaConf resolver). In addition, options such as use_dynamic_bsz are not defined in the current verl/trainer/config/ppo_trainer.yaml. If these are new features, please make sure the config template is updated accordingly; otherwise, use concrete numeric values in the documentation to avoid misleading users.
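One way to follow this advice is to pre-compute the products and document the formula in a comment. The snippet below is a sketch assuming illustrative values max_prompt_len=1024 and max_response_len=3072; these are examples, not recommendations.

```yaml
# Pre-computed values, assuming max_prompt_len=1024 and max_response_len=3072
# (illustrative only; pick values for your own workload):
use_dynamic_bsz: true
ppo_max_token_len_per_gpu: 8192        # 2 * (1024 + 3072)
log_prob_max_token_len_per_gpu: 16384  # 4 * (1024 + 3072)
```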

Comment on lines +194 to +214
def forward_native():
if is_first_layer:
cos_sin = self.cos_sin_cache[positions]
cos, sin = cos_sin.chunk(2, dim=-1)
if self.mrope_interleaved:
cos = apply_interleaved_rope(cos, self.mrope_section)
sin = apply_interleaved_rope(sin, self.mrope_section)
cos = cos.repeat(1, 2)
sin = sin.repeat(1, 2)
self.cos = cos.unsqueeze(0).unsqueeze(-2).contiguous()
self.sin = sin.unsqueeze(0).unsqueeze(-2).contiguous()
forward_context.is_first_layer = False

query_shape = query.shape
query = query.view(num_tokens, -1, self.head_size)
query_rot = query[..., :self.rotary_dim]
query_pass = query[..., self.rotary_dim:]
query_rot = query_rot.unsqueeze(0)
query_rot = torch_npu.npu_rotary_mul(query_rot, self.cos, self.sin, "half").squeeze(0)
query = torch.cat((query_rot, query_pass), dim=-1).reshape(query_shape)

Severity: high

The forward_native code snippet is not rigorous enough as a technical reference. The variables positions, query, num_tokens, and is_first_layer are neither defined in the function scope nor passed in as parameters. Consider adding a complete function signature or the necessary context-initialization logic so that the example code is correct and usable as a reference.
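One way to address this is to make the missing names explicit parameters. The pseudocode-level sketch below shows only the shape of such a fix; the signature and the forward_context attribute are assumptions for illustration, not the actual API, and the body (which needs torch_npu and Ascend hardware) is elided.

```python
# Pseudocode sketch: the previously undefined names become explicit inputs.
# Not runnable as-is; requires torch / torch_npu on Ascend hardware.
def forward_native(self, positions, query, forward_context):
    num_tokens = query.shape[0]
    if forward_context.is_first_layer:
        cos_sin = self.cos_sin_cache[positions]
        ...  # build self.cos / self.sin as in the snippet above
        forward_context.is_first_layer = False
    ...  # apply rotary embedding to query as in the snippet above
    return query
```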
