
[doc,model] feat: Add Qwen3-235B NPU Long Sequence Optimizing Practice#5835

Open
Vvictorrrr wants to merge 18 commits into verl-project:main from Vvictorrrr:main


Conversation

@Vvictorrrr
Contributor

@Vvictorrrr Vvictorrrr commented Apr 1, 2026

What does this PR do?

This PR adds the Qwen3-235B NPU Long Sequence Optimizing Practice document, which developers can refer to when optimizing long-sequence reinforcement learning on Ascend NPUs.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.
Not related.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.
Not related.

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.
Not related

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

…g Sequence Reinforcement Learning.md

update optimizing doc
@Vvictorrrr Vvictorrrr requested a review from FightingZhen as a code owner April 1, 2026 03:43
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a comprehensive tutorial for optimizing the training performance of the Qwen3-235B model on Ascend NPU platforms, focusing on long-sequence reinforcement learning. The document covers performance bottleneck analysis and provides specific optimization strategies for inference and training. The review feedback identifies several minor issues that require correction, including typos in the 'torch-npu' component name, inconsistent section numbering, and a stray character in a configuration snippet.

Vvictorrrr and others added 4 commits April 2, 2026 19:29
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Vvictorrrr Vvictorrrr changed the title [doc] Add Qwen3-235B NPU Long Sequence Optimizing Practice [doc,model] feat: Add Qwen3-235B NPU Long Sequence Optimizing Practice Apr 2, 2026
@@ -0,0 +1,171 @@

## 1. Background Overview

Collaborator


Please add these two documents to index.rst.

Contributor Author


Added.



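As the post-training paradigm for large models evolves from SFT to SFT-RL-SFT, reinforcement learning plays a key role in aligning large models and improving their capabilities. The Verl framework on the Ascend NPU platform has become one of the mainstream training tools, and long-sequence inference scenarios in particular place higher demands on performance and memory efficiency.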
Collaborator


The update time is missing.

Contributor Author


Fixed.

Contributor Author


Fixed.

@@ -0,0 +1,154 @@

Collaborator


The file name is a bit too long.

Contributor Author


Fixed.

…in Asynchronous Training Scenarios to NPU Performance Optimization Practices of Qwen3-30B-A3B Model
…to NPU Performance Optimization Practices of Qwen3-30B-A3B Model.md
@@ -0,0 +1,171 @@
Document last updated: 2025.11
Collaborator


Last updated: 03/26/2026. Please follow this format.


## Background Overview

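As large models continue to grow in scale, performance bottlenecks in inference and training are becoming increasingly prominent. Under MoE architectures in particular, communication overhead, operator efficiency, and memory management have become key factors limiting system throughput.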
Collaborator


Last updated: 03/26/2026. Please follow this format; it is also missing here.

Contributor Author


Fixed.

wucong25
wucong25 previously approved these changes Apr 16, 2026

## Software Environment

- vLLM-Ascend: v0.11.0
Collaborator


The vllm_ascend version currently used throughout verl is 0.13.0, but this section lists vLLM-Ascend: v0.11.0.

Contributor Author


This optimization work was done earlier.

| Component | Version |
| --- | --- |
| torch | 2.7.1 |
| torch-npu | 2.7.1-0919 |
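Because this thread turns on version mismatches, a quick environment check can help before reproducing the documented results. The following is a minimal sketch, not part of the PR: it uses Python's standard `importlib.metadata` to compare installed distributions against the versions listed above. The helper names `version_matches` and `check_versions` are hypothetical, the distribution names `torch` and `torch-npu` are assumed to match how the packages were installed via pip, and the `-0919` build tag of torch-npu is assumed not to appear in pip metadata, so only the base version is compared.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def version_matches(installed: str, expected: str) -> bool:
    """True if `installed` equals `expected` or only adds a build suffix
    such as a local tag ("+cpu"), a dash tag ("-0919") or ".postN"."""
    if not installed.startswith(expected):
        return False
    rest = installed[len(expected):]
    # Reject e.g. "2.7.10" or "2.7.1.1" when "2.7.1" is expected.
    return rest == "" or rest[0] in "+-" or rest.startswith(".post")


def check_versions(expected: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {distribution: (installed_or_None, expected)} for every mismatch."""
    mismatches: dict[str, tuple[str | None, str]] = {}
    for dist, want in expected.items():
        try:
            have: str | None = version(dist)
        except PackageNotFoundError:
            have = None  # distribution is not installed at all
        if have is None or not version_matches(have, want):
            mismatches[dist] = (have, want)
    return mismatches


if __name__ == "__main__":
    # Base versions from the table above (assumption: build tags are not compared).
    EXPECTED = {"torch": "2.7.1", "torch-npu": "2.7.1"}
    problems = check_versions(EXPECTED)
    for dist, (have, want) in problems.items():
        print(f"{dist}: installed={have}, expected={want}")
    if not problems:
        print("All listed versions match.")
```

The prefix-plus-suffix rule is a deliberate simplification so that local build tags do not cause false alarms; a stricter check could parse versions with the third-party `packaging` library instead.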

Versions matching the MindSpeed-RL 2.2.0 commercial release:
Collaborator


Why is MindSpeed-RL 2.2.0 mentioned here?

Contributor Author


Fixed.

@wucong25 wucong25 self-requested a review April 16, 2026 02:44
