[Feature] Support noaux for eplb #5143
Conversation
Thanks for your contribution!
jeff41404 left a comment
LGTM. This adds the noaux_tc_redundant operator, which is called from get_moe_scores; the unit test includes a combined-operator reference implementation of get_moe_scores.
gongshaotian left a comment
LGTM
Copilot AI left a comment
Pull Request Overview
This PR adds support for the noaux_tc operation with redundant expert management in the EPLB (Expert Parallel Load Balancing) system. The main purpose is to enable expert-load statistics and load-balanced expert selection when using the noaux topk routing method with redundant experts (a conceptual sketch of the routing idea follows the key-changes list below).
Key changes:
- Implements the noaux_tc_redundant kernel and operators for redundant expert selection
- Extends MoE routing to support redundant expert arrays with load balancing
- Adds test coverage for noaux group topk functionality with redundant experts
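As a rough illustration of the "redundant experts" idea (the struct, names, and round-robin policy below are assumptions for illustration, not the PR's kernel, which performs the selection on the GPU inside noaux_tc_redundant): after top-k chooses a logical expert, a redundancy table maps it to one of several physical replicas spread across EP ranks, so hot experts can be served by more than one rank.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical redundancy table: physical_ids[e] lists the physical replica
// ids that serve logical expert e across EP ranks.
struct RedundantExpertTable {
  std::vector<std::vector<int32_t>> physical_ids;
  std::vector<uint32_t> cursor;  // per-expert round-robin position
};

// Map a logical expert picked by top-k to one physical replica. Round-robin is
// only a stand-in for whatever load-aware policy the real kernel applies.
int32_t pick_physical_expert(RedundantExpertTable& table, int32_t logical_expert) {
  auto& replicas = table.physical_ids[logical_expert];
  uint32_t slot = table.cursor[logical_expert]++ % replicas.size();
  return replicas[slot];
}
```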
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/operators/test_noaux_tc_redundant.py | New test file validating noaux group topk with redundant expert routing |
| fastdeploy/model_executor/layers/moe/moe.py | Extends get_moe_scores to conditionally use noaux_tc_redundant for expert selection |
| fastdeploy/model_executor/layers/moe/ep.py | Updates moe_select to route through noaux_tc when redundant experts are enabled |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Fixes incorrect attribute reference in update_state_dict |
| custom_ops/gpu_ops/noauxtc_kernel.h | Adds group_idx_and_topk_idx_redundant_kernel implementation and invokeNoAuxTcRedundant function |
| custom_ops/gpu_ops/noaux_tc_redundant.cu | New operator definition for noaux_tc_redundant with Paddle integration |
| custom_ops/gpu_ops/cpp_extensions.cc | Registers NoauxTcRedundant function in Python bindings |
| custom_ops/setup_ops.py | Adds noaux_tc_redundant.cu to build configuration |
custom_ops/gpu_ops/cpp_extensions.cc
Outdated
    m.def("noaux_tc", &NoauxTc, "noaux_tc for Deepseekv3 MoE compute");
    m.def("noaux_tc_redunant",
Copilot AI (Nov 21, 2025)
The function name has a typo: it is missing the second 'd' in 'redundant'. The spelling should be corrected from 'noaux_tc_redunant' to 'noaux_tc_redundant'.
| m.def("noaux_tc_redunant", | |
| m.def("noaux_tc_redundant", |
        routed_scaling_factor,
        redundant_ep_rank_num_plus_one);
#else
    auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
Copilot AI (Nov 21, 2025)
Wrong kernel function pointer assigned. This should be &group_idx_and_topk_idx_redundant_kernel<T, IdxT>, since this code is inside the invokeNoAuxTcRedundant function, which should launch the redundant version of the kernel.
Suggested change:
-    auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
+    auto* kernel_instance2 = &group_idx_and_topk_idx_redundant_kernel<T, IdxT>;
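A minimal CUDA sketch of the dispatch pattern the comment refers to (the kernel bodies and launch configuration are placeholders, not the PR's code): the host-side function pointer decides which __global__ kernel gets launched, so taking the address of the non-redundant kernel inside the redundant entry point compiles cleanly but silently ignores the redundant-expert inputs.

```cuda
#include <cuda_runtime.h>

// Placeholders standing in for group_idx_and_topk_idx_kernel and its
// *_redundant_kernel variant from noauxtc_kernel.h.
__global__ void select_kernel(const float* scores, int* topk_idx, int n) {}
__global__ void select_redundant_kernel(const float* scores, int* topk_idx, int n) {}

// Host-side dispatch: the redundant entry point must take the address of the
// redundant kernel, which is the fix the review suggests for invokeNoAuxTcRedundant.
void invoke_redundant(const float* scores, int* topk_idx, int n,
                      cudaStream_t stream) {
  auto* kernel_instance = &select_redundant_kernel;  // not &select_kernel
  kernel_instance<<<1, 256, 0, stream>>>(scores, topk_idx, n);
}
```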
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once
Copilot AI (Nov 21, 2025)
The #pragma once directive is intended for header files (.h/.hpp), not source files (.cu). This should be removed from the .cu file as it has no effect and may cause confusion.
Suggested change:
-    #pragma once
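As a small illustration of the convention (a hypothetical header, not a file from this PR): #pragma once belongs in headers that other translation units include, such as noauxtc_kernel.h, while a .cu file is compiled as its own translation unit and only needs the #include.

```cuda
// example_kernels.h (hypothetical): the include guard lives here because this
// header may be pulled into several .cu/.cc translation units.
#pragma once

#include <cuda_runtime.h>

// Declaration shared with the .cu operator file; that .cu file would simply
// start with #include "example_kernels.h" and carry no #pragma once of its own.
void invokeExampleKernel(const float* input, float* output, int n,
                         cudaStream_t stream);
```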
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #5143   +/-   ##
==========================================
  Coverage         ?   57.78%
==========================================
  Files            ?      316
  Lines            ?    38233
  Branches         ?     5715
==========================================
  Hits             ?    22094
  Misses           ?    14382
  Partials         ?     1757

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
This reverts commit 6ca2651.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag in the PR title, from the tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.