@xiaoxiaohehe001 xiaoxiaohehe001 commented Nov 20, 2025

Motivation

Modifications

  • Support EPLB load statistics collection and loading under noaux top-k routing

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Nov 20, 2025

Thanks for your contribution!

DDDivano previously approved these changes Nov 21, 2025

@jeff41404 jeff41404 left a comment

LGTM. This adds the noaux_tc_redundant operator, which is called by get_moe_scores. The unit tests include a version of get_moe_scores implemented with composite operators.

gongshaotian previously approved these changes Nov 21, 2025

@gongshaotian gongshaotian left a comment

LGTM

Copilot AI left a comment

Pull Request Overview

This PR adds support for the noaux_tc operation with redundant expert management in the EPLB (Expert Parallel Load Balancing) system. The main purpose is to enable load statistics and loading for expert selection when using the noaux topk routing method with redundant experts.

Key changes:

  • Implements noaux_tc_redundant kernel and operators for redundant expert selection
  • Extends MoE routing to support redundant expert arrays with load balancing
  • Adds test coverage for noaux group topk functionality with redundant experts
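
As a rough illustration of the routing summarized above, here is a minimal pure-Python sketch of group-limited top-k expert selection with a correction bias, followed by a logical-to-physical expert mapping that also counts load. It is not the PR's kernel: the function names, the rule of scoring each group by its two largest biased scores, and the round-robin choice among replicas are all assumptions made for illustration.

```python
def noaux_group_topk(scores, bias, n_group, topk_group, top_k, scaling=1.0):
    """Hypothetical reference: return (logical_expert_ids, weights) for one token."""
    group_size = len(scores) // n_group
    biased = [s + b for s, b in zip(scores, bias)]

    # Assumption: a group's score is the sum of its two largest biased scores.
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    kept = sorted(range(n_group), key=lambda g: -group_scores[g])[:topk_group]

    # Choose the top_k experts inside the kept groups, ranked by biased score.
    candidates = [e for g in kept for e in range(g * group_size, (g + 1) * group_size)]
    chosen = sorted(candidates, key=lambda e: -biased[e])[:top_k]

    # Routing weights use the unbiased scores, normalized and then scaled.
    total = sum(scores[e] for e in chosen)
    weights = [scaling * scores[e] / total for e in chosen]
    return chosen, weights


def map_to_replicas(chosen, replicas, load):
    """Assumed policy: rotate through each logical expert's physical slots.

    `load[e]` counts how many tokens were routed to logical expert `e`,
    which is the kind of statistic a load balancer would consume.
    """
    physical = []
    for e in chosen:
        slots = replicas[e]
        physical.append(slots[load[e] % len(slots)])
        load[e] += 1
    return physical
```

With eight logical experts and one extra replica of expert 0 placed in physical slot 8, two consecutive tokens that both pick expert 0 would land on slots 0 and 8 in turn, while the load counter for expert 0 would read 2.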

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/operators/test_noaux_tc_redundant.py | New test file validating noaux group topk with redundant expert routing |
| fastdeploy/model_executor/layers/moe/moe.py | Extends get_moe_scores to conditionally use noaux_tc_redundant for expert selection |
| fastdeploy/model_executor/layers/moe/ep.py | Updates moe_select to route through noaux_tc when redundant experts are enabled |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Fixes incorrect attribute reference in update_state_dict |
| custom_ops/gpu_ops/noauxtc_kernel.h | Adds group_idx_and_topk_idx_redundant_kernel implementation and invokeNoAuxTcRedundant function |
| custom_ops/gpu_ops/noaux_tc_redundant.cu | New operator definition for noaux_tc_redundant with Paddle integration |
| custom_ops/gpu_ops/cpp_extensions.cc | Registers NoauxTcRedundant function in Python bindings |
| custom_ops/setup_ops.py | Adds noaux_tc_redundant.cu to build configuration |


m.def("noaux_tc", &NoauxTc, "noaux_tc for Deepseekv3 MoE compute");

m.def("noaux_tc_redunant",
Copilot AI Nov 21, 2025

Corrected spelling of 'noaux_tc_redunant' to 'noaux_tc_redundant'. The function name has a typo - missing the second 'd' in 'redundant'.

Suggested change
m.def("noaux_tc_redunant",
m.def("noaux_tc_redundant",

Copilot uses AI. Check for mistakes.
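
A misspelled binding name like this only surfaces when a caller looks up the correctly spelled attribute. A small check along the following lines could catch it early. This is a sketch only: `check_bindings` is a hypothetical helper, and `fake_ops` is a stand-in object rather than the real compiled extension module.

```python
import difflib
from types import SimpleNamespace


def check_bindings(module, expected_names):
    """Return {expected_name: closest_registered_name_or_None} for missing bindings."""
    registered = [n for n in dir(module) if not n.startswith("_")]
    missing = {}
    for name in expected_names:
        if not hasattr(module, name):
            close = difflib.get_close_matches(name, registered, n=1)
            missing[name] = close[0] if close else None
    return missing


# Stand-in for the extension module, registered with the misspelled name.
fake_ops = SimpleNamespace(noaux_tc=lambda *a: None, noaux_tc_redunant=lambda *a: None)
print(check_bindings(fake_ops, ["noaux_tc", "noaux_tc_redundant"]))
```

An empty dictionary would mean every expected operator is reachable under its intended name.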
routed_scaling_factor,
redundant_ep_rank_num_plus_one);
#else
auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
Copilot AI Nov 21, 2025

Wrong kernel function pointer assigned. This should be &group_idx_and_topk_idx_redundant_kernel<T, IdxT> since this is in the invokeNoAuxTcRedundant function which should call the redundant version of the kernel.

Suggested change
auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
auto* kernel_instance2 = &group_idx_and_topk_idx_redundant_kernel<T, IdxT>;

// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once
Copilot AI Nov 21, 2025

The #pragma once directive is intended for header files (.h/.hpp), not source files (.cu). This should be removed from the .cu file as it has no effect and may cause confusion.

Suggested change
#pragma once

@codecov-commenter

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@6fa3410).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/model_executor/layers/moe/ep.py | 0.00% | 4 Missing ⚠️ |
| fastdeploy/model_executor/layers/moe/moe.py | 83.33% | 0 Missing and 1 partial ⚠️ |
| fastdeploy/model_executor/models/ernie4_5_moe.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5143   +/-   ##
==========================================
  Coverage           ?   57.78%           
==========================================
  Files              ?      316           
  Lines              ?    38233           
  Branches           ?     5715           
==========================================
  Hits               ?    22094           
  Misses             ?    14382           
  Partials           ?     1757           
Flag Coverage Δ
diff 57.78% <45.45%> (?)
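
The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that a partially covered line counts as not covered, and it infers per-file patch sizes from the percentages shown (for example, 83.33% with one partial line is taken to mean 5 of 6 lines). Those per-file counts are inferred, not stated in the report.

```python
# (file, patch lines, fully covered lines); counts inferred from the per-file rows
patch = [
    ("fastdeploy/model_executor/layers/moe/ep.py", 4, 0),
    ("fastdeploy/model_executor/layers/moe/moe.py", 6, 5),
    ("fastdeploy/model_executor/models/ernie4_5_moe.py", 1, 0),
]

patch_lines = sum(total for _, total, _ in patch)
patch_hits = sum(hit for _, _, hit in patch)
patch_pct = 100 * patch_hits / patch_lines

# Project-wide totals copied from the coverage diff block.
hits, misses, partials, lines = 22094, 14382, 1757, 38233
project_pct = 100 * hits / lines

print(f"patch: {patch_hits}/{patch_lines} lines = {patch_pct:.5f}%")
print(f"not fully covered in patch: {patch_lines - patch_hits}")
print(f"project: {hits}/{lines} lines = {project_pct:.4f}%")
```

Under these assumptions the patch figure works out to 5 of 11 lines, which matches the reported 45.45455% and the 6 lines missing coverage. The project ratio is slightly above 57.78%, consistent with the reported value if it is truncated rather than rounded.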

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 6ca2651 into PaddlePaddle:develop Nov 21, 2025
14 of 17 checks passed
EmmonsCurse added a commit that referenced this pull request Nov 21, 2025

6 participants