[Feature] Support noaux for eplb #5143
Conversation
Thanks for your contribution!
jeff41404 left a comment
LGTM. This adds the noaux_tc_redundant operator, which is called from get_moe_scores; the unit test includes a combined-operator reference implementation of get_moe_scores.
gongshaotian left a comment
LGTM
Copilot AI left a comment
Pull Request Overview
This PR adds support for the noaux_tc operation with redundant expert management in the EPLB (Expert Parallel Load Balancing) system. The main purpose is to enable expert-load statistics and load-balanced expert selection when using the noaux topk routing method with redundant experts (a conceptual sketch of the routing idea follows the key-changes list below).
Key changes:
- Implements the noaux_tc_redundant kernel and operators for redundant expert selection
- Extends MoE routing to support redundant expert arrays with load balancing
- Adds test coverage for noaux group topk functionality with redundant experts
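As a rough illustration of the "redundant experts" idea (the struct, names, and round-robin policy below are assumptions for illustration, not the PR's kernel, which performs the selection on the GPU inside noaux_tc_redundant): after top-k chooses a logical expert, a redundancy table maps it to one of several physical replicas spread across EP ranks, so hot experts can be served by more than one rank.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical redundancy table: physical_ids[e] lists the physical replica
// ids that serve logical expert e across EP ranks.
struct RedundantExpertTable {
  std::vector<std::vector<int32_t>> physical_ids;
  std::vector<uint32_t> cursor;  // per-expert round-robin position
};

// Map a logical expert picked by top-k to one physical replica. Round-robin is
// only a stand-in for whatever load-aware policy the real kernel applies.
int32_t pick_physical_expert(RedundantExpertTable& table, int32_t logical_expert) {
  auto& replicas = table.physical_ids[logical_expert];
  uint32_t slot = table.cursor[logical_expert]++ % replicas.size();
  return replicas[slot];
}
```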
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/operators/test_noaux_tc_redundant.py | New test file validating noaux group topk with redundant expert routing |
| fastdeploy/model_executor/layers/moe/moe.py | Extends get_moe_scores to conditionally use noaux_tc_redundant for expert selection |
| fastdeploy/model_executor/layers/moe/ep.py | Updates moe_select to route through noaux_tc when redundant experts are enabled |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Fixes incorrect attribute reference in update_state_dict |
| custom_ops/gpu_ops/noauxtc_kernel.h | Adds group_idx_and_topk_idx_redundant_kernel implementation and invokeNoAuxTcRedundant function |
| custom_ops/gpu_ops/noaux_tc_redundant.cu | New operator definition for noaux_tc_redundant with Paddle integration |
| custom_ops/gpu_ops/cpp_extensions.cc | Registers NoauxTcRedundant function in Python bindings |
| custom_ops/setup_ops.py | Adds noaux_tc_redundant.cu to build configuration |
custom_ops/gpu_ops/cpp_extensions.cc
Outdated
    m.def("noaux_tc", &NoauxTc, "noaux_tc for Deepseekv3 MoE compute");
    m.def("noaux_tc_redunant",
Copilot AI (Nov 21, 2025)
The function name has a typo: it is missing the second 'd' in 'redundant'. The spelling should be corrected from 'noaux_tc_redunant' to 'noaux_tc_redundant'.
| m.def("noaux_tc_redunant", | |
| m.def("noaux_tc_redundant", |
        routed_scaling_factor,
        redundant_ep_rank_num_plus_one);
#else
    auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
Copilot AI (Nov 21, 2025)
Wrong kernel function pointer assigned. This should be &group_idx_and_topk_idx_redundant_kernel<T, IdxT>, since this code is inside the invokeNoAuxTcRedundant function, which should launch the redundant version of the kernel.
Suggested change:
-    auto* kernel_instance2 = &group_idx_and_topk_idx_kernel<T, IdxT>;
+    auto* kernel_instance2 = &group_idx_and_topk_idx_redundant_kernel<T, IdxT>;
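A minimal CUDA sketch of the dispatch pattern the comment refers to (the kernel bodies and launch configuration are placeholders, not the PR's code): the host-side function pointer decides which __global__ kernel gets launched, so taking the address of the non-redundant kernel inside the redundant entry point compiles cleanly but silently ignores the redundant-expert inputs.

```cuda
#include <cuda_runtime.h>

// Placeholders standing in for group_idx_and_topk_idx_kernel and its
// *_redundant_kernel variant from noauxtc_kernel.h.
__global__ void select_kernel(const float* scores, int* topk_idx, int n) {}
__global__ void select_redundant_kernel(const float* scores, int* topk_idx, int n) {}

// Host-side dispatch: the redundant entry point must take the address of the
// redundant kernel, which is the fix the review suggests for invokeNoAuxTcRedundant.
void invoke_redundant(const float* scores, int* topk_idx, int n,
                      cudaStream_t stream) {
  auto* kernel_instance = &select_redundant_kernel;  // not &select_kernel
  kernel_instance<<<1, 256, 0, stream>>>(scores, topk_idx, n);
}
```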
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once
Copilot AI (Nov 21, 2025)
The #pragma once directive is intended for header files (.h/.hpp), not source files (.cu). This should be removed from the .cu file as it has no effect and may cause confusion.
Suggested change:
-    #pragma once
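As a small illustration of the convention (a hypothetical header, not a file from this PR): #pragma once belongs in headers that other translation units include, such as noauxtc_kernel.h, while a .cu file is compiled as its own translation unit and only needs the #include.

```cuda
// example_kernels.h (hypothetical): the include guard lives here because this
// header may be pulled into several .cu/.cc translation units.
#pragma once

#include <cuda_runtime.h>

// Declaration shared with the .cu operator file; that .cu file would simply
// start with #include "example_kernels.h" and carry no #pragma once of its own.
void invokeExampleKernel(const float* input, float* output, int n,
                         cudaStream_t stream);
```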
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #5143   +/-   ##
==========================================
  Coverage         ?   57.78%
==========================================
  Files            ?      316
  Lines            ?    38233
  Branches         ?     5715
==========================================
  Hits             ?    22094
  Misses           ?    14382
  Partials         ?     1757

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
This reverts commit 6ca2651.
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag in the PR title, from the tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.