Add LWS Labels Extension Plugin for ModelServing Pods #767

WHOIM1205 wants to merge 1 commit into volcano-sh:main
Conversation
Signed-off-by: WHOIM1205 <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment.
Summary of Changes

Hello @WHOIM1205, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the ModelServing controller's compatibility with the Kubernetes LeaderWorkerSet (LWS) ecosystem. It introduces a new built-in plugin that automatically injects the standard LWS-specific labels into pods created by the ModelServing controller. This ensures that LWS-managed pods are correctly identified and grouped, facilitating integration with LWS-aware tools and controllers, without altering existing controller logic or reconciliation.
Activity
/assign @YaoZengzeng
Code Review
This pull request introduces a new built-in plugin, lws-labels, to inject standard LeaderWorkerSet (LWS) labels into pods managed by the ModelServing controller. A security analysis of pkg/model-serving-controller/plugins/lws_labels_plugin.go found no vulnerabilities. The implementation is clean and well-tested, though there is a suggestion to enhance test assertion robustness. Overall, this is a great enhancement for improving compatibility with the LWS ecosystem.
```go
for key, want := range tt.expectLabels {
	got, ok := pod.Labels[key]
	if !ok {
		t.Errorf("label %s missing", key)
	} else if got != want {
		t.Errorf("label %s = %q, want %q", key, got, want)
	}
}
```
For more robust test assertions, it's better to compare the entire labels map using reflect.DeepEqual. This ensures that no unexpected labels are added and that the final state of the labels is exactly as expected. The current loop-based check only verifies that a subset of expected labels exist, but it wouldn't catch any extra, erroneously added labels.
Note: You will need to import the reflect package to use this function.
```go
if !reflect.DeepEqual(pod.Labels, tt.expectLabels) {
	t.Errorf("labels mismatch.\nGot: %v\nWant: %v", pod.Labels, tt.expectLabels)
}
```
Pull request overview
This PR introduces a built-in ModelServing extension plugin intended to inject standard LeaderWorkerSet (LWS) labels onto pods created by the ModelServing controller, improving compatibility with the LWS ecosystem.
Changes:
- Added `LWSLabelsPlugin` to inject `leaderworkerset.sigs.k8s.io/*` labels during `OnPodCreate`.
- Added unit tests covering entry/worker pods, nil-safety, and plugin registration.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/model-serving-controller/plugins/lws_labels_plugin.go | New built-in plugin that derives and injects 4 standard LWS labels during pod creation. |
| pkg/model-serving-controller/plugins/lws_labels_plugin_test.go | Table-driven tests for label injection behavior and DefaultRegistry registration. |
```go
func init() {
	DefaultRegistry.Register(LWSLabelsPluginName, NewLWSLabelsPlugin)
}

// NewLWSLabelsPlugin constructs the LWS labels plugin from a PluginSpec.
// This plugin does not require any configuration.
func NewLWSLabelsPlugin(spec workloadv1alpha1.PluginSpec) (Plugin, error) {
	return &LWSLabelsPlugin{name: spec.Name}, nil
}
```
As implemented, this plugin only runs when it is explicitly listed in ModelServing.spec.plugins. The current LWS->ModelServing translation (pkg/model-serving-controller/controller/lws_controller.go:constructModelServing) does not add this plugin, so LWS-originated pods will still miss the leaderworkerset.sigs.k8s.io/* labels by default. Consider wiring this in (e.g., have the LWS controller inject a built-in PluginSpec{Name: "lws-labels", Type: BuiltIn} or add defaulting based on the LWS ownerRef), otherwise the PR won't actually fix #759 as described.
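One possible shape for that wiring is sketched below. This is not the repository's actual code: `PluginSpec` and `PluginType` are simplified stand-ins for the workload v1alpha1 types, and `ensureLWSLabelsPlugin` is a hypothetical helper that `constructModelServing` could call so LWS-originated workloads always get the built-in plugin by default.

```go
package main

import "fmt"

// Simplified stand-ins for the workload v1alpha1 API types.
type PluginType string

const PluginTypeBuiltIn PluginType = "BuiltIn"

type PluginSpec struct {
	Name string
	Type PluginType
}

// ensureLWSLabelsPlugin is a hypothetical defaulting helper: it prepends
// the built-in "lws-labels" plugin unless the user already listed it,
// preserving any user-specified plugin ordering otherwise.
func ensureLWSLabelsPlugin(plugins []PluginSpec) []PluginSpec {
	for _, p := range plugins {
		if p.Name == "lws-labels" {
			return plugins // already present; leave the spec untouched
		}
	}
	return append([]PluginSpec{{Name: "lws-labels", Type: PluginTypeBuiltIn}}, plugins...)
}

func main() {
	specs := ensureLWSLabelsPlugin([]PluginSpec{{Name: "custom", Type: "External"}})
	for _, s := range specs {
		fmt.Println(s.Name)
	}
}
```

With this defaulting in place, the translation step stays a pure function of the LWS object and no per-user opt-in is needed for the labels to appear.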
```go
func (p *LWSLabelsPlugin) OnPodCreate(_ context.Context, req *HookRequest) error {
	if req == nil || req.Pod == nil || req.ModelServing == nil {
		return nil
	}

	// Derive label values from the HookRequest context.
	lwsName := req.ModelServing.Name

	// Extract group index from the serving group name (e.g. "my-lws-0" → "0").
	_, groupIndex := utils.GetParentNameAndOrdinal(req.ServingGroup)
	if groupIndex < 0 {
		return fmt.Errorf("cannot extract group index from serving group name %q", req.ServingGroup)
	}
	groupIndexStr := strconv.Itoa(groupIndex)

	// Extract worker index from the pod name (trailing ordinal, e.g. "my-lws-0-default-0-1" → "1").
	_, workerIndex := utils.GetParentNameAndOrdinal(req.Pod.Name)
	if workerIndex < 0 {
		return fmt.Errorf("cannot extract worker index from pod name %q", req.Pod.Name)
	}
```
OnPodCreate returns an error if it cannot parse req.ServingGroup or req.Pod.Name. Because plugin errors abort pod creation (createPod returns the error), this makes a best-effort label-injection feature capable of breaking workloads. Prefer treating parsing failures as a no-op (or gating on a known LWS marker/ownerRef) and only injecting labels when the expected naming pattern is present.
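A best-effort variant could look like the sketch below. It is illustrative only: `getParentNameAndOrdinal` re-implements the utils helper with a simple regexp rather than importing it, and `injectIndexLabels` is a hypothetical stand-in for the plugin's parsing step, rewritten so an unparsable name is a no-op instead of an error.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var ordinalRe = regexp.MustCompile(`^(.*)-(\d+)$`)

// getParentNameAndOrdinal mimics utils.GetParentNameAndOrdinal: it splits a
// trailing "-<ordinal>" suffix and returns -1 when the name does not match.
func getParentNameAndOrdinal(name string) (string, int) {
	m := ordinalRe.FindStringSubmatch(name)
	if m == nil {
		return "", -1
	}
	ord, err := strconv.Atoi(m[2])
	if err != nil {
		return "", -1
	}
	return m[1], ord
}

// injectIndexLabels is a hypothetical best-effort version of the parsing
// step: names that lack the expected ordinal suffix leave the pod untouched,
// so label injection can never abort pod creation.
func injectIndexLabels(labels map[string]string, servingGroup, podName string) {
	_, groupIndex := getParentNameAndOrdinal(servingGroup)
	_, workerIndex := getParentNameAndOrdinal(podName)
	if groupIndex < 0 || workerIndex < 0 {
		return // not an LWS-style name; skip injection silently
	}
	labels["leaderworkerset.sigs.k8s.io/group-index"] = strconv.Itoa(groupIndex)
	labels["leaderworkerset.sigs.k8s.io/worker-index"] = strconv.Itoa(workerIndex)
}

func main() {
	labels := map[string]string{}
	injectIndexLabels(labels, "my-lws-0", "my-lws-0-default-0-1")
	fmt.Println(labels)

	bad := map[string]string{}
	injectIndexLabels(bad, "noindex", "pod") // no ordinal suffix → no-op
	fmt.Println(len(bad))
}
```

Gating on an LWS ownerRef (or a marker label set by the LWS translation) would tighten this further, since non-LWS ModelServings with coincidentally matching names would otherwise pick up the labels too.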
```go
	workerIndexStr := strconv.Itoa(workerIndex)

	// Group key uniquely identifies the group within the LWS.
	groupKey := fmt.Sprintf("%s-%s", lwsName, groupIndexStr)
```
groupKey is recomputed from ModelServing.Name + parsed group index, but the serving group name is already available as req.ServingGroup (and is what pods/services are keyed off elsewhere). Using req.ServingGroup directly would be more robust (e.g., if the ModelServing name ever diverges from the serving group prefix) and avoids constructing a potentially inconsistent key.
```diff
-	groupKey := fmt.Sprintf("%s-%s", lwsName, groupIndexStr)
+	groupKey := req.ServingGroup
```
```go
	LWSLabelName:        "user-override", // preserved
	LWSLabelGroupIndex:  "99",            // preserved
	LWSLabelWorkerIndex: "0",             // injected
	LWSLabelGroupKey:    "my-lws-0",      // injected
```
This test case asserts that group-index can be user-overridden ("99") while the plugin still injects group-key based on the parsed serving group name ("my-lws-0"). That produces an internally inconsistent label set (group-key no longer corresponds to group-index), which can break LWS tooling. Either (a) derive group-key from the effective group-index value when present, or (b) skip injecting group-key when group-index is already set to something else, and adjust the expectation accordingly.
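Option (a) can be sketched as follows. `effectiveGroupKey` is a hypothetical helper (not in the PR), and the label-key constants mirror the plugin's `LWSLabelGroupIndex`/`LWSLabelGroupKey` names; the point is only that the group key is derived from whichever group-index value actually lands on the pod.

```go
package main

import "fmt"

const (
	lwsLabelGroupIndex = "leaderworkerset.sigs.k8s.io/group-index"
	lwsLabelGroupKey   = "leaderworkerset.sigs.k8s.io/group-key"
)

// effectiveGroupKey derives the group key from the group-index label that
// will actually end up on the pod: a user-supplied override wins over the
// index parsed from the serving group name, so the two labels stay consistent.
func effectiveGroupKey(lwsName, parsedIndex string, podLabels map[string]string) string {
	if override, ok := podLabels[lwsLabelGroupIndex]; ok {
		return fmt.Sprintf("%s-%s", lwsName, override)
	}
	return fmt.Sprintf("%s-%s", lwsName, parsedIndex)
}

func main() {
	// User override present: group-key follows the override, not the parsed index.
	fmt.Println(effectiveGroupKey("my-lws", "0", map[string]string{lwsLabelGroupIndex: "99"}))
	// No override: group-key follows the parsed index.
	fmt.Println(effectiveGroupKey("my-lws", "0", map[string]string{}))
}
```

Under this scheme the test case above would expect `my-lws-99` for the group key, keeping group-index and group-key mutually consistent.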
/assign @YaoZengzeng
@hzxuzhonghu is there anything I can change in this PR?
What type of PR is this?
/kind enhancement
What this PR does / why we need it:
Pods created by the ModelServing controller currently only include kthena/volcano-specific labels.
When the workload originates from a LeaderWorkerSet (LWS), the standard
`leaderworkerset.sigs.k8s.io/*` labels are missing, which breaks compatibility with the LWS ecosystem (controllers, monitoring, logging).

This PR adds a built-in extension plugin that injects the standard LWS labels during pod creation, ensuring LWS-managed pods can be correctly identified and grouped while remaining fully backward compatible.
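For illustration, a worker pod produced for a hypothetical LWS named `my-lws` would end up carrying labels along these lines (the pod name and indices are invented for the example; the group-key format follows this PR's `name-groupIndex` scheme):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-lws-0-default-0-1
  labels:
    # Injected by the lws-labels plugin:
    leaderworkerset.sigs.k8s.io/name: my-lws
    leaderworkerset.sigs.k8s.io/group-index: "0"
    leaderworkerset.sigs.k8s.io/worker-index: "1"
    leaderworkerset.sigs.k8s.io/group-key: my-lws-0
```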
Which issue(s) this PR fixes:
Fixes #759
Special notes for your reviewer:
The labels are injected in `OnPodCreate`, after pod generation and before the Kubernetes `Create()` call.