
JointInferenceService controller enhancement #445

Merged

Conversation

SherlockShemol
Contributor

@SherlockShemol SherlockShemol commented Sep 29, 2024

What type of PR is this?

What this PR does / why we need it:

  • Create Deployment resources so that Pods are rebuilt automatically after manual deletion.
  • Enable the JointInferenceService controller to update Pods when the CRD is modified.
  • Add a test file to ensure functional stability and correctness.

Which issue(s) this PR fixes:

Fixes #
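
For context, managing workers through a Deployment (rather than a bare Pod) is what lets Kubernetes recreate a worker Pod after manual deletion. A hedged sketch of what such a generated Deployment could look like for the cloud worker — all names and labels here are illustrative, not the controller's actual output:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  # Illustrative name; the controller derives its own naming scheme.
  name: helmet-detection-inference-example-deployment-cloud
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      sedna.io/service: helmet-detection-inference-example
      sedna.io/worker: cloud
  template:
    metadata:
      labels:
        sedna.io/service: helmet-detection-inference-example
        sedna.io/worker: cloud
    spec:
      containers:
        - name: big-model
          image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.3.0
```

Because the Deployment controller reconciles toward `replicas: 1`, deleting the worker Pod by hand triggers an automatic rebuild, and editing the pod template in the CRD can roll the Deployment forward.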

@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 29, 2024
Contributor

@MooreZheng MooreZheng left a comment

There are CI issues that remain to be resolved, see https://github.com/kubeedge/sedna/actions/runs/11249926500/job/31277758880?pr=445

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 318f6fc to 3342955 Compare October 24, 2024 14:41
@kubeedge-bot kubeedge-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 24, 2024
@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 24, 2024
@tangming1996
Contributor

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2024
// Check the changes in specific fields and perform corresponding operations.
if !reflect.DeepEqual(oldService.Spec.CloudWorker, newService.Spec.CloudWorker) {
// If the cloud inference service changes, perform the corresponding update operation.
return c.updateCloudWorker(newService)
Contributor

We can't return directly here; otherwise, the subsequent code won't be executed.

Modify the updateInferenceServices function as follows.

func (c *Controller) updateInferenceServices(old, cur interface{}) error {
	oldService := old.(*sednav1.JointInferenceService)
	newService := cur.(*sednav1.JointInferenceService)

	// Check if the cloud worker configuration has changed
	if !reflect.DeepEqual(oldService.Spec.CloudWorker, newService.Spec.CloudWorker) {
		// Update cloud worker and log any errors
		if err := c.updateCloudWorker(newService); err != nil {
			klog.Errorf("Failed to update cloud worker for service %s/%s: %v", newService.Namespace, newService.Name, err)
		}
	}

	// Retrieve the address of the cloud inference service
	var bigModelHost string
	svc, err := c.kubeClient.CoreV1().Services(oldService.Namespace).Get(context.Background(),
		strings.ToLower(oldService.Name+"-"+jointInferenceForCloud), metav1.GetOptions{})
	if err != nil {
		if errors.IsNotFound(err) {
			// If the service does not exist, create a new one and retrieve its address
			klog.Info("Cloud service not found, creating new service...")
			bigModelHost, err = runtime.CreateEdgeMeshService(c.kubeClient, oldService, jointInferenceForCloud, BigModelPort)
			if err != nil {
				klog.Errorf("Failed to create EdgeMesh service for service %s/%s: %v", oldService.Namespace, oldService.Name, err)
			}
		} else {
			klog.Errorf("Failed to get cloud service %s/%s: %v", oldService.Namespace, oldService.Name, err)
		}
	} else {
		bigModelHost = fmt.Sprintf("%s.%s", svc.Name, svc.Namespace)
	}

	// Check if the edge worker configuration has changed
	if !reflect.DeepEqual(oldService.Spec.EdgeWorker, newService.Spec.EdgeWorker) {
		// Update edge worker and log any errors
		if err := c.updateEdgeWorker(newService, bigModelHost); err != nil {
			klog.Errorf("Failed to update edge worker for service %s/%s: %v", newService.Namespace, newService.Name, err)
		}
	}

	return nil
}
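
The pattern in the suggested function can be sketched in isolation: compare the old and current specs with reflect.DeepEqual and only act on the workers that actually changed. This is a minimal, stdlib-only sketch; the WorkerSpec/ServiceSpec types are illustrative stand-ins, not Sedna's real sednav1.JointInferenceService types.

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins for the real CRD spec types.
type WorkerSpec struct {
	Image    string
	Replicas int
}

type ServiceSpec struct {
	CloudWorker WorkerSpec
	EdgeWorker  WorkerSpec
}

// changedWorkers reports which worker specs differ between the old and
// current revisions, mirroring the reflect.DeepEqual checks above.
func changedWorkers(old, cur ServiceSpec) (cloudChanged, edgeChanged bool) {
	return !reflect.DeepEqual(old.CloudWorker, cur.CloudWorker),
		!reflect.DeepEqual(old.EdgeWorker, cur.EdgeWorker)
}

func main() {
	oldSpec := ServiceSpec{
		CloudWorker: WorkerSpec{Image: "big-model:v1", Replicas: 1},
		EdgeWorker:  WorkerSpec{Image: "little-model:v1", Replicas: 1},
	}
	curSpec := oldSpec
	curSpec.CloudWorker.Image = "big-model:v2" // only the cloud worker changes

	cloud, edge := changedWorkers(oldSpec, curSpec)
	fmt.Println(cloud, edge) // prints: true false
}
```

Note that in the suggested function both updates are attempted independently: a failed cloud-worker update is only logged, so it never blocks the edge-worker update.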

Contributor Author

modified.

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 26befb9 to 55b7d88 Compare October 29, 2024 07:51
@kubeedge-bot kubeedge-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024
Comment on lines 7 to 19
sednav1 "github.com/kubeedge/sedna/pkg/apis/sedna/v1alpha1"
fakeseednaclientset "github.com/kubeedge/sedna/pkg/client/clientset/versioned/fake"
"github.com/kubeedge/sedna/pkg/globalmanager/config"
"github.com/kubeedge/sedna/pkg/globalmanager/runtime"
appsv1 "k8s.io/api/apps/v1"
v1 "k8s.io/api/core/v1"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/watch"
kubernetesfake "k8s.io/client-go/kubernetes/fake"
"k8s.io/client-go/kubernetes/scheme"
v1core "k8s.io/client-go/kubernetes/typed/core/v1"
corelisters "k8s.io/client-go/listers/apps/v1"
corelistersv1 "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/workqueue"
Contributor

Grouping of imports.
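
For reference, one common arrangement of the quoted import block — k8s.io packages in one group and the project's own packages in another, each sorted alphabetically (this grouping is a suggestion, not a Sedna-mandated style):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/watch"
	kubernetesfake "k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/kubernetes/scheme"
	v1core "k8s.io/client-go/kubernetes/typed/core/v1"
	corelisters "k8s.io/client-go/listers/apps/v1"
	corelistersv1 "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/record"
	"k8s.io/client-go/util/workqueue"

	sednav1 "github.com/kubeedge/sedna/pkg/apis/sedna/v1alpha1"
	fakeseednaclientset "github.com/kubeedge/sedna/pkg/client/clientset/versioned/fake"
	"github.com/kubeedge/sedna/pkg/globalmanager/config"
	"github.com/kubeedge/sedna/pkg/globalmanager/runtime"
)
```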

Contributor Author

modified.

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 55b7d88 to e02a997 Compare October 29, 2024 08:12
@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024
"BIG_MODEL_BIND_PORT": strconv.Itoa(int(bigModelPort)),
}
workerParam.WorkerType = workerType
workerParam.HostNetwork = false // The cloud does not need HostNetwork.
Contributor

Leave the hostNetwork configuration to the user; there is no need to make it mandatory here.

"LC_SERVER": c.cfg.LC.Server,
}
workerParam.WorkerType = workerType
workerParam.HostNetwork = true // Edge nodes need HostNetwork.
Contributor

ditto

Contributor

After these pull requests are merged, the current joint-inference examples need to add the hostNetwork configuration in the YAML, similar to the following:

CLOUD_NODE="cloud-node-name"
EDGE_NODE="edge-node-name"


kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: JointInferenceService
metadata:
  name: helmet-detection-inference-example
  namespace: default
spec:
  edgeWorker:
    model:
      name: "helmet-detection-inference-little-model"
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $EDGE_NODE
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        containers:
        - image: kubeedge/sedna-example-joint-inference-helmet-detection-little:v0.3.0
          imagePullPolicy: IfNotPresent
          name:  little-model
          env:  # user defined environments
          - name: input_shape
            value: "416,736"
          - name: "video_url"
            value: "rtsp://localhost/video"
          - name: "all_examples_inference_output"
            value: "/data/output"
          - name: "hard_example_cloud_inference_output"
            value: "/data/hard_example_cloud_inference_output"
          - name: "hard_example_edge_inference_output"
            value: "/data/hard_example_edge_inference_output"
          resources:  # user defined resources
            requests:
              memory: 64M
              cpu: 100m
            limits:
              memory: 2Gi
          volumeMounts:
            - name: outputdir
              mountPath: /data/
        volumes:   # user defined volumes
          - name: outputdir
            hostPath:
              # user must create the directory in host
              path: /joint_inference/output
              type: Directory

  cloudWorker:
    model:
      name: "helmet-detection-inference-big-model"
    template:
      spec:
        nodeName: $CLOUD_NODE
        dnsPolicy: ClusterFirstWithHostNet
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.3.0
            name:  big-model
            imagePullPolicy: IfNotPresent
            env:  # user defined environments
              - name: "input_shape"
                value: "544,544"
            resources:  # user defined resources
              requests:
                memory: 2Gi
EOF

@MooreZheng
Contributor

/lgtm

…e a Deployment resource instead of a Pod resource, and let the Deployment controller manage the number of Pods. Automatically update the Deployment when the joint-inference CRD is modified. Add a test file to ensure the correctness of the joint-inference controller.

Signed-off-by: SherlockShemol <[email protected]>
@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from e02a997 to bfa3a65 Compare October 30, 2024 01:42
@kubeedge-bot kubeedge-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@tangming1996
Contributor

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@jaypume
Member

jaypume commented Oct 30, 2024

/lgtm

@jaypume
Member

jaypume commented Oct 30, 2024

/approve

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypume

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2024
@kubeedge-bot kubeedge-bot merged commit cabab5f into kubeedge:main Oct 30, 2024
11 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants