
JointInferenceService controller enhancement #445

Merged

Conversation

SherlockShemol
Contributor

@SherlockShemol SherlockShemol commented Sep 29, 2024

What type of PR is this?

What this PR does / why we need it:

  • Create Deployment resources so that Pods are rebuilt automatically after manual deletion.
  • Enable the JointInferenceService controller to update Pods when the CRD is modified.
  • Add a test file to ensure functional stability and correctness.

Which issue(s) this PR fixes:

Fixes #
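
For context, managing workers through a Deployment (rather than a bare Pod) is what lets Kubernetes recreate a worker Pod after manual deletion. A hedged sketch of what such a generated Deployment could look like for the cloud worker — all names and labels here are illustrative, not the controller's actual output:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  # Illustrative name; the controller derives its own naming scheme.
  name: helmet-detection-inference-example-deployment-cloud
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      sedna.io/service: helmet-detection-inference-example
      sedna.io/worker: cloud
  template:
    metadata:
      labels:
        sedna.io/service: helmet-detection-inference-example
        sedna.io/worker: cloud
    spec:
      containers:
        - name: big-model
          image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.3.0
```

Because the Deployment controller reconciles toward `replicas: 1`, deleting the worker Pod by hand triggers an automatic rebuild, and editing the pod template in the CRD can roll the Deployment forward.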

@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 29, 2024
Contributor

@MooreZheng MooreZheng left a comment

There are CI issues that remain to be resolved, see https://github.com/kubeedge/sedna/actions/runs/11249926500/job/31277758880?pr=445

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 318f6fc to 3342955 Compare October 24, 2024 14:41
@kubeedge-bot kubeedge-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 24, 2024
@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 24, 2024
@tangming1996
Contributor

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2024
// Check the changes in specific fields and perform corresponding operations.
if !reflect.DeepEqual(oldService.Spec.CloudWorker, newService.Spec.CloudWorker) {
// If the cloud inference service changes, perform the corresponding update operation.
return c.updateCloudWorker(newService)
Contributor

We can't return directly here; otherwise, the subsequent code won't be executed.

Modify the updateInferenceServices function as follows.

func (c *Controller) updateInferenceServices(old, cur interface{}) error {
	oldService := old.(*sednav1.JointInferenceService)
	newService := cur.(*sednav1.JointInferenceService)

	// Check if the cloud worker configuration has changed
	if !reflect.DeepEqual(oldService.Spec.CloudWorker, newService.Spec.CloudWorker) {
		// Update cloud worker and log any errors
		if err := c.updateCloudWorker(newService); err != nil {
			klog.Errorf("Failed to update cloud worker for service %s/%s: %v", newService.Namespace, newService.Name, err)
		}
	}

	// Retrieve the address of the cloud inference service
	var bigModelHost string
	svc, err := c.kubeClient.CoreV1().Services(oldService.Namespace).Get(context.Background(),
		strings.ToLower(oldService.Name+"-"+jointInferenceForCloud), metav1.GetOptions{})
	if err != nil {
		if errors.IsNotFound(err) {
			// If the service does not exist, create a new one and retrieve its address
			klog.Info("Cloud service not found, creating new service...")
			bigModelHost, err = runtime.CreateEdgeMeshService(c.kubeClient, oldService, jointInferenceForCloud, BigModelPort)
			if err != nil {
				klog.Errorf("Failed to create EdgeMesh service for service %s/%s: %v", oldService.Namespace, oldService.Name, err)
			}
		} else {
			klog.Errorf("Failed to get cloud service %s/%s: %v", oldService.Namespace, oldService.Name, err)
		}
	} else {
		bigModelHost = fmt.Sprintf("%s.%s", svc.Name, svc.Namespace)
	}

	// Check if the edge worker configuration has changed
	if !reflect.DeepEqual(oldService.Spec.EdgeWorker, newService.Spec.EdgeWorker) {
		// Update edge worker and log any errors
		if err := c.updateEdgeWorker(newService, bigModelHost); err != nil {
			klog.Errorf("Failed to update edge worker for service %s/%s: %v", newService.Namespace, newService.Name, err)
		}
	}

	return nil
}
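
The pattern in the suggested function can be sketched in isolation: compare the old and current specs with reflect.DeepEqual and only act on the workers that actually changed. This is a minimal, stdlib-only sketch; the WorkerSpec/ServiceSpec types are illustrative stand-ins, not Sedna's real sednav1.JointInferenceService types.

```go
package main

import (
	"fmt"
	"reflect"
)

// Illustrative stand-ins for the real CRD spec types.
type WorkerSpec struct {
	Image    string
	Replicas int
}

type ServiceSpec struct {
	CloudWorker WorkerSpec
	EdgeWorker  WorkerSpec
}

// changedWorkers reports which worker specs differ between the old and
// current revisions, mirroring the reflect.DeepEqual checks above.
func changedWorkers(old, cur ServiceSpec) (cloudChanged, edgeChanged bool) {
	return !reflect.DeepEqual(old.CloudWorker, cur.CloudWorker),
		!reflect.DeepEqual(old.EdgeWorker, cur.EdgeWorker)
}

func main() {
	oldSpec := ServiceSpec{
		CloudWorker: WorkerSpec{Image: "big-model:v1", Replicas: 1},
		EdgeWorker:  WorkerSpec{Image: "little-model:v1", Replicas: 1},
	}
	curSpec := oldSpec
	curSpec.CloudWorker.Image = "big-model:v2" // only the cloud worker changes

	cloud, edge := changedWorkers(oldSpec, curSpec)
	fmt.Println(cloud, edge) // prints: true false
}
```

Note that in the suggested function both updates are attempted independently: a failed cloud-worker update is only logged, so it never blocks the edge-worker update.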

Contributor Author

modified.

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 26befb9 to 55b7d88 Compare October 29, 2024 07:51
@kubeedge-bot kubeedge-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024
Comment on lines 7 to 19
sednav1 "github.com/kubeedge/sedna/pkg/apis/sedna/v1alpha1"
fakeseednaclientset "github.com/kubeedge/sedna/pkg/client/clientset/versioned/fake"
"github.com/kubeedge/sedna/pkg/globalmanager/config"
"github.com/kubeedge/sedna/pkg/globalmanager/runtime"
appsv1 "k8s.io/api/apps/v1"
v1 "k8s.io/api/core/v1"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/watch"
kubernetesfake "k8s.io/client-go/kubernetes/fake"
"k8s.io/client-go/kubernetes/scheme"
v1core "k8s.io/client-go/kubernetes/typed/core/v1"
corelisters "k8s.io/client-go/listers/apps/v1"
corelistersv1 "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/workqueue"
Contributor

Grouping of imports.
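
For reference, one common arrangement of the quoted import block — k8s.io packages in one group and the project's own packages in another, each sorted alphabetically (this grouping is a suggestion, not a Sedna-mandated style):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/watch"
	kubernetesfake "k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/kubernetes/scheme"
	v1core "k8s.io/client-go/kubernetes/typed/core/v1"
	corelisters "k8s.io/client-go/listers/apps/v1"
	corelistersv1 "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/record"
	"k8s.io/client-go/util/workqueue"

	sednav1 "github.com/kubeedge/sedna/pkg/apis/sedna/v1alpha1"
	fakeseednaclientset "github.com/kubeedge/sedna/pkg/client/clientset/versioned/fake"
	"github.com/kubeedge/sedna/pkg/globalmanager/config"
	"github.com/kubeedge/sedna/pkg/globalmanager/runtime"
)
```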

Contributor Author

modified.

@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from 55b7d88 to e02a997 Compare October 29, 2024 08:12
@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024
"BIG_MODEL_BIND_PORT": strconv.Itoa(int(bigModelPort)),
}
workerParam.WorkerType = workerType
workerParam.HostNetwork = false // The cloud does not need HostNetwork.
Contributor

Leave the hostNetwork configuration to the user; there is no need to make it mandatory here.

"LC_SERVER": c.cfg.LC.Server,
}
workerParam.WorkerType = workerType
workerParam.HostNetwork = true // Edge nodes need HostNetwork.
Contributor

ditto

Contributor

After these pull requests are merged, the current joint-inference examples need to add the hostNetwork configuration in the YAML, similar to the following:

CLOUD_NODE="cloud-node-name"
EDGE_NODE="edge-node-name"


kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: JointInferenceService
metadata:
  name: helmet-detection-inference-example
  namespace: default
spec:
  edgeWorker:
    model:
      name: "helmet-detection-inference-little-model"
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $EDGE_NODE
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        containers:
        - image: kubeedge/sedna-example-joint-inference-helmet-detection-little:v0.3.0
          imagePullPolicy: IfNotPresent
          name:  little-model
          env:  # user defined environments
          - name: input_shape
            value: "416,736"
          - name: "video_url"
            value: "rtsp://localhost/video"
          - name: "all_examples_inference_output"
            value: "/data/output"
          - name: "hard_example_cloud_inference_output"
            value: "/data/hard_example_cloud_inference_output"
          - name: "hard_example_edge_inference_output"
            value: "/data/hard_example_edge_inference_output"
          resources:  # user defined resources
            requests:
              memory: 64M
              cpu: 100m
            limits:
              memory: 2Gi
          volumeMounts:
            - name: outputdir
              mountPath: /data/
        volumes:   # user defined volumes
          - name: outputdir
            hostPath:
              # user must create the directory in host
              path: /joint_inference/output
              type: Directory

  cloudWorker:
    model:
      name: "helmet-detection-inference-big-model"
    template:
      spec:
        nodeName: $CLOUD_NODE
        dnsPolicy: ClusterFirstWithHostNet
        containers:
          - image: kubeedge/sedna-example-joint-inference-helmet-detection-big:v0.3.0
            name:  big-model
            imagePullPolicy: IfNotPresent
            env:  # user defined environments
              - name: "input_shape"
                value: "544,544"
            resources:  # user defined resources
              requests:
                memory: 2Gi
EOF

@MooreZheng
Contributor

/lgtm

…e a Deployment resource instead of a Pod resource, and let the Deployment controller manage the number of Pods. Automatically update the Deployment when the joint-inference CRD is modified. Add a test file to ensure the correctness of the joint-inference controller.

Signed-off-by: SherlockShemol <[email protected]>
@SherlockShemol SherlockShemol force-pushed the ji-controller-enhancement branch from e02a997 to bfa3a65 Compare October 30, 2024 01:42
@kubeedge-bot kubeedge-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@tangming1996
Contributor

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2024
@jaypume
Member

jaypume commented Oct 30, 2024

/lgtm

@jaypume
Member

jaypume commented Oct 30, 2024

/approve

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypume

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2024
@kubeedge-bot kubeedge-bot merged commit cabab5f into kubeedge:main Oct 30, 2024
11 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants