-
Notifications
You must be signed in to change notification settings - Fork 682
[E2E] [RayCronJob] add e2e test for suspend behavior #4349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[E2E] [RayCronJob] add e2e test for suspend behavior #4349
Conversation
…guration Signed-off-by: AndySung320 <[email protected]>
Signed-off-by: AndySung320 <[email protected]>
seanlaii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
| //+kubebuilder:printcolumn:name="age",type="date",JSONPath=".metadata.creationTimestamp",priority=0 | ||
| //+kubebuilder:printcolumn:name="suspend",type=boolean,JSONPath=".spec.suspend",priority=0 | ||
|
|
||
| // +genclient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Maybe we can also add the following annotations as well:
// +kubebuilder:storageversion
// +kubebuilder:resource:categories=all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- // +kubebuilder:storageversion: marker to indicate the GVK that should be used to store data by the API server.
- // +kubebuilder:resource:categories=all: Puts the CRD into the all category, so it shows up in
kubectl get all.
Ref:
https://book.kubebuilder.io/reference/markers/crd
https://book.kubebuilder.io/reference/generating-crd#multiple-versions
Cool!
| g.Expect(err).NotTo(HaveOccurred()) | ||
| ownerUID := rcj.UID | ||
|
|
||
| // No RayJob should be created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| // No RayJob should be created | |
| // No RayJob should be created | |
| LogWithTimestamp(test.T(), "Waiting to ensure no RayJobs are created while suspended") |
| return n | ||
| }, 130*time.Second, 5*time.Second).Should(Equal(0)) | ||
|
|
||
| // Resume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| // Resume | |
| LogWithTimestamp(test.T(), "Resuming RayCronJob %s/%s", rayCronJob.Namespace, rayCronJob.Name) |
| if err != nil { | ||
| return -1 | ||
| } | ||
| return n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To preserve error:
| if err != nil { | |
| return -1 | |
| } | |
| return n | |
| g.Expect(err).NotTo(HaveOccurred()) | |
| return n |
|
|
||
| // Spec.suspend should be false | ||
| g.Eventually(func() bool { | ||
| rcj, err := test.Client().Ray().RayV1().RayCronJobs(namespace.Name).Get(test.Ctx(), rayCronJob.Name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can replace this with a util function similar to GetRayJob.
|
cursor review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
Future-Outlier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @machichima @CheyuWu @justinyeh1995 @fscnick to take a look, thank you!
| func rayCronJobACTemplate(name, namespace, schedule string) *rayv1ac.RayCronJobApplyConfiguration { | ||
| return rayv1ac.RayCronJob(name, namespace). | ||
| WithSpec( | ||
| rayv1ac.RayCronJobSpec(). | ||
| WithSchedule(schedule). | ||
| WithJobTemplate( | ||
| rayv1ac.RayJobSpec(). | ||
| WithEntrypoint("sleep 1"). | ||
| WithRayClusterSpec( | ||
| rayv1ac.RayClusterSpec(). | ||
| WithHeadGroupSpec( | ||
| rayv1ac.HeadGroupSpec(). | ||
| WithTemplate( | ||
| corev1ac.PodTemplateSpec(). | ||
| WithSpec( | ||
| corev1ac.PodSpec(). | ||
| WithContainers( | ||
| corev1ac.Container(). | ||
| WithName("ray-head"). | ||
| WithImage(GetRayImage()). | ||
| WithResources( | ||
| corev1ac.ResourceRequirements(). | ||
| WithRequests(corev1.ResourceList{ | ||
| corev1.ResourceCPU: resource.MustParse("500m"), | ||
| corev1.ResourceMemory: resource.MustParse("500Mi"), | ||
| }), | ||
| ), | ||
| ), | ||
| ), | ||
| ), | ||
| ), | ||
| ), | ||
| ), | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for readability, would it make sense to construct it bottom-up using intermediate variables?
For example
func rayCronJobACTemplate(name, namespace, schedule string) *rayv1ac.RayCronJobApplyConfiguration {
headContainer := corev1ac.Container().
WithName("ray-head").
WithImage(GetRayImage()).
WithResources(
corev1ac.ResourceRequirements().
WithRequests(corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("500m"),
corev1.ResourceMemory: resource.MustParse("500Mi"),
}),
)
podTmpl := corev1ac.PodTemplateSpec().
WithSpec(corev1ac.PodSpec().WithContainers(headContainer))
cluster := ...
job := ...
spec := ...
return rayv1ac.RayCronJob(name, namespace).WithSpec(spec)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simply use:
func rayCronJobACTemplate(name, namespace, schedule string) *rayv1ac.RayCronJobApplyConfiguration {
return rayv1ac.RayCronJob(name, namespace).
WithSpec(
rayv1ac.RayCronJobSpec().
WithSchedule(schedule).
WithJobTemplate(
rayv1ac.RayJobSpec().
WithEntrypoint("sleep 1").
WithRayClusterSpec(NewRayClusterSpec()),
),
)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion, thanks!
| return -1 | ||
| } | ||
| return n | ||
| }, 130*time.Second, 5*time.Second).Should(Equal(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| }, 130*time.Second, 5*time.Second).Should(Equal(0)) | |
| }, 130*time.Second, 5*time.Second).Should(BeZero()) |
| err = test.Client().Ray().RayV1().RayCronJobs(namespace.Name).Delete(test.Ctx(), rayCronJob.Name, metav1.DeleteOptions{}) | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| LogWithTimestamp(test.T(), "Deleted RayCronJob %s/%s successfully", rayCronJob.Namespace, rayCronJob.Name) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert that the related resource has been deleted correctly?
| // Spec.suspend should be true | ||
| g.Eventually(func() bool { | ||
| job, err := test.Client(). | ||
| Ray(). | ||
| RayV1(). | ||
| RayCronJobs(namespace.Name). | ||
| Get(test.Ctx(), rayCronJob.Name, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return false | ||
| } | ||
| return job.Spec.Suspend | ||
| }, TestTimeoutShort).Should(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Spec.suspend should be true | |
| g.Eventually(func() bool { | |
| job, err := test.Client(). | |
| Ray(). | |
| RayV1(). | |
| RayCronJobs(namespace.Name). | |
| Get(test.Ctx(), rayCronJob.Name, metav1.GetOptions{}) | |
| if err != nil { | |
| return false | |
| } | |
| return job.Spec.Suspend | |
| }, TestTimeoutShort).Should(BeTrue()) |
I think this check is redundant. Since the Server-Side Apply call already succeeded, the field is guaranteed to be set in the API server. The subsequent Consistently block provides sufficient validation by ensuring no jobs are actually created.
| return job.Spec.Suspend | ||
| }, TestTimeoutShort).Should(BeTrue()) | ||
|
|
||
| rcj, err := test.Client().Ray().RayV1().RayCronJobs(namespace.Name).Get(test.Ctx(), rayCronJob.Name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename rcj to rayCronJob for clarity?
Why are these changes needed?
This PR adds an e2e test covering RayCronJob suspend/resume behavior:
spec.suspend=true, the RayCronJob controller should not create any new RayJobs even after the scheduled time passes.spec.suspend=false, the controller should resume scheduling and start creating RayJobs again.To run this e2e test in CI, we also enable the RayCronJob feature gate in the test/CI-only operator overrides (Helm values override and kustomize test override).
Note: RayCronJob previously did not have
// +genclient, so the typed client/applyconfiguration for RayCronJob was not generated. The PR adds+genclientand commits the resulting regenerated clientset/applyconfiguration/informer/lister code so the test can compile and run consistently.Related issue number
Closes #4323
Checks