fix wait for webhook to be ready before running webhook tests#816
fix wait for webhook to be ready before running webhook tests#816volcano-sh-bot merged 4 commits intovolcano-sh:mainfrom
Conversation
Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a potential race condition in the E2E test suite by ensuring that the webhook server is fully initialized and ready to receive requests before any webhook-dependent tests are executed. This enhancement improves the reliability and stability of the E2E tests, reducing intermittent failures caused by timing issues. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a waiting mechanism for the webhook to become ready before executing e2e tests, which is a great way to improve test stability. I've provided a couple of suggestions to enhance the implementation of the new wait function for better clarity and maintainability.
| err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) { | ||
| ep, err := kubeClient.CoreV1().Endpoints(kthenaNamespace).Get(ctx, "kthena-controller-manager-webhook", metav1.GetOptions{}) | ||
|
|
||
| // try again | ||
| if err != nil { | ||
| t.Logf("webhook endpoint not ready yet: %v", err) | ||
| return false, nil | ||
| } | ||
| for _, subset := range ep.Subsets { | ||
| if len(subset.Addresses) > 0 { | ||
| return true, nil // true = we are done! webhook is ready | ||
| } | ||
| } | ||
| t.Log("webhook has no ready addresses yet") | ||
| return false, nil | ||
| }) |
There was a problem hiding this comment.
Since the context ctx already has a timeout, you can use wait.PollUntilContextCancel to avoid specifying a redundant timeout. This makes the intent clearer and is consistent with other parts of this file (e.g., line 96).
| err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) { | |
| ep, err := kubeClient.CoreV1().Endpoints(kthenaNamespace).Get(ctx, "kthena-controller-manager-webhook", metav1.GetOptions{}) | |
| // try again | |
| if err != nil { | |
| t.Logf("webhook endpoint not ready yet: %v", err) | |
| return false, nil | |
| } | |
| for _, subset := range ep.Subsets { | |
| if len(subset.Addresses) > 0 { | |
| return true, nil // true = we are done! webhook is ready | |
| } | |
| } | |
| t.Log("webhook has no ready addresses yet") | |
| return false, nil | |
| }) | |
| err := wait.PollUntilContextCancel(ctx, 2*time.Second, true, func(ctx context.Context) (bool, error) { | |
| ep, err := kubeClient.CoreV1().Endpoints(kthenaNamespace).Get(ctx, "kthena-controller-manager-webhook", metav1.GetOptions{}) | |
| // try again | |
| if err != nil { | |
| t.Logf("webhook endpoint not ready yet: %v", err) | |
| return false, nil | |
| } | |
| for _, subset := range ep.Subsets { | |
| if len(subset.Addresses) > 0 { | |
| return true, nil // true = we are done! webhook is ready | |
| } | |
| } | |
| t.Log("webhook has no ready addresses yet") | |
| return false, nil | |
| }) |
| // you are missing this line! | ||
| require.NoError(t, err, "webhook did not become ready in time increase the waiting period") |
There was a problem hiding this comment.
This comment appears to be a leftover from development and can be removed. Additionally, the error message can be made more direct to clearly state the reason for the test failure.
| // you are missing this line! | |
| require.NoError(t, err, "webhook did not become ready in time increase the waiting period") | |
| require.NoError(t, err, "timed out waiting for webhook to become ready") |
There was a problem hiding this comment.
Pull request overview
This PR improves the reliability of the controller-manager webhook e2e tests by waiting for the webhook endpoint to become ready before executing webhook validation/mutation test cases.
Changes:
- Update
TestWebhookto wait for the webhook to be ready before running test cases. - Add a
WaitForWebhookReadyhelper that polls the webhook Endpoints object until it has ready addresses.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
test/e2e/controller-manager/webhook_test.go |
Calls the new readiness wait helper before running webhook test cases. |
test/e2e/controller-manager/test_suite_test.go |
Adds WaitForWebhookReady helper that polls the webhook endpoint for readiness. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute) | ||
| defer cancel() |
Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>
|
/cc @YaoZengzeng @LiZhenCheng9527 @hzxuzhonghu I would appreciate your review sir, when you have a moment. |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
The current readiness check does not actually validate the condition that caused the flake.
waitForWebhookReady polls the Endpoints object for ready addresses, but that is effectively the same readiness signal that already let the test continue before. The issue description says the controller-manager pod can be Ready while the webhook server inside it is still not accepting connections, so this new check can still pass before the webhook listener is reachable.
Please switch the probe to an actual webhook request path, for example a dry-run create against one of the webhook-protected resources, and retry only on connect: connection refused until it succeeds.
|
FYI: #806 (comment) |
thanx for you response, i will going to update this pr till tonight |
Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>
There was a problem hiding this comment.
Pull request overview
Stabilizes controller-manager e2e webhook tests by adding a readiness wait before executing webhook assertions, addressing intermittent “connection refused” failures (Fixes #815).
Changes:
- Add a pre-test webhook readiness probe to reduce flakiness in
webhook_test.go. - Implement
waitForWebhookReadyhelper that polls the API server until webhook requests succeed. - Add
stringsimport to support connection error inspection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/e2e/controller-manager/webhook_test.go | Adds a readiness wait before running webhook test cases. |
| test/e2e/controller-manager/test_suite_test.go | Implements polling helper to detect when the webhook server is accepting requests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func waitForWebhookReady(t *testing.T, kthenaClient *clientset.Clientset, namespace string) { | ||
| t.Helper() | ||
| t.Log("Waiting for webhook server to accept requests") | ||
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute) | ||
| defer cancel() |
| _, err := kthenaClient.WorkloadV1alpha1().ModelBoosters(namespace).Create(ctx, probe, metav1.CreateOptions{DryRun: []string{"All"}}) | ||
| if err != nil { | ||
| errStr := err.Error() | ||
| if strings.Contains(errStr, "connect: connection refused") { | ||
| t.Logf("Webhook not ready yet (connection refused), retrying: %v", err) | ||
| return false, nil | ||
| } | ||
| return false, err | ||
| } |
| probe.Namespace = namespace | ||
| probe.Name = "webhook-ready-probe-" + utils.RandomString(5) | ||
|
|
||
| _, err := kthenaClient.WorkloadV1alpha1().ModelBoosters(namespace).Create(ctx, probe, metav1.CreateOptions{DryRun: []string{"All"}}) |
Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>
|
/cc @hzxuzhonghu ptal sir, i think its ready now! |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
/lgtm
Overall, think this can fix the flake. I am thinking about a more grace way, check the kthena ready before running any test, maybe can do that in func TestMain(m *testing.M) { after insatlled kthena.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
This PR fixes webhook tests that were failing with "connection refused" error in webhook_test.go e2e tests.

What this PR does / why we need it:
we need this pr to make our e2e test more stable
Which issue(s) this PR fixes:
Fixes #815
Special notes for your reviewer:
Does this PR introduce a user-facing change?: