fix wait for webhook to be ready before running webhook tests by katara-Jayprakash · Pull Request #816 · volcano-sh/kthena

katara-Jayprakash · 2026-03-15T19:59:59Z

What type of PR is this?

This PR fixes the webhook e2e tests in webhook_test.go, which were failing with a "connection refused" error.

What this PR does / why we need it:

We need this PR to make our e2e tests more stable.

Which issue(s) this PR fixes:
Fixes #815

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

 NONE

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

gemini-code-assist · 2026-03-15T20:00:12Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a potential race condition in the E2E test suite by ensuring that the webhook server is fully initialized and ready to receive requests before any webhook-dependent tests are executed. This enhancement improves the reliability and stability of the E2E tests, reducing intermittent failures caused by timing issues.

Highlights

Webhook Test Reliability: Introduced a new utility function, WaitForWebhookReady, to ensure that the webhook server is fully operational before executing related E2E tests.
E2E Test Setup Enhancement: Integrated the WaitForWebhookReady function into the TestWebhook E2E test suite, preventing test failures due to race conditions where tests might run before the webhook is ready.


Changelog

test/e2e/controller-manager/test_suite_test.go
- Added a new function WaitForWebhookReady that polls the Kubernetes API for the webhook endpoint's readiness, specifically checking for available addresses.
test/e2e/controller-manager/webhook_test.go
- Modified the TestWebhook function to call WaitForWebhookReady after setting up the E2E test environment, ensuring the webhook is ready before proceeding with test cases.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a waiting mechanism for the webhook to become ready before executing e2e tests, which is a great way to improve test stability. I've provided a couple of suggestions to enhance the implementation of the new wait function for better clarity and maintainability.

gemini-code-assist · 2026-03-15T20:01:30Z

test/e2e/controller-manager/test_suite_test.go

+	err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) {
+		ep, err := kubeClient.CoreV1().Endpoints(kthenaNamespace).Get(ctx, "kthena-controller-manager-webhook", metav1.GetOptions{})
+
+		// try again
+		if err != nil {
+			t.Logf("webhook endpoint not ready yet: %v", err)
+			return false, nil
+		}
+		for _, subset := range ep.Subsets {
+			if len(subset.Addresses) > 0 {
+				return true, nil // true = we are done! webhook is ready
+			}
+		}
+		t.Log("webhook has no ready addresses yet")
+		return false, nil
+	})


Since the context ctx already has a timeout, you can use wait.PollUntilContextCancel to avoid specifying a redundant timeout. This makes the intent clearer and is consistent with other parts of this file (e.g., line 96).

Suggested change

-	err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) {
+	err := wait.PollUntilContextCancel(ctx, 2*time.Second, true, func(ctx context.Context) (bool, error) {
 		ep, err := kubeClient.CoreV1().Endpoints(kthenaNamespace).Get(ctx, "kthena-controller-manager-webhook", metav1.GetOptions{})
 
 		// try again
 		if err != nil {
 			t.Logf("webhook endpoint not ready yet: %v", err)
 			return false, nil
 		}
 		for _, subset := range ep.Subsets {
 			if len(subset.Addresses) > 0 {
 				return true, nil // true = we are done! webhook is ready
 			}
 		}
 		t.Log("webhook has no ready addresses yet")
 		return false, nil
 	})

gemini-code-assist · 2026-03-15T20:01:30Z

test/e2e/controller-manager/test_suite_test.go

+	// you are missing this line!
+	require.NoError(t, err, "webhook did not become ready in time increase the waiting period")


This comment appears to be a leftover from development and can be removed. Additionally, the error message can be made more direct to clearly state the reason for the test failure.

Suggested change

-	// you are missing this line!
-	require.NoError(t, err, "webhook did not become ready in time increase the waiting period")
+	require.NoError(t, err, "timed out waiting for webhook to become ready")

Copilot

Pull request overview

This PR improves the reliability of the controller-manager webhook e2e tests by waiting for the webhook endpoint to become ready before executing webhook validation/mutation test cases.

Changes:

Update TestWebhook to wait for the webhook to be ready before running test cases.
Add a WaitForWebhookReady helper that polls the webhook Endpoints object until it has ready addresses.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`test/e2e/controller-manager/webhook_test.go`	Calls the new readiness wait helper before running webhook test cases.
`test/e2e/controller-manager/test_suite_test.go`	Adds `WaitForWebhookReady` helper that polls the webhook endpoint for readiness.


test/e2e/controller-manager/test_suite_test.go

+	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
+	defer cancel()


Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

katara-Jayprakash · 2026-03-15T20:29:29Z

/cc @YaoZengzeng @LiZhenCheng9527 @hzxuzhonghu I would appreciate your review sir, when you have a moment.

LiZhenCheng9527

/lgtm

hzxuzhonghu

The current readiness check does not actually validate the condition that caused the flake.

waitForWebhookReady polls the Endpoints object for ready addresses, but that is effectively the same readiness signal that already let the test continue before. The issue description says the controller-manager pod can be Ready while the webhook server inside it is still not accepting connections, so this new check can still pass before the webhook listener is reachable.

Please switch the probe to an actual webhook request path, for example a dry-run create against one of the webhook-protected resources, and retry only on connect: connection refused until it succeeds.

FAUST-BENCHOU · 2026-03-16T13:45:45Z

FYI: #806 (comment)
seems same problem

katara-Jayprakash · 2026-03-16T14:04:39Z

> The current readiness check does not actually validate the condition that caused the flake.
>
> waitForWebhookReady polls the Endpoints object for ready addresses, but that is effectively the same readiness signal that already let the test continue before. The issue description says the controller-manager pod can be Ready while the webhook server inside it is still not accepting connections, so this new check can still pass before the webhook listener is reachable.
>
> Please switch the probe to an actual webhook request path, for example a dry-run create against one of the webhook-protected resources, and retry only on connect: connection refused until it succeeds.

Thanks for your response. I will update this PR by tonight.

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

Copilot

Pull request overview

Stabilizes controller-manager e2e webhook tests by adding a readiness wait before executing webhook assertions, addressing intermittent “connection refused” failures (Fixes #815).

Changes:

Add a pre-test webhook readiness probe to reduce flakiness in webhook_test.go.
Implement waitForWebhookReady helper that polls the API server until webhook requests succeed.
Add strings import to support connection error inspection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
test/e2e/controller-manager/webhook_test.go	Adds a readiness wait before running webhook test cases.
test/e2e/controller-manager/test_suite_test.go	Implements polling helper to detect when the webhook server is accepting requests.


test/e2e/controller-manager/test_suite_test.go

+func waitForWebhookReady(t *testing.T, kthenaClient *clientset.Clientset, namespace string) {
+	t.Helper()
+	t.Log("Waiting for webhook server to accept requests")
+
+	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
+	defer cancel()


test/e2e/controller-manager/test_suite_test.go

+		_, err := kthenaClient.WorkloadV1alpha1().ModelBoosters(namespace).Create(ctx, probe, metav1.CreateOptions{DryRun: []string{"All"}})
+		if err != nil {
+			errStr := err.Error()
+			if strings.Contains(errStr, "connect: connection refused") {
+				t.Logf("Webhook not ready yet (connection refused), retrying: %v", err)
+				return false, nil
+			}
+			return false, err
+		}


test/e2e/controller-manager/test_suite_test.go

+		probe.Namespace = namespace
+		probe.Name = "webhook-ready-probe-" + utils.RandomString(5)
+
+		_, err := kthenaClient.WorkloadV1alpha1().ModelBoosters(namespace).Create(ctx, probe, metav1.CreateOptions{DryRun: []string{"All"}})


Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

katara-Jayprakash · 2026-03-16T15:33:46Z

/cc @hzxuzhonghu PTAL, I think it's ready now!

hzxuzhonghu

/lgtm

Overall, I think this can fix the flake. I am thinking about a more graceful approach: check that kthena is ready before running any test. Maybe we can do that in func TestMain(m *testing.M) after kthena is installed.

volcano-sh-bot · 2026-03-17T09:04:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~test/OWNERS~~ [hzxuzhonghu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fix wait for webhook to be ready before running webhook tests

77cd3ba

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

Copilot AI review requested due to automatic review settings March 15, 2026 20:00

volcano-sh-bot requested review from YaoZengzeng and hzxuzhonghu March 15, 2026 20:00

volcano-sh-bot added the size/M label Mar 15, 2026

Copilot started reviewing on behalf of katara-Jayprakash March 15, 2026 20:00

gemini-code-assist bot reviewed Mar 15, 2026


Copilot AI reviewed Mar 15, 2026


test/e2e/controller-manager/test_suite_test.go Outdated

Comment on lines +130 to +131

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)

defer cancel()

Refactor my code as gemini suggested

54e7bb4

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

LiZhenCheng9527 reviewed Mar 16, 2026


volcano-sh-bot assigned LiZhenCheng9527 Mar 16, 2026

volcano-sh-bot added the lgtm label Mar 16, 2026

hzxuzhonghu requested changes Mar 16, 2026


change webhook readiness from endpoint-based check to probe based

901f76e

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

Copilot AI review requested due to automatic review settings March 16, 2026 14:44

volcano-sh-bot removed the lgtm label Mar 16, 2026

Copilot AI reviewed Mar 16, 2026


Copilot started reviewing on behalf of katara-Jayprakash March 16, 2026 15:03

Fixing parent ctx parameter input

bc7f4a7

Signed-off-by: katara-Jayprakash <katarajayprakash@icloud.com>

hzxuzhonghu approved these changes Mar 17, 2026


volcano-sh-bot assigned hzxuzhonghu Mar 17, 2026

volcano-sh-bot added the lgtm label Mar 17, 2026

volcano-sh-bot added the approved label Mar 17, 2026

volcano-sh-bot merged commit 64b0668 into volcano-sh:main Mar 17, 2026
14 checks passed

