Skip to content

Conversation

@mayuka-c
Copy link
Contributor

@mayuka-c mayuka-c commented Oct 17, 2025

Fixes: #5137

By looking at the logs, looks like during the curl metrics pod creation, calls to the webhook is failing due to connection refused. So added precheck to ensure the webhook is ready before beginning with the pod creation.

Validation

Ran the e2e tests multiple times here and did not see the flakes: https://github.com/mayuka-c/kubebuilder/actions/runs/18582517449

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 17, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @mayuka-c. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mayuka-c
Once this PR has been reviewed and has the lgtm label, please assign camilamacedo86 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mayuka-c mayuka-c changed the title 🌱 Fixing flakes in e2e tests for metrics 🌱 Fix flakes in e2e tests for metrics Oct 17, 2025
@mayuka-c mayuka-c force-pushed the issues-5137 branch 2 times, most recently from f130aac to 1c32309 Compare October 21, 2025 01:59
Copy link
Member

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mayuka-c 👋

That looks great! However, this change only makes sense for projects that actually have webhooks.
Not all projects do — we use markers to inject code (see Kubebuilder markers
).
For example:

+kubebuilder:scaffold:e2e-webhooks-checks

Then, if we really need that we will need to see if we can use the same; it seems not. Then we need a new marker. So, we need to be 1000% sure that is the best way. Ideally we should avoid create new markers and etc since we need keep the support for those, maker harder for end user customisartions, etc.

So, if we need to check whether the webhook is serving before the metrics, we should only add this logic when webhooks are present — meaning we’ll need to use a marker for it.

Also, please make sure the scaffolded code remains as generic as possible, so it still works even if users customise their configurations.

Lastly, since this change affects the scaffolded e2e tests in projects (which impact end users), we need to be careful with the commit type.
We should use the 🐛 emoji instead of 🌱 — the seed emoji is only used for changes that don’t affect users and can be ignored in the release notes.

@mayuka-c
Copy link
Contributor Author

Hi @camilamacedo86
Regarding below:-

That looks great! However, this change only makes sense for projects that actually have webhooks. Not all projects do — we use markers to inject code (see Kubebuilder markers ). For example:

+kubebuilder:scaffold:e2e-webhooks-checks

Then, if we really need that we will need to see if we can use the same; it seems not. Then we need a new marker. So, we need to be 1000% sure that is the best way. Ideally we should avoid create new markers and etc since we need keep the support for those, maker harder for end user customisartions, etc.

So, if we need to check whether the webhook is serving before the metrics, we should only add this logic when webhooks are present — meaning we’ll need to use a marker for it.

Also, please make sure the scaffolded code remains as generic as possible, so it still works even if users customise their configurations.

If I'm not wrong, I could add a new fragment to do the webhook readiness check here and replace with the marker here

But the only problem that I see is, it will end up running other fragment checks too which is not required.

Shall I go ahead to define a new marker? or if you have any better solution, please do suggest :)

@mayuka-c
Copy link
Contributor Author

Lastly, since this change affects the scaffolded e2e tests in projects (which impact end users), we need to be careful with the commit type. We should use the 🐛 emoji instead of 🌱 — the seed emoji is only used for changes that don’t affect users and can be ignored in the release notes.

Sorry my bad, I will update the title with proper emoji. Thank you :)

@mayuka-c mayuka-c changed the title 🌱 Fix flakes in e2e tests for metrics 🐛 Fix flakes in e2e tests for metrics Oct 21, 2025
@camilamacedo86
Copy link
Member

Hi @mayuka-c

Let's create a new marker
Also, if you can review the code and ensure that is generic and good enough for any project that would be great 🎉

@mayuka-c
Copy link
Contributor Author

Hi @mayuka-c

Let's create a new marker Also, if you can review the code and ensure that is generic and good enough for any project that would be great 🎉

Sure thank you @camilamacedo86 . I will review the code and make sure it is generic.

@mayuka-c mayuka-c force-pushed the issues-5137 branch 2 times, most recently from b703a44 to d058ca9 Compare October 30, 2025 12:07
@mayuka-c
Copy link
Contributor Author

Hi @camilamacedo86 👋
Sorry for getting back late, as I was bust with other things.

I have added a new marker e2e-webhook-readiness and it seems to working fine.
I see the readiness check test cases are used in projects where the webhooks are present in Projectfile and for the basic test-data project, the webhook tests are not added.

Please let me know if this is fine or if I'm missing

Fix flakes - 2

Run make generate

RUn make generate

Minor fix

Minor fix

Fix-2

Fix-3

Implement a new marker for webhook readiness

Lint fix

scaffold md fix
g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443")
}
Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed())
// +kubebuilder:scaffold:e2e-webhooks-readiness
Copy link
Member

@camilamacedo86 camilamacedo86 Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may can simplify with

By("waiting for webhook service to be ready if webhooks are configured")
verifyWebhookServiceReady := func(g Gomega) {
	const webhookServiceName = "{{ProjectName}}-webhook-service"

	cmd := exec.Command("kubectl", "get", "service", webhookServiceName, "-n", namespace)
	_, err := utils.Run(cmd)
	if err != nil {
		// Project has no webhook service; nothing to wait on.
		return
	}

	cmd = exec.Command("kubectl", "get", "pods", "-l", "control-plane=controller-manager",
		"-n", namespace, "-o", "jsonpath={.items[0].status.conditions[?(@.type=='Ready')].status}")
	output, err := utils.Run(cmd)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(output).To(Equal("True"),
		"Controller manager pod not ready (webhook server may not be accepting connections)")

	cmd = exec.Command("kubectl", "get", "endpoints", webhookServiceName,
		"-n", namespace, "-o", "jsonpath={.subsets[*].addresses[*].ip}")
	output, err = utils.Run(cmd)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(output).NotTo(BeEmpty(), "Webhook service endpoints are not ready")

	cmd = exec.Command("kubectl", "get", "--raw",
		fmt.Sprintf("/api/v1/namespaces/%s/services/https:%[2]s:https/proxy/readyz", namespace, 		webhookServiceName))
	_, err = utils.Run(cmd)
	g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443")
}
Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT>

@camilamacedo86 camilamacedo86 self-requested a review October 30, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flakes in e2e tests for metrics

3 participants