
pohly
Contributor

@pohly pohly commented Sep 16, 2025

We've had kind-alpha-beta-features and kind-beta-features jobs for a while. The original purpose was to run stable tests in a cluster with features enabled to detect when enabling those breaks stable functionality. Later the jobs were extended to also run all tests which should work in such a cluster. The kind-alpha-features job got removed recently because it's not necessarily a valid cluster configuration.

What this adds for both cluster configs is:

  • "-enabled": running only tests for on-by-default features, i.e. excluding tests for off-by-default features. This matches the original purpose of the jobs.
  • "-enabled-conformance": restricts the test selection even further to only conformance tests.
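The two test selections in the list above can be sketched as filters over test labels. This is a minimal Python model, not the real job configuration. The E2ETest type, the "Feature:<GateName>" labelling convention, the gate names and the Slow/Disruptive/Flaky exclusions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical model of an e2e test: a name plus labels such as "Conformance",
# "Slow" or "Feature:<GateName>". This is not the real e2e framework.
@dataclass(frozen=True)
class E2ETest:
    name: str
    labels: frozenset = field(default_factory=frozenset)

# Assumed exclusions, similar to what these jobs put into LABEL_FILTER.
EXCLUDED = frozenset({"Slow", "Disruptive", "Flaky"})


def feature_gates(test):
    """Gate names referenced through hypothetical "Feature:<GateName>" labels."""
    return {label.removeprefix("Feature:") for label in test.labels if label.startswith("Feature:")}


def select_enabled(tests, off_by_default):
    """'-enabled': keep tests that need no off-by-default feature."""
    return [
        t.name
        for t in tests
        if not (feature_gates(t) & off_by_default) and not (t.labels & EXCLUDED)
    ]


def select_enabled_conformance(tests, off_by_default):
    """'-enabled-conformance': the same selection, restricted to conformance tests."""
    return select_enabled([t for t in tests if "Conformance" in t.labels], off_by_default)


# Made-up gates: MyAlphaGate is off by default, MyBetaGate is on by default.
tests = [
    E2ETest("pods-basic", frozenset({"Conformance"})),
    E2ETest("stable-slow", frozenset({"Slow"})),
    E2ETest("alpha-gate", frozenset({"Feature:MyAlphaGate"})),
    E2ETest("beta-default-on", frozenset({"Feature:MyBetaGate"})),
]
off = {"MyAlphaGate"}
print(select_enabled(tests, off))              # ['pods-basic', 'beta-default-on']
print(select_enabled_conformance(tests, off))  # ['pods-basic']
```

In this model, the conformance variant is strictly a subset of "-enabled", which matches the wording "restricts the test selection even further".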

Both can eventually get promoted to release informing or even blocking.

Only presubmits get added for now. If testing the jobs in a trial PR works, the corresponding periodics can be added.

For reference:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 16, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs area/testgrid sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 16, 2025
@BenTheElder
Member

/hold

> AllAlpha=true without a corresponding AllBeta=true is not guaranteed to work because an alpha feature might have a dependency on an off-by-default beta feature.

For testing alpha features, sure. But we can cover that by exposing that the test has such dependencies and skipping them.

For starting a cluster, it should absolutely be safe to set AllAlpha, not use those features, and have GA / default features work fine.
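The concern in the quoted sentence about AllAlpha=true without AllBeta=true can be illustrated with a simplified model. The gate names, the dependency table and the per-stage bulk-switch behaviour are hypothetical, and this is not how Kubernetes implements or validates feature gates.

```python
# Simplified, hypothetical feature gates: name -> (stage, on by default, dependencies).
GATES = {
    "MyAlphaGate": ("Alpha", False, ["MyBetaGate"]),
    "MyBetaGate": ("Beta", False, []),
    "MyDefaultBeta": ("Beta", True, []),
}


def effective_gates(all_alpha, all_beta, gates=GATES):
    """Gates that end up on when AllAlpha/AllBeta act as bulk switches per stage."""
    enabled = set()
    for name, (stage, on_by_default, _deps) in gates.items():
        if on_by_default or (stage == "Alpha" and all_alpha) or (stage == "Beta" and all_beta):
            enabled.add(name)
    return enabled


def unmet_dependencies(enabled, gates=GATES):
    """(gate, missing dependency) pairs; non-empty means an inconsistent config."""
    return sorted(
        (name, dep) for name in enabled for dep in gates[name][2] if dep not in enabled
    )


print(unmet_dependencies(effective_gates(all_alpha=True, all_beta=False)))
# [('MyAlphaGate', 'MyBetaGate')]
print(unmet_dependencies(effective_gates(all_alpha=True, all_beta=True)))
# []
```

The model only flags the inconsistency. Whether such a cluster still works when MyAlphaGate is never used is exactly the point being debated here.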

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2025
- name: RUNTIME_CONFIG
  value: '{"api/alpha":"true", "api/ga":"true"}'
- name: LABEL_FILTER
  value: "Feature: isSubsetOf OffByDefault && !BetaOffByDefault && !Deprecated && !Slow && !Disruptive && !Flaky"
Member

we skip BetaOffByDefault here so that we do not need to remove this job
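The effect of the LABEL_FILTER in the snippet above can be approximated by hand. This sketch is not Ginkgo's parser. It assumes Ginkgo-style label sets, where a "Feature:X" label adds the value X to the "Feature" set, and it assumes ordinary set semantics, where an empty set is a subset of any set. The example label combinations are made up.

```python
# Hand-translated, hypothetical model of:
#   Feature: isSubsetOf OffByDefault && !BetaOffByDefault && !Deprecated
#     && !Slow && !Disruptive && !Flaky
REJECTED = {"BetaOffByDefault", "Deprecated", "Slow", "Disruptive", "Flaky"}


def feature_values(labels):
    """Values of the "Feature" label set, e.g. {"OffByDefault"}."""
    return {label.split(":", 1)[1] for label in labels if label.startswith("Feature:")}


def matches_filter(labels):
    labels = set(labels)
    # isSubsetOf: every Feature value must be OffByDefault; an empty set passes.
    if not feature_values(labels) <= {"OffByDefault"}:
        return False
    # The remaining terms are plain negated labels.
    return not (labels & REJECTED)


print(matches_filter([]))                                            # True
print(matches_filter(["Feature:OffByDefault"]))                      # True
print(matches_filter(["Feature:OffByDefault", "BetaOffByDefault"]))  # False
print(matches_filter(["Feature:SomethingElse"]))                     # False
```

Under these assumptions, tests without any Feature label are selected, and the `!BetaOffByDefault` term is what keeps tests for off-by-default beta features out of this job.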

Contributor Author

The problem is not which tests we run, it is the cluster config itself (see kubernetes/kubernetes#133697 (comment) and following).

@BenTheElder
Member

I see this discussion now: kubernetes/kubernetes#133697 (comment)

I think we should retain "only test default features (which should be stable), but enable alphas (which shouldn't destabilize the cluster if unused)", but I'm not sure what to call that job.

It will instead be "enable all features".

@pohly
Contributor Author

pohly commented Sep 16, 2025

> I think we should retain "only test default features (which should be stable), but enable alphas (which shouldn't destabilize the cluster if unused)", but I'm not sure what to call that job.

Isn't that the "alpha-beta-enabled" variant that we discussed in https://kubernetes.slack.com/archives/C09QZ4DQB/p1749500650854679?thread_ts=1748885684.963169&cid=C09QZ4DQB? We never got around to adding those. I didn't do it at that time and then had the impression that you were working on it together with some communication to the project about this.

We can also do "beta-enabled". We can't do "alpha-enabled", for the same reason that we cannot do "alpha-features" (potentially invalid cluster config).

@BenTheElder
Member

> We can also do "beta-enabled". We can't do "alpha-enabled", for the same reason that we cannot do "alpha-features" (potentially invalid cluster config).

Yeah. I think alpha-beta-enabled will provide the most value, but we should also do beta-enabled.

I also commented back on the main thread. I don't know what to call it without it being a mouthful, but something like alpha-beta-conformance, in the same way that we have ga-only conformance. I think breaking conformance tests merely by enabling these features is a red flag, and maybe that's a better target for a periodic than just alpha-enabled + default presubmits.

@pohly
Contributor Author

pohly commented Sep 17, 2025

The difference would be that alpha-beta-enabled runs all GA tests, whereas alpha-beta-conformance is limited to conformance tests, right?

We can do both, but I am not sure how useful alpha-beta-conformance is, assuming that we also monitor alpha-beta-enabled breakage. Both depend on alpha-level functionality, even if they don't test it, and thus cannot become release blocking. They can both be release informing, and then alpha-beta-conformance is redundant.

Coming back to this PR: can we merge it now and then add the missing jobs separately, or do you want me to turn this PR into the overall "clean up feature gate testing jobs" PR?

@BenTheElder
Member

> The difference between alpha-beta-enabled would be that it runs all GA tests, whereas alpha-beta-conformance is limited to conformance tests, right?

s/GA/on-by-default/g. And not all of them: typically we would be skipping serial and slow tests, for example, whereas conformance does not. The two selections overlap.
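The point that the two selections overlap, with neither containing the other, can be shown with plain set operations. The test names and labels are made up, and the "skip Serial and Slow" rule stands in for a typical presubmit selection.

```python
# Hypothetical test inventory: name -> labels. Only the labels matter here.
TESTS = {
    "conformance-parallel": {"Conformance"},
    "conformance-serial": {"Conformance", "Serial"},
    "plain-parallel": set(),
    "plain-slow": {"Slow"},
}


def presubmit_selection(tests):
    """Presubmit-style pick: skip Serial and Slow tests."""
    return {name for name, labels in tests.items() if not (labels & {"Serial", "Slow"})}


def conformance_selection(tests):
    """Conformance-only pick, which keeps Serial (and Slow) conformance tests."""
    return {name for name, labels in tests.items() if "Conformance" in labels}


presubmit = presubmit_selection(TESTS)
conformance = conformance_selection(TESTS)
print(sorted(presubmit & conformance))  # ['conformance-parallel'] (overlap)
print(sorted(presubmit - conformance))  # ['plain-parallel']
print(sorted(conformance - presubmit))  # ['conformance-serial']
```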

> We can do both, but I am not sure how useful alpha-beta-conformance is, assuming that we also monitor alpha-beta-enabled breakage. Both depend on alpha-level functionality, even if they don't test it, and thus cannot become release blocking.

We have no rule that says this. And anyhow, enabling it should be safe.

Regardless, new jobs must start by being proposed as release informing, not blocking.

> They can both be release informing, and then alpha-beta-conformance is redundant.

I don't follow why it's redundant; conformance is largely a superset of the tests we run in presubmit. It may make more sense to just have the conformance version, though.

@aojea
Member

aojea commented Sep 17, 2025

Oh, I totally forgot that, to do the alpha and beta feature jobs, we also added the capability to run the e2e tests with those specific labels.
Having a twin job that runs only conformance tests has proven very effective at ruling out stability issues caused by bad tests and at helping release-signal folks correlate issues, i.e. if kind-alpha-beta fails constantly but kind-alpha-beta-conformance does not, it has to be something related to a specific test and not the job.
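The triage heuristic described above can be written down as a small decision function. The inputs (sets of failing test names from the latest run of each job) are hypothetical. Only the "full job red, twin green" branch comes from the comment. The other branches are assumptions added so that the function covers every case.

```python
def triage(full_failures: set[str], conformance_failures: set[str]) -> str:
    """Heuristic sketch: read a job together with its conformance-only twin.

    Inputs are names of failing tests in the latest runs of the two jobs
    (hypothetical inputs, e.g. taken from the published test results).
    """
    if not full_failures and not conformance_failures:
        return "green"
    if full_failures and not conformance_failures:
        # Twin is green: suspect specific tests rather than the job itself.
        return "specific tests: " + ", ".join(sorted(full_failures))
    if conformance_failures and not full_failures:
        # Assumption: the twin may run tests the larger job skips; look closer.
        return "conformance-only failures: " + ", ".join(sorted(conformance_failures))
    # Assumption: both red points at something they share, such as the job setup.
    return "both red: check the shared job/cluster setup"


print(triage({"b", "a"}, set()))  # specific tests: a, b
```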

@pohly
Contributor Author

pohly commented Sep 18, 2025

> s/GA/on-by-default/g, and not all, typically we would be skipping serial and slow for example,

Ack to on-by-default. That typically means tests for beta features, and with "beta is the new GA" that means we would run tests that really should pass. Test failures in "alpha-beta-enabled" (all features enabled, run on-by-default tests) need the same attention as failures in "alpha-beta-conformance" (all features enabled, run only conformance tests).

I don't see why we should skip serial or slow in an "alpha-beta-enabled" CI job. If we have such tests, we should run them. I would also include them in an optional presubmit. The only situation where excluding them may make sense is in a presubmit which always runs, to keep resource usage lower and to avoid delays.

> Having the twin job running only conformance has demonstrated to be very efficient to discard issues of stability caused by bad tests for helping release-signal folks to correlate issues

Providing "alpha-beta-conformance" as a slightly more stable (because it runs fewer tests) version of "alpha-beta-enabled" for SIG Release may make sense. It all depends on who is going to monitor those jobs.

So shall I include the new jobs here or not?

@pohly
Contributor Author

pohly commented Sep 24, 2025

> So shall I include the new jobs here or not?

The kind-alpha-features jobs got removed through a different PR. As we started the discussion around them (again...) here, let's repurpose this PR to add the missing jobs. I've started with presubmits for now and will do periodics next if this works out okay.

@pohly pohly changed the title sig-testing: remove alpha job sig-testing: alpha/beta "enabled" and "conformance" presubmits Sep 24, 2025
@pohly pohly force-pushed the alpha-job-removal branch 3 times, most recently from d0c3fa8 to 212dcd7 September 24, 2025 16:57
Member

@BenTheElder BenTheElder left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2025
We've had kind-alpha-beta-features and kind-beta-features jobs for a while. The
original purpose was to run stable tests in a cluster with features enabled to
detect when enabling those breaks stable functionality. Later the jobs were
extended to also run all tests which should work in such a cluster. The
kind-alpha-features job got removed recently because it's not necessarily a
valid cluster configuration.

What this adds for both cluster configs is:
- "-enabled": running only tests for on-by-default features, i.e. excluding tests for
  off-by-default features. This matches the original purpose of the jobs.
- "-enabled-conformance": restricts the test selection even further to only
  conformance tests.

Both can eventually get promoted to release informing or even blocking.

Only presubmits get added for now. If testing the jobs in a trial PR works, the
corresponding periodics can be added.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2025
@pohly
Contributor Author

pohly commented Sep 25, 2025

I made the same change as in #35575 (disable the SKIP default): https://github.com/kubernetes/test-infra/compare/212dcd7c76b7f31e94c8614c7742abae45d63829..04f210d27d9f0c864998a8d4b2d804dc68be8bdf

@aojea: can you perhaps LGTM + approve both this and the other PR? I'd like to make some progress on this.

@aojea
Member

aojea commented Sep 25, 2025

absolutely
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, pohly

@aojea
Member

aojea commented Sep 25, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 25, 2025
@k8s-ci-robot k8s-ci-robot merged commit da4de6a into kubernetes:master Sep 25, 2025
6 checks passed
@k8s-ci-robot
Contributor

@pohly: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key kubernetes-kind-presubmits.yaml using file config/jobs/kubernetes/sig-testing/kubernetes-kind-presubmits.yaml

pohly added a commit to pohly/test-infra that referenced this pull request Oct 1, 2025
This mirrors the presubmit counterparts that were added in
kubernetes#35534 and tested in
kubernetes/kubernetes#134250.

I can help monitor the existing alpha/beta features jobs. Those need to be
renamed in testgrid to avoid confusion. Other than that they remain unchanged
for now.

Potential future work:

- Changes related to serial and/or slow jobs.

  Serial tests are excluded implicitly by e2e-k8s.sh because the jobs enable
  PARALLEL (kubernetes#35594).

  Slow tests are excluded in LABEL_FILTER because that is what the existing
  periodics did. We might be able to run them: as long as they overlap with
  other tests, there shouldn't be much impact on overall job duration (the
  same applies to presubmits!). Scheduling of slow tests may be relevant
  (onsi/ginkgo#1599).

- Release informing/blocking.

  The existing jobs are release informing. alpha-beta-features shouldn't be,
  because breaking alpha tests is not something that the release team should
  have to deal with. Instead, the new jobs should get promoted once they
  are known to be stable. beta-features can remain release informing because
  tests for beta features (even if off-by-default) need to be stable.

- Decision about "enabled-conformance".

  The conformance jobs got included because it was suggested on Slack.
  They run a subset of the tests run by their "enabled" counterparts.
  It remains to be seen whether having two jobs instead of one really
  provides a better release signal.
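The argument that slow tests add little to the job duration as long as they overlap with other tests can be made concrete with a toy scheduling model. The model assumes that tests are handed out in order to a fixed number of parallel workers and that each test goes to whichever worker frees up first. This is only a rough approximation of a parallel e2e run, and the durations are made up.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time when each test, in order, goes to the first free worker."""
    finish_times = [0] * workers  # all workers start idle at t=0
    heapq.heapify(finish_times)
    for duration in durations:
        free_at = heapq.heappop(finish_times)  # worker that frees up earliest
        heapq.heappush(finish_times, free_at + duration)
    return max(finish_times)


fast = [1] * 12  # twelve made-up 1-unit tests
slow = 3         # one made-up slow test

print(makespan(fast, workers=4))           # 3
print(makespan([slow] + fast, workers=4))  # 4: slow test overlaps with the rest
print(makespan(fast + [slow], workers=4))  # 6: slow test runs alone at the end
```

The same slow test costs one extra time unit when it starts early, but three when it is scheduled last. This is why the order in which slow tests are scheduled may matter.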
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/config Issues or PRs related to code in /config area/jobs area/testgrid cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants