Skip to content

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 24, 2025

The implicit default for SKIP is '[Serial]'. That's surprising ("what you see is not what you get") and unnecessary for these jobs. If we have serial tests, then we can and should run them to get full coverage.

There is a risk that too many serial tests slow down the overall job. If that is the case, then skipping serial tests should be added back inside the label filter. Before https://github.com/kubernetes/test-infra/pull/35570/files (merged today), serial tests were enabled.

FOCUS doesn't have a default in these jobs, but elsewhere it defaults to '[Conformance]'. It's safer to explicitly set it - perhaps someone does a copy-and-paste of these settings.

This was noticed after reviewing the result of the jobs without the previous legacy SKIP:

/assign @aojea

The implicit default for SKIP is '\[Serial\]'. That's surprising ("what you see
is *not* what you get") and unnecessary for these jobs. If we have serial tests,
then we can and should run them to get full coverage.

There is a risk that too many serial tests slow down the overall job. If that
is the case, then skipping serial tests should be added back inside the label
filter.

FOCUS doesn't have a default in these jobs, but elsewhere it defaults to
'[Conformance]'. It's safer to explicitly set it - perhaps someone does a
copy-and-paste of these settings.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 24, 2025
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 24, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign cblecker for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 24, 2025
- name: RUNTIME_CONFIG
value: '{"api/beta":"true", "api/ga":"true"}'
- name: SKIP
value: ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

This does not actually work 😠

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/134250/pull-kubernetes-e2e-kind-alpha-beta-enabled/1971169750193016832 used it and still got '--ginkgo.skip=[Serial]' added.

Whatever does that is missing a "variable already set" check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it's the script in kind, but we also have the overly complex ginkgo-e2e.sh in k/k, I still want to do kubernetes/kubernetes#131058 but we had more pending discussion it seemed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/close

Replaced by #35594.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 25, 2025
@aojea
Copy link
Member

aojea commented Sep 25, 2025

one of the goals of kind is to run fast, that means you get ~1100 tests in ~ 23 mins, if you run them serial it is ~ 2 hours for ~ 400 tests, plus you don;t get coverage on the impact of features on other features or test in other tests, Serial is a very happy scenario.

So, Serial should not even exist, other thing is that a few test require them for reasons, but we should get coverage on parallel with high parallism first and later serial, for completeness

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

I fully agree, we should have very few tests which need serial, ideally none.

But if we have them for reasons that were justified, then these jobs should run them, and the choice whether they run should be left to the job definition, not some script somewhere in the bowls of our test machinery.

FWIW, serial tests were included in these jobs until I removed the obsolete SKIP, for example:

https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/133648/pull-kubernetes-e2e-kind-alpha-beta-features/1965985578365227008/build-log.txt

EDIT: I was wrong (eyes a bit tired): there is \[Serial\] in --ginkgo.skip=\[Serial\]|PodSecurityPolicy|....

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

So the source of skipping serial is this:
https://github.com/kubernetes-sigs/kind/blob/d1eecc46e30cac9d35cd32dc52677ef75ec22e18/hack/ci/e2e-k8s.sh#L226-L234

I suspect that comes from the times when Ginkgo didn't support running serial tests sequentially in a parallel runs. If true, then this was a workaround for a technical limitation, not a conscious choice because of runtime concerns. Whether a job is parallel or serial is not a good indicator for "must run quickly" vs. "may take longer".

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

Let me try to formulate some basic principles. I'll use separate comments, so you can upvote/downvote them individually.

cc @BenTheElder

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • It should be obvious from a job spec which tests are enabled.

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • As of Ginkgo v2, -p supports running serial tests sequentially and therefore enabling/disabling parallel execution in a job is orthogonal to test selection. In other words, "PARALLEL=true" does not imply "skip serial tests".

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • If a presubmit runs infrequently and is only invoked to test certain aspects, then including serial tests is desirable to get full test coverage of that aspect.

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • If a presubmit runs always, serial tests should be excluded.

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • If serial tests are excluded in a presubmit, then an on-demand alternative version which runs the skipped serial tests is desirable. Otherwise it's impossible to test changes to those tests before merging.

@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2025

  • Periodic jobs should always run all supported tests, including the serial ones.

@k8s-ci-robot
Copy link
Contributor

@pohly: Closed this PR.

In response to this:

/close

Replaced by #35594.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/config Issues or PRs related to code in /config area/jobs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants