Correct GitHub Actions CI instability for iOS. #2337

Merged: 2 commits, Mar 28, 2025

freakboy3742 (Contributor)

The iOS test suite added by #2286 has proven to be unstable in GitHub Actions CI, failing ~50% of the time. On deeper investigation, the issue appears to be CPU overloading on the CI machine.

The integration tests run in parallel using xdist; the ARM64 test machine reports 3 CPUs, so 3 test workers are started. The iOS tests are isolated into a load group, which confines them to a single worker; but at the same time, the other two CPUs are allocated other macOS tests to run. The iOS test is itself a multi-process, multi-threaded build, calling Xcode, the iOS Simulator, and the macOS logging infrastructure; when the other 2 workers are at 100% utilisation building other tests, compiling the iOS project and booting a simulator clone can take a long time. The iOS testbed currently has a hard-coded 5 minute timeout waiting for Xcode to compile the project and boot a simulator; this is the timeout that is causing test failures.

This PR makes two changes to fix this.

Firstly, it splits the integration testbed into 2 phases. Instead of using an iOS load group, a pytest marker is used to identify the iOS tests as "serial" tests. The test suite then runs the integration test suite in two parts: all the serial tests are executed single process; and the non-serial tests are then executed multi-process. This effectively means that all the iOS tests run sequentially, then the rest of the test suite runs in parallel (as it always has done).
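As a rough illustration of the two-phase run described above, the sketch below builds the two pytest invocations. It assumes the marker is named `serial` and that the tests live under a `test` directory; `phase_commands` and `run_phases` are hypothetical helper names for this example, not the code in this PR. `-m` is pytest's marker-expression option and `-n` is the pytest-xdist worker-count option.

```python
import subprocess
import sys

# Assumed marker name; a marked test would look something like:
#
#     @pytest.mark.serial
#     def test_ios_platforms(...): ...
SERIAL_MARKER = "serial"


def phase_commands(test_path, workers):
    """Return the two pytest command lines: serial tests first, then the rest."""
    base = [sys.executable, "-m", "pytest", test_path]
    # Phase 1: only the serial (iOS) tests, in a single process.
    serial = base + ["-m", SERIAL_MARKER]
    # Phase 2: everything else, spread across xdist workers.
    parallel = base + ["-m", f"not {SERIAL_MARKER}", "-n", str(workers)]
    return [serial, parallel]


def run_phases(test_path, workers):
    """Run both phases in order, stopping at the first failing phase."""
    for cmd in phase_commands(test_path, workers):
        returncode = subprocess.call(cmd)
        if returncode != 0:
            return returncode
    return 0


# Show the commands that would be run, without executing them.
for cmd in phase_commands("test", 3):
    print(" ".join(cmd))
```

Running the serial phase first means the iOS build never competes with other workers for CPU, at the cost of a longer overall run.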

Secondly, the macOS ARM64 builds have been updated to use the macOS-15 runner. The macOS-15 runner is currently listed as "beta"; but it has been available for almost 6 months, and if history is any indication, will become the default runner in the very near future. The macOS-15 runner has two notable improvements:

  1. The build machines are significantly faster. I've seen a single test_ios.py::test_ios_platforms run in as little as 94 seconds, with the complete macOS test suite completing in 22 minutes - down from 36 minutes on the macOS-14 runner.
  2. The macOS-15 runner defaults to using Xcode 16, whereas the macOS-14 runner defaults to Xcode 15. Xcode 15 had a number of issues with slow simulator startup; the 15.0, 15.1 and 15.2 releases were unusably slow. The current 15.4 image is better, but not as good as Xcode 16.

Apple "best practice" strongly encourages developers to keep on the "stable bleeding edge" of Xcode tooling, so upgrading to use Xcode 16 is generally a good idea anyway. It is possible to use Xcode 16 on a macOS-14 base image - but due to (1), the performance is still much worse than the macOS-15 runner (the worst iOS test execution time I've seen is 392s). Switching to macOS-15 also means that we don't need to explicitly maintain the version of Xcode, as the macOS-15 runner will always use the latest version of Xcode 16 that has been published.

Fixes #2335 - It's obviously difficult to categorically prove this, but I've run 4 builds on the macOS-15 runner with serial test isolation, with test execution times ranging from 94-215s [1]. That time includes installing the compiled app on the simulator and running the test - so it's finishing well under the 5 minute/300s timeout that was causing test failures. I've also run 3 successful builds on macOS-14 with Xcode 16.2; these have much worse build times (332-392s), which is more than 5 minutes - but again, includes the time to install and run the test suite, which is easily 1/3-1/2 of the overall test time. Those tests have been run during US PST business hours and during AU AWST business hours, which limits the possible influence of "time of day" and overall system load on the problem.
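The arithmetic behind the timing argument above can be sketched as follows. This is only a rough estimate: it takes the quoted total test times and the "1/3-1/2 spent on install and run" rule of thumb at face value, and `compile_boot_window` is a hypothetical helper name for this example.

```python
TIMEOUT_S = 300  # the hard-coded 5 minute timeout on compile + simulator boot


def compile_boot_window(total_s, low_share=1 / 3, high_share=1 / 2):
    """Estimate (min, max) seconds spent compiling and booting the simulator.

    Assumes installing and running the test takes between low_share and
    high_share of the total test time, so the remainder is compile + boot.
    """
    return (total_s * (1 - high_share), total_s * (1 - low_share))


# Total test times quoted above, in seconds.
observed = [
    ("macOS-15, fastest", 94),
    ("macOS-15, slowest", 215),
    ("macOS-14 + Xcode 16.2, fastest", 332),
    ("macOS-14 + Xcode 16.2, slowest", 392),
]

for label, total in observed:
    low, high = compile_boot_window(total)
    status = "under" if high < TIMEOUT_S else "over"
    print(f"{label}: total {total}s, compile+boot ~{low:.0f}-{high:.0f}s ({status} timeout)")
```

Under these assumptions, even the slowest 392s run spends roughly 196-261s on the timed compile and boot step, which is consistent with those builds passing despite a total time above 300s.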

Footnotes

  1. If you want to audit the test results, the CI runs of interest are the last 7 commits attached to [DO NOT MERGE] Evaluate iOS CI reliability issues. #2336. The full CI runs are reported as failures because the CircleCI configuration fails - but that's a false positive caused by a bad configuration disabling the use of CircleCI. The GitHub macOS-13 and macOS-14/15 runners are the only results of significance.

@joerick (Contributor) left a comment

Thank you very much for debugging this @freakboy3742 ! And for the great writeup! Debugging flaky tests requires a lot of patience. I hope this does the trick!

@henryiii (Contributor) left a comment

A little surprised there's not some feature in xdist that would help with this, but didn't see one. Thanks!

@henryiii henryiii merged commit 4ef7b37 into pypa:main Mar 28, 2025
24 checks passed
@freakboy3742 freakboy3742 deleted the ios-timeout-fix branch March 29, 2025 03:35
freakboy3742 (Contributor, Author) commented Mar 29, 2025

> Thank you very much for debugging this @freakboy3742 ! And for the great writeup! Debugging flaky tests requires a lot of patience. I hope this does the trick!

No problems - and apologies for the inconvenience while the suite was flaky.

> A little surprised there's not some feature in xdist that would help with this, but didn't see one. Thanks!

Likewise - I looked for the same thing, and came up empty. It definitely seems like it could be a useful feature.

Development

Successfully merging this pull request may close these issues.

Instability in iOS tests under CI conditions