Correct GitHub Actions CI instability for iOS. #2337
Merged
The iOS test suite added by #2286 has proven to be unstable in GitHub Actions CI, failing ~50% of the time. On deeper investigation, the issue appears to be CPU overloading on the CI machine.
The integration tests run multithreaded using xdist; the ARM64 test machine reports 3 CPUs, and so 3 test threads are started. The iOS tests are placed in a load group, which confines them to a single worker; but at the same time, the other two CPUs are allocated other macOS tests to run. The iOS test is itself a multi-process, multi-threaded build, calling Xcode, the iOS Simulator, and the macOS logging infrastructure; when the 2 remaining processes are at 100% utilisation building other tests, compiling the iOS project and booting a simulator clone can take a long time. The iOS testbed currently has a hard-coded 5 minute timeout waiting for Xcode to compile the project and boot a simulator; this is the timeout that is causing test failures.
This PR makes two changes to fix this.
Firstly, it splits the integration testbed into 2 phases. Instead of using an iOS load group, a pytest marker is used to identify the iOS tests as "serial" tests. The test suite then runs the integration test suite in two parts: all the serial tests are executed in a single process, and the non-serial tests are then executed multi-process. This effectively means that all the iOS tests run sequentially, then the rest of the test suite runs in parallel (as it always has done).
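As a rough illustration of the two-phase idea (not the actual change made in this PR), the sketch below builds the two pytest invocations. It assumes tests are tagged with `@pytest.mark.serial`, that the marker is registered in the pytest configuration, and that pytest-xdist is installed so `-n` is available. The helper names and the `tests/integration` path are hypothetical.

```python
import subprocess
import sys

SERIAL_MARKER = "serial"  # marker name as described above


def build_phase_commands(test_path="tests/integration", workers="auto"):
    """Return the two pytest command lines, serial phase first.

    `test_path` is a hypothetical location for the integration tests.
    """
    base = [sys.executable, "-m", "pytest", test_path]
    # Phase 1: only tests marked serial, with xdist distribution disabled.
    serial = base + ["-m", SERIAL_MARKER, "-n", "0"]
    # Phase 2: everything else, spread across xdist workers.
    parallel = base + ["-m", f"not {SERIAL_MARKER}", "-n", str(workers)]
    return [serial, parallel]


def run_phases(commands, runner=subprocess.call):
    """Run each phase in order; return the first non-zero exit code, else 0."""
    for command in commands:
        exit_code = runner(command)
        if exit_code != 0:
            return exit_code
    return 0
```

Calling `run_phases(build_phase_commands())` would run the serial tests alone first, then the remaining tests in parallel. The `runner` parameter exists only so the sequencing can be checked without launching pytest.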
Secondly, the macOS ARM64 builds have been updated to use the macOS-15 runner. The macOS-15 runner is currently listed as "beta"; but it has been available for almost 6 months, and if history is any indication, will become the default runner in the very near future. The macOS-15 runner has two notable improvements:
test_ios.py::test_ios_platforms run in as little as 94 seconds, with the complete macOS test suite completing in 22 minutes - down from 36 minutes on the macOS-14 runner.

Apple "best practice" strongly encourages developers to keep on the "stable bleeding edge" of Xcode tooling, so upgrading to use Xcode 16 is generally a good idea anyway. It is possible to use Xcode 16 on a macOS-14 base image - but due to (1), the performance is still much worse than the macOS-15 runner (the worst iOS test execution time I've seen is 392s). Switching to macOS-15 also means that we don't need to explicitly maintain the version of Xcode, as the macOS-15 runner will always use the latest version of Xcode 16 that has been published.
Fixes #2335 - It's obviously difficult to categorically prove this, but I've run 4 builds on the macOS-15 runner with serial test isolation, with test execution times ranging from 94-215s [1]. That time includes installing the compiled app on the simulator and running the test - so it's finishing well under the 5 minute/300s timeout that was causing test failures. I've also run 3 successful builds on macOS-14 with Xcode 16.2; these have much worse build times (332-392s), which is more than 5 minutes - but again, includes the time to install and run the test suite, which is easily 1/3-1/2 of the overall test time. Those tests have been run during US PST business hours and during AU AWST business hours, which limits the possible influence of "time of day" and overall system load on the problem.
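To make the macOS-14 arithmetic concrete, here is a back-of-envelope estimate. It treats the "1/3-1/2" install-and-run share as a rough assumption taken from the text rather than a measured value, and the function name is illustrative only.

```python
TIMEOUT_S = 300  # the 5 minute timeout quoted above


def build_and_boot_range(total_times_s, run_fractions):
    """Estimate the compile-and-boot portion of each overall test time.

    Each total is scaled by (1 - fraction), where fraction is the assumed
    share of time spent installing and running the test suite.
    """
    estimates = [t * (1 - f) for t in total_times_s for f in run_fractions]
    return min(estimates), max(estimates)


# macOS-14 with Xcode 16.2: overall times of 332-392s.
low, high = build_and_boot_range([332, 392], [1 / 3, 1 / 2])
print(f"{low:.0f}s to {high:.0f}s, under timeout: {high < TIMEOUT_S}")
# prints "166s to 261s, under timeout: True"
```

Under that assumption, even the slowest macOS-14 run spends an estimated 261s or so compiling and booting, which is consistent with those builds passing despite overall times above 300s.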
Footnotes
[1] If you want to audit the test results, the CI runs of interest are the last 7 commits attached to [DO NOT MERGE] Evaluate iOS CI reliability issues. #2336. The full CI runs report as fails because the CircleCI configuration fails - but that's a false positive because of a bad configuration disabling the use of CircleCI. The GitHub macOS-13 and macOS-14/15 runners are the only results of significance.