Bump iOS XCTest timeout for ExecuTorchLLMTests by psiddh · Pull Request #19354 · pytorch/executorch

psiddh · 2026-05-07T00:12:41Z

Summary:
The 13 XCTestCase methods in
xplat/executorch/extension/llm/apple:ExecuTorchLLMTests
(testLLaMA, testPhi4, testGemma, testLLaVA, testVoxtral and their
reset variants) regularly hit the 1800-second per-test ceiling
enforced by fbobjc/Tools/xctest_runner for the long_running
label. LLM inference on iOS-sim CPU (1B-class models,
128-768 token sequences, each test calls generate() twice)
routinely exceeds 30 minutes per test method, producing spurious
"Test timed out after 1800 seconds" flakes on the test-issues
dashboard for owner ai_infra_mobile_platform.

Per the runner formula
TEST_CASE_TIMEOUT(60s) * label_multiplier * 3:

| label | multiplier | per-XCTestCase budget |
|----------------|-----------:|----------------------:|
| long_running | x10 | 1800s |
| glacial (here) | x30 | 5400s |
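
As a quick sanity check of the formula, the following minimal sketch recomputes both budgets. The constant and function names are illustrative and not taken from the runner's source; only the 60s base, the two multipliers, and the factor of 3 come from the description above.

```python
# Base per-test-case timeout in seconds, as quoted in the PR description.
TEST_CASE_TIMEOUT_S = 60

# Multipliers for the two labels named in this PR; other labels are not listed here.
LABEL_MULTIPLIERS = {"long_running": 10, "glacial": 30}


def per_test_budget_s(label: str) -> int:
    """Return TEST_CASE_TIMEOUT * label_multiplier * 3 for the given label."""
    return TEST_CASE_TIMEOUT_S * LABEL_MULTIPLIERS[label] * 3


for label in LABEL_MULTIPLIERS:
    budget = per_test_budget_s(label)
    print(f"{label}: {budget}s ({budget // 60} min)")
# prints:
# long_running: 1800s (30 min)
# glacial: 5400s (90 min)
```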

Switching to glacial (the highest tier supported by the runner)
gives each test 90 minutes. Adding
test_test_rule_timeout_ms = 14400000 sets the bundle-level
wall-clock budget to 4h. Note that ~5 testcases at the full 90 min
each would need 7.5h, so the 4h budget assumes most tests finish
well under the per-test ceiling, plus xctest setup/teardown.
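
The bundle-level arithmetic can be checked with a short sketch. It assumes, hypothetically, that all 13 methods run one after another in a single bundle; the PR does not say how the runner shards or parallelizes them, so treat the derived numbers as a rough bound rather than a statement about the runner.

```python
RULE_TIMEOUT_MS = 14_400_000  # value of test_test_rule_timeout_ms in this PR
PER_TEST_CEILING_S = 5400     # glacial per-XCTestCase budget from the table
NUM_TEST_METHODS = 13         # XCTestCase methods in the bundle, per the summary

bundle_budget_s = RULE_TIMEOUT_MS // 1000
bundle_budget_h = bundle_budget_s / 3600

# Number of tests that could each run to the full ceiling, back to back,
# before the bundle budget is exhausted (assumes serial execution).
tests_at_ceiling = bundle_budget_s // PER_TEST_CEILING_S

# Average per-test duration that keeps all methods inside the budget
# (same serial-execution assumption).
avg_allowed_min = bundle_budget_s / NUM_TEST_METHODS / 60

print(f"bundle budget: {bundle_budget_s}s = {bundle_budget_h:.1f}h")
print(f"tests that fit at the full ceiling: {tests_at_ceiling}")
print(f"average allowed per test: {avg_allowed_min:.1f} min")
```

Under that assumption the 4h budget holds two tests at the full 90-minute ceiling, or all 13 if they average roughly 18 minutes each.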

Note: this diff is unrelated to T269848646. T269848646 tracks a
separate cluster of 446 iOS-sim test-run cancellations
(duration: 0.00, "test execution was cancelled because the test
run was cancelled") that is owned by testinfra and is not
addressed here.
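
To make the distinction concrete, here is a minimal triage sketch that separates the two failure signatures quoted above. The `classify` function and its `(message, duration_s)` input shape are hypothetical; the real dashboard record format is not shown in this PR.

```python
import re

# Signature of the flakes this PR targets.
TIMEOUT_RE = re.compile(r"Test timed out after (\d+) seconds")
# Signature of the cancellations tracked separately in T269848646.
CANCEL_TEXT = "test execution was cancelled because the test run was cancelled"


def classify(message: str, duration_s: float) -> str:
    """Label a failure as a per-test timeout, a run cancellation, or other."""
    match = TIMEOUT_RE.search(message)
    if match:
        return f"timeout({match.group(1)}s)"
    if CANCEL_TEXT in message.lower() and duration_s == 0.0:
        return "cancelled"
    return "other"


print(classify("Test timed out after 1800 seconds", 1800.0))
print(classify(CANCEL_TEXT, 0.0))
```

Only the first category would be expected to change with a larger timeout tier.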

Reviewed By: shoumikhin

Differential Revision: D104147313

pytorch-bot · 2026-05-07T00:12:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19354

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 17 Unrelated Failures

As of commit 879d3a8 with merge base af90130:

NEW FAILURE - The following job has failed:

pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/ops/test_conv2d.py::TestConv2d::test_fp16_conv2d

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-lora-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-lora-multimethod-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (ic4, portable, linux.4xlarge.memory) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (ic4, xnnpack-quantization-delegation, linux.4xlarge.memory) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (mobilebert, portable, linux.2xlarge) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (mobilebert, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-models-linux (phi_4_mini, portable, linux.4xlarge.memory) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-moshi-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-phi-3-mini-runner-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-sqnr-static-llm-qnn-linux (smollm2_135m) / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-vulkan-models-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest / linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest / macos / macos-job (gh) (trunk failure)
[ FAILED ] OpGridSampler2dTest.BatchSizeMismatchDies
pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-07T00:12:49Z

@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104147313.

github-actions · 2026-05-07T00:13:32Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR adjusts the Buck test configuration for the iOS LLM XCTest bundle to reduce spurious timeouts during long CPU-based simulator inference runs.

Changes:

Switches the test label from long_running to glacial to increase the per-XCTestCase timeout tier.
Sets a larger rule-level wall-clock timeout for the generated test bundle via test_test_rule_timeout_ms.

Summary: The 13 XCTestCase methods in `xplat/executorch/extension/llm/apple:ExecuTorchLLMTests` (testLLaMA, testPhi4, testGemma, testLLaVA, testVoxtral and their reset variants) regularly hit the 1800-second per-test ceiling enforced by `fbobjc/Tools/xctest_runner` for the `long_running` label. LLM inference on iOS-sim CPU (1B-class models, 128-768 token sequences, each test calls `generate()` twice) routinely exceeds 30 minutes per test method, producing spurious "Test timed out after 1800 seconds" flakes on the test-issues dashboard for owner `ai_infra_mobile_platform`.

Per the runner formula `TEST_CASE_TIMEOUT(60s) * label_multiplier * 3`:

| label | multiplier | per-XCTestCase budget |
|----------------|-----------:|----------------------:|
| long_running | x10 | 1800s |
| glacial (here) | x30 | 5400s |

Switching to `glacial` (the highest tier supported by the runner) gives each test 90 minutes. Adding `test_test_rule_timeout_ms = 28800000` sets the bundle-level wall-clock budget to 8h, which is comfortable headroom for ~5 testcases at 90 min each plus xctest setup/teardown.

Note: this diff is unrelated to T269848646. T269848646 tracks a separate cluster of 446 iOS-sim test-run *cancellations* (`duration: 0.00`, "test execution was cancelled because the test run was cancelled") that is owned by testinfra and is not addressed here.

Differential Revision: D104147313

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

+    # Rule-level wall-clock for the whole auto-generated test bundle:
+    # ExecuTorchLLMTests currently contains 13 XCTestCase methods, and
+    # individual methods can exceed 30 minutes on iOS-sim CPU. This 4h
+    # budget is intended as the total bundle/shard wall-clock, including


Copilot AI review requested due to automatic review settings May 7, 2026 00:12

psiddh requested review from larryliu0820 and mergennachin as code owners May 7, 2026 00:12

meta-cla Bot added the CLA Signed label May 7, 2026

meta-codesync Bot added fb-exported meta-exported labels May 7, 2026

Copilot started reviewing on behalf of psiddh May 7, 2026 00:13

Copilot AI reviewed May 7, 2026

psiddh force-pushed the export-D104147313 branch from 1a14fa8 to 24581b5 Compare May 7, 2026 00:18

meta-codesync Bot changed the title ~~Bump iOS XCTest timeout for ExecuTorchLLMTests~~ Bump iOS XCTest timeout for ExecuTorchLLMTests (#19354) May 7, 2026

psiddh force-pushed the export-D104147313 branch from 24581b5 to b3b6a27 Compare May 7, 2026 00:45

psiddh requested a review from shoumikhin May 7, 2026 00:45

psiddh force-pushed the export-D104147313 branch from b3b6a27 to 611181f Compare May 7, 2026 00:48

shoumikhin approved these changes May 7, 2026

meta-codesync Bot changed the title ~~Bump iOS XCTest timeout for ExecuTorchLLMTests (#19354)~~ Bump iOS XCTest timeout for ExecuTorchLLMTests May 7, 2026

Copilot AI review requested due to automatic review settings May 7, 2026 03:31

psiddh force-pushed the export-D104147313 branch from 611181f to 46906d4 Compare May 7, 2026 03:31

Copilot started reviewing on behalf of psiddh May 7, 2026 03:32

Copilot AI reviewed May 7, 2026

Comment thread extension/llm/apple/BUCK Outdated

Potential fix for pull request finding

879d3a8

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot AI review requested due to automatic review settings May 7, 2026 04:36

Copilot started reviewing on behalf of psiddh May 7, 2026 04:37

Copilot AI reviewed May 7, 2026

psiddh wants to merge 2 commits into pytorch:main from psiddh:export-D104147313

3 participants
