Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Browser tests: reduce timeout for instantiation to reasonable value #1706

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

HeikoKlare
Copy link
Contributor

The timeout for browser instantiation in the browser tests has been increased to avoid flaky tests (via #1677). However, browser instantiation taking more than 10 seconds (the previous timeout value) is an indicator for something being wrong with the instantiation. There is no reason why instantiation may take that long during test execution but the same issue should never arise in productive use. Thus, such a case should be captured by tests and, if necessary, be fixed productively.

In order to do so, this change reduces the test timeout value again to a more reasonable value.

Related to #1676

Also see #1677 (comment)

Copy link
Contributor

github-actions bot commented Jan 6, 2025

Test Results

   501 files   -  1     501 suites   - 1   10m 43s ⏱️ - 5m 9s
 4 297 tests  - 37   4 285 ✅  - 35   11 💤  - 3  1 ❌ +1 
16 538 runs   - 37  16 432 ✅  - 34  105 💤  - 4  1 ❌ +1 

For more details on these failures, see this check.

Results for commit 9313572. ± Comparison against base commit cc07c36.

This pull request removes 37 tests.
org.eclipse.swt.graphics.ImageWin32Tests ‑ testImageDataForDifferentFractionalZoomsShouldBeDifferent
org.eclipse.swt.graphics.ImageWin32Tests ‑ testImageShouldHaveDimesionAsPerZoomLevel
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testByteArrayTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testFileTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testHtmlTransfer
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromCopiedImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImageData
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testImageTransfer_fromImageDataFromImage
org.eclipse.swt.tests.win32.Test_org_eclipse_swt_dnd_DND ‑ testRtfTransfer
…

♻️ This comment has been updated with latest results.

@HeikoKlare HeikoKlare marked this pull request as ready for review January 6, 2025 13:58
@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 5 times, most recently from a5eacf0 to 228c11e Compare January 14, 2025 14:15
@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch from 228c11e to 8cf3573 Compare January 14, 2025 18:58
@HeikoKlare
Copy link
Contributor Author

Builds sporadically show timeouts, e.g.:

Error:  Tests run: 380, Failures: 1, Errors: 0, Skipped: 12, Time elapsed: 193.3 s <<< FAILURE! -- in org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser
Error:  org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.test_BrowserFunction_callback[browser flags: 0] -- Time elapsed: 66.79 s <<< FAILURE!
java.lang.AssertionError: creating browser took too long: 65267ms
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:312)
	at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.setUp(Test_org_eclipse_swt_browser_Browser.java:185)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

Might be related to eclipse-platform/eclipse.platform.ui#2734

@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch from 8cf3573 to 194a8a0 Compare January 19, 2025 15:28
@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 12 times, most recently from 8afbe01 to b74ebd7 Compare January 30, 2025 12:16
@fedejeanne fedejeanne force-pushed the browser-test-initialization-timeout branch from b74ebd7 to eda5bd4 Compare January 30, 2025 14:15
@fedejeanne
Copy link
Contributor

I rebased on master and solved the conflicts. Let's see if there are still (too many) test failures.

@HeikoKlare according to your comment (#1752 (comment)) this PR might need to be postponed until #1752 is merged, correct?

@HeikoKlare
Copy link
Contributor Author

This PR should actually not need any other PR at all. The failing tests indicate that there is a bug in either the browser implementation or the test setup that are not aware of yet, so something we need to find out independent from any other PR (as I already mentioned in #1677 (comment)).

@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch from eda5bd4 to 7a661ab Compare January 30, 2025 15:52
@fedejeanne
Copy link
Contributor

What I mean is that "if we merge this PR after the ones that are already open then it is to be expected that there are no/less test failures due to timeouts" . But I see that you want to use this PR as a basis to check if the other PRs actually help (reduce/get rid of the test failures because of timeouts and freezes).

So, shall we merge this one then?

@HeikoKlare
Copy link
Contributor Author

"if we merge this PR after the ones that are already open then it is to be expected that there are no/less test failures due to timeouts"

As we know, that's not the case. We still need to find the issue for the browser instantiation in tests taking so long. All the fixes we currently make regarding timeouts are only supposed to avoid that the actual errors (which we may not know yet) do not lead to a blocking UI. They do not properly solve the underlying issue.

So, shall we merge this one then?

I am not sure if people will be happy about that. Most Windows builds will fail with test failures then. So I wouldn't do that but rather find out the reasons for the failure first.

@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 5 times, most recently from fd49490 to 47582ad Compare February 1, 2025 10:56
@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 5 times, most recently from 0222db7 to 57b8f65 Compare February 1, 2025 19:20
@HeikoKlare
Copy link
Contributor Author

Note: the thread dump written after 60 seconds of test execution time show that the usually the main thread is stuck in display,sleep() / OS.WaitMessage() while waiting for browser instantiation:

[2025-02-01 17:15:07 +0000] org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.test_BrowserFunction_multiprocess[browser flags: 0]() ran for more than 60 seconds
totalMemory:             146800640
freeMemory (before GC):   96621696
freeMemory (after GC):    58666000
"main" prio=5 Id=1 RUNNABLE (in native)
	at app//org.eclipse.swt.internal.win32.OS.WaitMessage(Native Method)
	at app//org.eclipse.swt.widgets.Display.sleep(Display.java:4803)
	at app//org.eclipse.swt.browser.Edge.processOSMessagesUntil(Edge.java:471)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebViewWrapper(Edge.java:370)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebView(Edge.java:381)
	at app//org.eclipse.swt.browser.Edge.getUrl(Edge.java:894)
	at app//org.eclipse.swt.browser.Browser.getUrl(Browser.java:771)
	at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:308)

One might think that there is some wake message, but the actual issue is that the OS message triggering the WebView instantiation callback did actually not arrive. When adding an explicit timed, repeated awakening, the display still goes into sleep repeatedly since no message arrived from the OS so far. In addition, adding debug output in the loop waiting for the OS message shows that the loop is spinned quite often.

Further finding: Once might also think that there is some message scheduled asynchronously on the display, which is not processed by the processOSMessagesUntil() (as it only processed OS messages) but may need to be processed (for whatever reason) to proceed with the processing the queue. In that case, whoever, display.sleep() would always returning immediately, as the display's synchronizer queue is not empty. Debug output also shows that this is usually not the case: the display properly goes to sleep in the scenarios where browser instantiation takes long, so there is actually nothing in the event queue (neither from the OS nor scheduled at the display by the application).

So the underlying issue is that the OS does not call the WebView instantiation callback for a long time (> 60 seconds). It is also interesting that usually after 65 seconds the callback is finally executed.

@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 6 times, most recently from f6bb7be to c785da1 Compare February 3, 2025 20:35
@fedejeanne
Copy link
Contributor

fedejeanne commented Feb 4, 2025

I've also been looking at it and all I came up with is:

  1. It's true that the problem is the callback is simply being called after ~65s (the delay I've seen is pretty consistent, always between 65.2s and 65.7s).
  2. I've seen the callback take the ~65s delay "only" 1-2 times per CI-run (and if it does it twice, the 2nd time is usually in the next 30 seconds after the 1st one).

Regarding point nr. 1, I used this commit to log the request and the callback and I saw this output:

1st delay: 65.237.957.400 ns

1356349440500 - ICoreWebView2Environment.CreateCoreWebView2Controller()
...
1421587397900 - Edge.createControllerInitializationCallback()

**2nd delay: 65.262.087.700 ns **

1423716947100 - ICoreWebView2Environment.CreateCoreWebView2Controller()
...
1488979034800 - Edge.createControllerInitializationCallback()

A quick internet search revealed that Instantiating Edge can be slow, but what I still don't get is:

  • Why is it sometimes slow and not all the time ?
  • Why is the delay consistently something ~65s ?

@HeikoKlare let's discuss how to proceed with this PR in today's daily.

@fedejeanne
Copy link
Contributor

FYI there is a 2nd commit in this PR, one is from #1797

@HeikoKlare HeikoKlare marked this pull request as draft February 5, 2025 16:04
@HeikoKlare
Copy link
Contributor Author

Yes, that was intended for testing purposes. We still see issues with WebView intialization sometimes taking long, so this is not ready for being merged anyway. I thus changed it back to "draft".

@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch 3 times, most recently from 9bd1044 to 6843b3e Compare February 11, 2025 16:17
The timeout for browser instantiation in the browser tests has been
increased to avoid flaky tests. However, browser instantiation taking
that long is an indicator for something being wrong with the
instantiation. There is no reason why instantiation may take that long
during test execution but the same issue should never arise in
productive use, thus such a case should be captured by tests and, if
necessary, be fixed productively.

In order to do so, this change reduces the test timeout value again to a
more reasonable value. Since the first Edge browser instantiation takes
rather long at least in the GitHub actions test environment, that
instantiation is isolated in an execution before the test class.

Related to
eclipse-platform#1676
@HeikoKlare HeikoKlare force-pushed the browser-test-initialization-timeout branch from 6843b3e to 9313572 Compare February 13, 2025 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants