Flaky hanging tests after merging #54 #66

llucax · 2024-10-01T06:54:33Z

What happened?

Pull request #54 introduced a timing issue in the tests, making them flaky on amd64 and probably consistently failing on arm64, because there the CI runs on qemu, which is extremely slow.

It is probably related to the client-dispatch dependency bump, which in turn bumps the channels dependency, and channels has a change in how Timers work.

Some investigation has already been done, mainly by @Marenz, but it seems we'll need to spend more time on this to find the root cause.

The issue seems to be that some condition variable is used in a different event loop than the one it was created in:

    | RuntimeError: <asyncio.locks.Condition object at 0x7f3d2bac0e50 [unlocked]> is bound to a different event loop

What did you expect instead?

Tests should run normally.

Affected version(s)

No response

Affected part(s)

Unit, integration and performance tests (part:tests)

Extra information

Here is a capture of logs when this happens: https://gist.github.com/Marenz/1ace8c7c0ccf01db70ceee8f767bb6f9#file-different-eventloop-py-L188.

The error (at least the one about using the wrong loop) seems to always happen inside the clean-up code of select(). It might help to add some logging there, such as printing a stack trace when the select() is created and when it is being cleaned up, to see whether both actions happen in different tests (and different loops).
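A rough sketch of what that logging could look like, as a standalone helper. `LoopOriginTracker`, `check_cleanup` and `demo` are hypothetical names made up for this example; this is not the real select() code, just the idea of recording the loop and stack at creation and comparing them at clean-up:

```python
import asyncio
import traceback
from typing import Optional


class LoopOriginTracker:
    """Remembers which event loop (and which call site) created an object."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Must be created from inside a running loop, like the real object.
        self.loop = asyncio.get_running_loop()
        self.created_at = "".join(traceback.format_stack(limit=8))

    def check_cleanup(self) -> Optional[str]:
        """Return a report if clean-up runs in a different loop, else None."""
        current = asyncio.get_running_loop()
        if current is self.loop:
            return None
        cleaned_at = "".join(traceback.format_stack(limit=8))
        return (
            f"{self.name}: created in loop {id(self.loop):#x}, "
            f"cleaned up in loop {id(current):#x}\n"
            f"--- created at ---\n{self.created_at}"
            f"--- cleaned up at ---\n{cleaned_at}"
        )


async def _create() -> LoopOriginTracker:
    return LoopOriginTracker("select")


async def _cleanup(tracker: LoopOriginTracker) -> Optional[str]:
    return tracker.check_cleanup()


def demo() -> Optional[str]:
    """Create the tracker in one loop and clean it up in another."""
    tracker = asyncio.run(_create())
    return asyncio.run(_cleanup(tracker))


if __name__ == "__main__":
    print(demo())
```

The tracker keeps a reference to the original loop, so the identity comparison stays valid even after that loop has been closed.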


llucax · 2024-10-01T07:16:50Z

There seems to be a --setup-show flag in pytest that might help check whether we somehow have multiple loops overlapping.

Reading a bit more about pytest-asyncio, it seems there have been a few big changes in how this library is supposed to work between 0.21, 0.23 and 0.24. Probably all our code was written against the 0.21 API or older, so the issues might be connected to the upgrade to 0.24, although that was done a month ago, so I'm not sure it's that likely. But maybe this issue is just surfacing some misconfiguration or misuse of 0.24.

Here are the official migration guides; it might be worth checking them to make sure we are using pytest-asyncio properly:

llucax · 2024-10-01T07:43:02Z

Duplicate of #61.

llucax added the priority:❓ and type:bug labels Oct 1, 2024

llucax assigned Marenz and llucax Oct 1, 2024

keywordlabeler bot added the part:tests label Oct 1, 2024

llucax mentioned this issue Oct 1, 2024: Dispatch Managing Actor #54 (merged)

llucax added the priority:high label and removed the priority:❓ label Oct 1, 2024

llucax closed this as not planned Oct 1, 2024

llucax added the resolution:duplicate label Oct 1, 2024
