Support group/scope scheduling in worksteal #890

Open
hmellor opened this issue Mar 14, 2023 · 11 comments

Comments


hmellor commented Mar 14, 2023

In the documentation it says that if a test is not in an xdist_group, that it defaults to the load behaviour.

Would it be possible to choose the default behaviour so it could be something like worksteal?

Collaborator

amezin commented Mar 17, 2023

In the documentation it says that if a test is not in an xdist_group, that it defaults to the load behaviour.

I think the documentation isn't accurate. As far as I understand the code, xdist doesn't switch schedulers; it just happens that loadgroup with every test being its own small "group" is very similar to load (but not exactly the same: LoadScheduling and LoadScopeScheduling do not share any code).
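
To illustrate the point about every ungrouped test forming its own small "group", here is a minimal sketch of how test ids might be bucketed into scopes. It assumes, as a simplification, that a grouped test carries its group name as an `@group` suffix on its id (as in the `test_one@somegroup` ids quoted later in this thread) and it ignores edge cases such as `@` inside parametrized ids. The helpers `scope_of` and `bucket_by_scope` are hypothetical names, not xdist's actual code.

```python
from collections import OrderedDict


def scope_of(test_id: str) -> str:
    """Return the group name after the last '@', or the test id itself if ungrouped."""
    if "@" in test_id:
        return test_id.rsplit("@", 1)[1]
    return test_id


def bucket_by_scope(test_ids):
    """Group test ids by scope, preserving collection order."""
    buckets = OrderedDict()
    for test_id in test_ids:
        buckets.setdefault(scope_of(test_id), []).append(test_id)
    return buckets


# Hypothetical test ids: two grouped API tests and two ungrouped tests.
tests = [
    "test_api.py::test_login@api",
    "test_api.py::test_logout@api",
    "test_math.py::test_add",
    "test_math.py::test_sub",
]
buckets = bucket_by_scope(tests)
for scope, members in buckets.items():
    print(scope, "->", members)
```

In this sketch the two `@api` tests share one bucket, while each ungrouped test ends up alone in a bucket of size one, which is why the result resembles per-test load balancing without being the same scheduler.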

Would it be possible to choose the default behaviour so it could be something like worksteal?

Only if someone implemented the concept of "groups" in the worksteal scheduler. I do not need this feature myself, so I am unlikely to work on it.

BTW, why do you use loadgroup instead of simple load?

Author

hmellor commented Mar 19, 2023

We have a test suite where some tests are regular Python tests and others test a Python API with a locally hosted server. So we want everything parallelised except the API tests, which should run sequentially; they will if they are all on the same worker.

@sshishov

In our case we have a huge dataset created per module (like fixtures). It is therefore much faster to use loadgroup, as the tests from the same class/module run on the same worker and the initial setup needs to be done only once.

Very often we see that some workers finish early while another worker is still working hard to finish its chunk. It would be great to have a stealing approach for when a worker is idle in such a case.

amezin changed the title from "Could --dist loadgroup default to worksteal instead of load?" to "Support group/scope scheduling in worksteal" on Mar 23, 2023
Collaborator

amezin commented Apr 8, 2023

@sshishov Is the current worksteal not good enough in your case? It already schedules tests in large contiguous chunks (at least initially), so tests from the same module should get sent to the same worker.


agazeley commented May 9, 2023

We have a similar case to @hmellor and @sshishov: we have an expensive setup that we want shared across a set of tests within a group. However, once setup is complete, these tests run very quickly. This results in workers being loaded up with a lot of tests by the loadgroup scheduling, but they are then often idle towards the end while other long-running tests (that are not within the same group) occupy the other workers.

We end up with scheduling that looks like:

gw0 - test_one@somegroup... (8 tests, 20m runtime)
gw1 - test_two@someothergroup... (2 tests, 15m runtime)
gw2 - test_three, test_four, test_five (20m runtime)
gw3 - test_really_long, test_six, test_seven (45m runtime)
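
The imbalance in the listing above can be quantified with a little arithmetic. This sketch takes the four runtimes as given and computes how long each worker sits idle while the slowest one finishes. The variable names are illustrative, and the "ideal" figure is only a rough lower bound that assumes work could be split perfectly evenly, which a single long test would prevent in practice.

```python
# Per-worker runtimes in minutes, copied from the listing above.
runtimes = {"gw0": 20, "gw1": 15, "gw2": 20, "gw3": 45}

# Wall-clock time of the whole run is set by the slowest worker.
makespan = max(runtimes.values())

# Minutes each worker spends waiting for the slowest worker to finish.
idle = {worker: makespan - minutes for worker, minutes in runtimes.items()}

# Rough lower bound on wall-clock time if work were spread perfectly evenly.
ideal = sum(runtimes.values()) / len(runtimes)

print("makespan:", makespan)
print("idle minutes per worker:", idle)
print("ideal balanced time:", ideal)
```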

Another option I had in mind was to make the --maxschedchunk option work with loadgroup.

@sshishov

Hi @amezin, I did not know that worksteal schedules the tests from the same module to the same worker. From the documentation I inferred that it works the same as load, meaning a completely random order per test. Am I missing something, or is the information you provided just omitted from the documentation?

Collaborator

amezin commented May 27, 2023

worksteal doesn't do anything specific to schedule tests from the same module to the same worker. However, it should be a lot less "randomized" than load. Initially, worksteal takes the first n_tests/n_workers tests and sends them to the 1st worker, then a 2nd group of the same size to the 2nd worker, and so on. So unless you reorder tests intentionally or a lot of rebalancing is required (which worksteal tries to avoid: it starts moving tests between workers only when some worker completely runs out of work), tests from the same module will likely be run by the same worker. At least, most of them.
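
The behaviour described above can be sketched in a few lines. This is a simplified illustration, not xdist's implementation: the helper names `initial_chunks` and `steal` are hypothetical, and both the handling of a remainder and the policy of stealing the back half of the longest queue are assumptions made for the example.

```python
from collections import deque


def initial_chunks(tests, n_workers):
    """Split tests into contiguous chunks, one per worker.

    Assumption: any remainder is spread over the first workers.
    """
    base, extra = divmod(len(tests), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(deque(tests[start:start + size]))
        start += size
    return chunks


def steal(chunks, idle):
    """Move the back half of the longest queue to the idle worker.

    Assumption: the victim is the worker with the most pending tests.
    Returns the number of tests moved.
    """
    victim = max(range(len(chunks)), key=lambda i: len(chunks[i]))
    n = len(chunks[victim]) // 2
    stolen = [chunks[victim].pop() for _ in range(n)]
    chunks[idle].extend(reversed(stolen))  # keep the original relative order
    return n


# Three hypothetical modules with four tests each, in collection order.
tests = [f"test_mod{m}.py::test_{i}" for m in range(3) for i in range(4)]
chunks = initial_chunks(tests, n_workers=3)
for worker, queue in enumerate(chunks):
    print(f"gw{worker} starts with", list(queue))

# Pretend gw0 has finished everything it was given, then let it steal.
chunks[0].clear()
moved = steal(chunks, idle=0)
print(f"gw0 stole {moved} tests:", list(chunks[0]))
```

With an unshuffled collection order, each worker in this sketch starts with exactly one module, and tests only cross module boundaries once a worker runs dry.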

@sshishov

@amezin the question is: if we are using pytest-randomly, which randomizes the seed as well as the order of tests, will it affect the scheduling?

Collaborator

amezin commented May 27, 2023

Unfortunately, yes. That's what I meant by "unless you reorder tests intentionally"...

Although, if I'm not mistaken, there was a random reorder plugin that was able to reorder tests inside of scopes, without breaking the scopes themselves. I'm not sure whether it was pytest-randomly or some other plugin.
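
The difference between the two kinds of reordering can be sketched as follows. This is an illustration only and does not reproduce the behaviour of any particular plugin: the helpers `shuffle_all`, `shuffle_within_modules`, and `modules_per_chunk` are hypothetical, the input is assumed to be in collection order (tests of a module adjacent), and the test count is assumed to divide evenly among the workers.

```python
import random
from itertools import groupby


def module_of(test_id):
    """Return the module part of a 'module::test' id."""
    return test_id.split("::", 1)[0]


def shuffle_all(tests, rng):
    """Shuffle every test independently, ignoring module boundaries."""
    shuffled = list(tests)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_within_modules(tests, rng):
    """Shuffle module order and test order inside each module,
    while keeping each module's tests contiguous."""
    modules = [list(group) for _, group in groupby(tests, key=module_of)]
    rng.shuffle(modules)
    for group in modules:
        rng.shuffle(group)
    return [test for group in modules for test in group]


def modules_per_chunk(tests, n_workers):
    """Count distinct modules in each contiguous, equally sized chunk."""
    size = len(tests) // n_workers
    return [
        len({module_of(t) for t in tests[i * size:(i + 1) * size]})
        for i in range(n_workers)
    ]


# Three hypothetical modules with four tests each, in collection order.
tests = [f"test_mod{m}.py::test_{i}" for m in range(3) for i in range(4)]
rng = random.Random(0)  # fixed seed so the run is repeatable
full = shuffle_all(tests, rng)
scoped = shuffle_within_modules(tests, rng)
print("modules per chunk, full shuffle:  ", modules_per_chunk(full, 3))
print("modules per chunk, scoped shuffle:", modules_per_chunk(scoped, 3))
```

A scoped shuffle keeps one module per contiguous chunk in this setup, whereas a full shuffle will typically spread each module across several chunks and so across several workers.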

@sshishov

We tested it out and found that loadgroup scheduling combined with pytest-randomly works as expected: tests from the same module are scheduled to the same worker, just in a random order (if that makes sense). What we wanted was to add "stealing" functionality to this scheduling, so that a worker that has finished its work could "steal" work from another worker.

But to be honest, I should look deeper into how everything works inside pytest-randomly.

Collaborator

amezin commented Oct 31, 2024

First step done: #1144
