Tweak run assignment for low throughput runs. #2446
Conversation
Pull request overview
This PR fixes an issue introduced by #2443 where low-throughput runs with few cores were repeatedly paired with the same large core worker. The fix adjusts how worker assignment priorities are calculated by temporarily inflating the core count of the last-worked run to discourage immediate reassignment.
Changes:
- Replaced the repeat_penalty variable with adjusted_cores, which adds the freed cores directly to the run's core count for priority calculations
- Updated both the "ensure runs get cores" criterion and the "match intended throughput" criterion to use adjusted_cores instead of the raw run["cores"]
- Improved code comments to better explain the motivation and mechanism
DEV updated, workers joined.
We fix an issue created by official-stockfish#2443.

Assume that X is a low-throughput test, currently with 0 cores. Then in current master it moves to the front of the queue. Assume it manages to pick up a large-core worker Y. Since X is low throughput, it will not pick up more workers. When Y finishes its task on X, X falls back to 0 cores and moves to the front of the queue again. Since Y is now looking for a new task, it will probably pick up X again. So X and Y are permanently paired, which is not desirable.

In this PR we fix this by replacing run["cores"] with

adjusted_cores = run["cores"] + (max_threads if run_id == last_run_id else 0)

In other words, we are re-adding the number of cores that were freed by this worker when the previous task on this run finished.
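The mechanism above can be sketched as follows. This is an illustrative toy, not the actual fishtest scheduler: the function name task_priority, the surrounding data shapes, and the "fewer cores first" sort key are assumptions for the sake of the example; only the adjusted_cores expression comes from the PR.

```python
def task_priority(run, run_id, last_run_id, max_threads):
    """Toy sort key for run assignment (hypothetical, not fishtest code).

    When a worker that just finished a task on last_run_id asks for new
    work, we pretend that run still holds the cores the worker just freed
    (max_threads), so it no longer looks like a starved 0-core run that
    jumps straight back to the front of the queue.
    """
    adjusted_cores = run["cores"] + (max_threads if run_id == last_run_id else 0)
    # In this toy model, a lower value means higher priority
    # ("ensure runs get cores"), so we return the adjusted count directly.
    return adjusted_cores

runs = {
    "X": {"cores": 0},  # low-throughput run the worker just left
    "Y": {"cores": 8},  # some other run
}

# A 16-thread worker just finished a task on X and asks for new work:
prio_x = task_priority(runs["X"], "X", last_run_id="X", max_threads=16)
prio_y = task_priority(runs["Y"], "Y", last_run_id="X", max_threads=16)

# X is treated as if it still had the 16 freed cores, so it no longer
# outranks Y, and the worker is free to pick up a different run.
assert prio_x > prio_y
```

Without the adjustment, X would report 0 cores, sort ahead of every other run, and recapture the same worker, which is exactly the pairing loop described above.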
I stopped my x86-64-avx512icl worker to leave that arch available to you.
Last push corrects a typo in the commit message.
It seems to be working. I lowered the avx512icl test to low throughput and it is still getting cores, but not immediately.
Ok, enjoy your day!
The "x86-64-sse41-popcnt" test is not trapping the "x86-64-sse41-popcnt" worker.

It will not trap it if it can get the required throughput without it. Also, a low-throughput test may sometimes hijack a worker, since a zero-core test moves to the front of the queue. I am thinking about a decaying-average implementation which could fix this, but it will take a bit of time.
ppigazzini
left a comment
Looks good on DEV, PROD update after merging, thank you @vdbergh