

@vdbergh
Contributor

@vdbergh vdbergh commented Jan 17, 2026

We fix an issue created by #2443.

Assume that X is a low-throughput test, currently with 0 cores. Then in current master it moves to the front of the queue. Assume it manages to pick up a large-core worker Y. Since X is low-throughput, it will not pick up more workers. When Y finishes its task on X, X falls back to 0 cores and moves to the front of the queue again. Since Y is now looking for a new task, it will probably pick up X again. So X and Y are permanently paired, which is not desirable.
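The pairing loop can be reproduced with a toy model. This is a heavily simplified sketch, not fishtest's actual scheduler: it assumes a free worker simply takes the run with the fewest active cores (so a zero-core run comes first) and ignores throughput limits. The run names and core counts are made up for illustration.

```python
# Toy model of the pairing problem (hypothetical, heavily simplified):
# a free worker always picks the run with the fewest active cores.

def pick_run(runs):
    """Return the name of the run with the fewest active cores (ties: first seen)."""
    return min(runs, key=lambda name: runs[name])

runs = {"X": 0, "A": 16, "B": 24}  # X is the low-throughput test
worker_y_cores = 32                # a large-core worker

history = []
for _ in range(3):
    chosen = pick_run(runs)
    runs[chosen] += worker_y_cores  # Y starts a task on the chosen run
    history.append(chosen)
    runs[chosen] -= worker_y_cores  # Y finishes; its cores are freed again

print(history)  # ['X', 'X', 'X']: Y keeps returning to X
```

Because X drops back to 0 cores the moment Y leaves it, X is again the cheapest choice for Y on every round.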

In this PR we fix this by replacing run["cores"] with

adjusted_cores = run["cores"] + (max_threads if run_id == last_run_id else 0)

In other words, we re-add the number of cores that this worker freed when its previous task on this run finished.
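A minimal sketch of the fix, assuming a toy scheduler in which a worker with max_threads cores takes the run with the fewest adjusted cores. The run dictionaries, the core counts, and the helper names adjusted_cores and pick_run are illustrative assumptions; only the adjusted_cores formula itself comes from this PR.

```python
# Sketch of the adjusted_cores idea in a toy scheduler.
# Assumptions (not fishtest's real data model): runs are dicts with a
# "cores" field, and last_run_id is the run this worker just finished.

def adjusted_cores(run, run_id, last_run_id, max_threads):
    # Re-add the cores this worker freed when its previous task on this run ended.
    return run["cores"] + (max_threads if run_id == last_run_id else 0)

def pick_run(runs, last_run_id, max_threads):
    # Prefer the run with the fewest (adjusted) active cores.
    return min(
        runs,
        key=lambda run_id: adjusted_cores(runs[run_id], run_id, last_run_id, max_threads),
    )

runs = {"X": {"cores": 0}, "A": {"cores": 16}, "B": {"cores": 24}}
max_threads = 32  # worker Y's core count

history = []
last_run_id = None
for _ in range(3):
    chosen = pick_run(runs, last_run_id, max_threads)
    history.append(chosen)
    last_run_id = chosen  # Y finishes the task, freeing its cores on `chosen`

print(history)  # ['X', 'A', 'X']
```

With the freed cores added back, X no longer sorts first for the worker that just left it, so Y moves on to A before it can return to X. Other workers, whose last_run_id differs, still see X at 0 cores and can pick it up in the meantime.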


Copilot AI left a comment


Pull request overview

This PR fixes an issue introduced by #2443 where low-throughput runs with few cores were repeatedly paired with the same large-core worker. The fix adjusts how worker assignment priorities are calculated by temporarily inflating the core count of the last-worked run to discourage immediate reassignment.

Changes:

  • Replaced the repeat_penalty variable with adjusted_cores, which adds the freed cores directly to the run's core count for priority calculations
  • Updated both the "ensure runs get cores" criterion and the "match intended throughput" criterion to use adjusted_cores instead of raw run["cores"]
  • Improved code comments to better explain the motivation and mechanism
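The two criteria named in the overview can be pictured as a single sort key. This is a hypothetical sketch: the field name intended_throughput, the numbers, and the exact form of the key are assumptions for illustration, not fishtest's actual code.

```python
# Hypothetical sort key combining the two criteria from the overview.
# Field names ("cores", "intended_throughput") are illustrative assumptions,
# not fishtest's actual schema.

def adjusted_cores(run, run_id, last_run_id, max_threads):
    return run["cores"] + (max_threads if run_id == last_run_id else 0)

def priority_key(run_id, run, last_run_id, max_threads):
    cores = adjusted_cores(run, run_id, last_run_id, max_threads)
    return (
        cores > 0,                           # 1) "ensure runs get cores": zero-core runs first
        cores / run["intended_throughput"],  # 2) "match intended throughput": lowest ratio first
    )

runs = {
    "X": {"cores": 0, "intended_throughput": 0.1},
    "A": {"cores": 16, "intended_throughput": 1.0},
    "B": {"cores": 40, "intended_throughput": 2.0},
}

# Worker Y (32 cores) just finished a task on X.
order = sorted(runs, key=lambda rid: priority_key(rid, runs[rid], "X", 32))
# A fresh worker with no previous run.
order_fresh = sorted(runs, key=lambda rid: priority_key(rid, runs[rid], None, 32))

print(order)        # ['A', 'B', 'X']
print(order_fresh)  # ['X', 'A', 'B']
```

Because both criteria read adjusted_cores rather than raw run["cores"], the worker that just left X sees X as neither coreless nor under its throughput target, while a fresh worker still sees X first.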


@ppigazzini
Collaborator

DEV updated, workers joined.

@ppigazzini
Collaborator

I stopped my x86-64-avx512icl worker to leave that arch available to you.

@vdbergh
Contributor Author

vdbergh commented Jan 17, 2026

Last push corrects a typo in the commit message.

@vdbergh
Contributor Author

vdbergh commented Jan 17, 2026

It seems to be working. I lowered the avx512icl test to low throughput and it is still getting cores, but not immediately.
I am away for the day now, so I am shutting down my laptop with my 3 workers.

@ppigazzini
Collaborator

Ok, enjoy your day!

@ppigazzini
Collaborator

ppigazzini commented Jan 17, 2026

The "x86-64-sse41-popcnt" test is not trapping the "x86-64-sse41-popcnt" worker.

@vdbergh
Contributor Author

vdbergh commented Jan 17, 2026

> The "x86-64-sse41-popcnt" test is not trapping the "x86-64-sse41-popcnt" worker.

It will not trap that worker if the test can reach its required throughput without it. Also, a low-throughput test may sometimes hijack a worker, since a zero-core test moves to the front of the queue.

I am thinking about a decaying-average implementation, which could fix this, but it will take a bit of time.
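A decaying average would smooth a run's core count over time, so that a brief drop to zero cores does not immediately send the run to the front of the queue. The sketch below is a hypothetical exponential moving average; the class name, the smoothing factor alpha, and the idea of updating once per scheduling event are assumptions for illustration, not the implementation discussed here.

```python
# Hypothetical sketch of a decaying (exponential moving) average of a run's cores.
# The smoothing factor and update schedule are illustrative assumptions.

class DecayingCores:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest sample
        self.average = 0.0

    def update(self, current_cores):
        # Recent samples count more; older samples fade geometrically.
        self.average = self.alpha * current_cores + (1 - self.alpha) * self.average
        return self.average

tracker = DecayingCores(alpha=0.5)
samples = [32, 32, 0, 0]  # a 32-core worker finishes and the run drops to 0 cores
averages = [tracker.update(c) for c in samples]
print(averages)  # [16.0, 24.0, 12.0, 6.0]
```

After the worker leaves, the averaged value decays gradually instead of falling straight to zero, so the run would only regain front-of-queue priority after it has actually been short of cores for a while.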

Collaborator

@ppigazzini ppigazzini left a comment


Looks good on DEV; PROD will be updated after merging. Thank you @vdbergh

@ppigazzini ppigazzini merged commit ae9c632 into official-stockfish:master Jan 17, 2026
21 checks passed

Labels

enhancement server server side changes
