Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement separate long build queues #13

Merged
merged 1 commit into from
May 9, 2024
Merged

Conversation

carlocab
Copy link
Member

This implements a separate queue for long-running builds that are
identified by a suffix -long in the runner name.

The values for slots in src/queue_types.rb is probably wrong, and
will need adjusting.

This should achieve having an entirely separate queue for long build
jobs. What isn't clear to me is whether this leads to unused capacity
when there are no long build jobs to be done but there are queued short
build jobs waiting.

@carlocab carlocab requested a review from Bo98 April 15, 2024 09:00
@carlocab
Copy link
Member Author

⚠️ Warning ⚠️ completely untested

Comment on lines 10 to 12
MACOS_X86_64_LEGACY_LONG = 3
MACOS_ARM64_LONG = 4
MACOS_X86_64_LONG = 5
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably a more elegant way of encoding the fact that we want a long variant of an existing queue type...

@Bo98
Copy link
Member

Bo98 commented Apr 15, 2024

This is good and is also how I would have done it. One problem here though is we now have 12 slots for short queue and a separate 12 slots for the long queue.

We don't have 24 slots, so what we actually want is 12 slots shared between the two of them and somehow make it dynamic so that long builds get limited to 6 or so but short builds can extend up to the full 12 if free.

My initial thought a while back was a priority queue but that probably doesn't work well if we want to limit long builds to a different quantity.

@carlocab
Copy link
Member Author

We don't have 24 slots, so what we actually want is 12 slots shared between the two of them and somehow make it dynamic so that long builds get limited to 6 or so but short builds can extend up to the full 12 if free.

Yes, this was my thinking too. Just not quite sure how to make it work without invasive changes to existing code. Will give it more of a think.

@Bo98
Copy link
Member

Bo98 commented Apr 16, 2024

One approach could be to replace the start processors' Queue.new with something else, such as a double-headed linked list (with pointers to next and next short) with popping that's dynamic based on resource usage (tracked by a semaphore or whatever). That would also allow it to remain one queue in terms of processing.

That may not necessarily be the best structure. concurrent-ruby may or may not have some inspiration (LockFreeLinkedSet etc).

In the end, every queue will have short and long items so some sort of in-queue handling would make sense here. Ideally we would generalise it well, so that say a -massbottle as an extra tag later on or something can be very easily inserted into the same code.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me so far! Good to merge when @Bo98 is happy.

@carlocab carlocab force-pushed the separate-long-build-queues branch from 2c9c147 to ea7b4f7 Compare May 2, 2024 12:58
@carlocab
Copy link
Member Author

carlocab commented May 2, 2024

This needs a bit of cleaning up, but I think it's in a state where we're able to discuss the approach.

@carlocab carlocab requested a review from MikeMcQuaid May 2, 2024 12:59
@carlocab carlocab force-pushed the separate-long-build-queues branch 2 times, most recently from 46bd7a4 to 8de88c3 Compare May 2, 2024 13:01
src/job.rb Outdated Show resolved Hide resolved
src/job_queue.rb Outdated Show resolved Hide resolved
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only comments I would have are those already present. Thanks @carlocab!

src/job_queue.rb Outdated
@mutex.synchronize do
running_long_build_count = @state.running_jobs(@queue_type).count(&:long_build?)

if (running_long_build_count < 2 || @queue[:default].empty?) && !@queue[:long].empty?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check may not necessarily be what we want (because it allows us to queue many long build jobs if the default queue is empty), but I haven't worked out how to avoid the situation where we try to call @queue[:default].pop on an empty @queue[:default].

Copy link
Member

@Bo98 Bo98 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is an issue particularly since the mutex will not be released when waiting on pop.

A couple solutions may be one or both of:

  • Blocking a condition variable when both are empty that is reset on <<.
  • Not using Queue at all and instead use an array (given we won't be using the blocking feature anymore).

@carlocab carlocab marked this pull request as ready for review May 2, 2024 13:38
@carlocab carlocab force-pushed the separate-long-build-queues branch from bda2ca8 to dd244b7 Compare May 2, 2024 13:40
carlocab added a commit to Homebrew/brew that referenced this pull request May 3, 2024
This implements a separate queue for long-running builds that are
identified by a tag `-long` in the runner name.
@Bo98 Bo98 force-pushed the separate-long-build-queues branch from f0fe505 to 513797a Compare May 9, 2024 13:35
@Bo98
Copy link
Member

Bo98 commented May 9, 2024

Looks good, thanks!

Only issue I found was SharedState.instance in the JobQueue constructor caused a recursive loop because the SharedState constructor itself created the job queue.

https://github.com/Homebrew/ci-orchestrator/compare/f0fe50550a134351612979b45b790cbc180950b0..513797afe30e243e213fcb8c0425dbde0c7d3d9e

Will deploy shortly.

@Bo98 Bo98 merged commit 513797a into main May 9, 2024
4 checks passed
@Bo98 Bo98 deleted the separate-long-build-queues branch May 9, 2024 13:44
carlocab added a commit to Homebrew/homebrew-core that referenced this pull request May 10, 2024
This shouldn't be needed anymore after Homebrew/ci-orchestrator#13.
loop do
running_long_build_count = SharedState.instance.running_jobs(@queue_type).count(&:long_build?)

if running_long_build_count < 2 && !@queue[:long].empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the limit in homebrew-core is 2 PRs, with a PR containing 3 OS versions, don't we actually want this to be 6 instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? Is that the count we want per queue_type? If so, then yes, I think that's correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these are our queue types:

MACOS_X86_64_LEGACY = 0
MACOS_ARM64 = 1
MACOS_X86_64 = 2

So, yes, 6 seems correct. Will open a new PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#16

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. All arm64 is all one queue type so 6 sounds correct to me.

x86_64 is technically two types but that's not going to be the case come macOS 15.

carlocab added a commit that referenced this pull request May 10, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants