-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Don't block within futures in MerkleTreeComputer
#27345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@carmenchui This probably should be cherry-picked into the last rolling release together with an upcoming fix for #27332. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prevents deadlocks by avoiding blocking inside futures in MerkleTreeComputer, ensuring only the outer entry points block. It also adds a concurrency-focused golden test to validate multiple simultaneous Merkle tree builds.
- Refactor MerkleTreeComputer to return CompletableFuture from internal build path and precompute subtrees upfront.
- Adjust thread pool parallelism in tests and add a concurrent golden test to exercise potential deadlocks.
- Update BUILD to include TestType dependency used for test-detection-based parallelism adjustments.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/test/java/com/google/devtools/build/lib/remote/RemoteExecutionServiceTest.java | Adds a concurrent Merkle tree build test using virtual threads to reproduce deadlocks and validate correctness. |
| src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTreeComputer.java | Refactors build flow to be non-blocking internally, precomputes subtrees, and adjusts executor behavior and case handling. |
| src/main/java/com/google/devtools/build/lib/remote/merkletree/BUILD | Adds dependency on TestType for test-detection logic in executor sizing. |
Comments suppressed due to low confidence (1)
src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTreeComputer.java:1
- Restricting the symlink case to Artifact.SpecialArtifact excludes regular Artifact symlinks. This causes non-special symlink inputs to fall through and be processed incorrectly. Replace the case type with Artifact to cover all symlink inputs.
// Copyright 2025 The Bazel Authors. All rights reserved.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/test/java/com/google/devtools/build/lib/remote/RemoteExecutionServiceTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize this when reviewing your original PR, but use of CompletableFuture is strongly discouraged at Google, with Guava's ListenableFuture as the recommended replacement.
(There's an internal-only doc that I unfortunately cannot share with a litany of reasons, the most concerning being that CompletableFuture does not propagate cancellation correctly.)
Given that we have to restructure this code anyway, how difficult would it be to switch to ListenableFuture as well?
|
The Caffeine async cache API uses CompletableFuture, which is why I started to use it here. Can I adapt it to ListenableFuture or should I write my own async cache logic based on a synchronous cache with ListenableFuture values? |
I'm not sure. I think adapting would be preferable (less tricky code to write/maintain), but it seems that the only way Guava lets you do this is |
|
@tjgq I rewrote this with |
Only the outer
buildForFiles/buildForSpawnentry points may block on futures now. This fixes deadlocks that could happen when all threads inMERKLE_TREE_BUILD_POOLblock on futures scheduled on this pool.The existing golden test now builds the same Merkle tree multiple times concurrently, which reproduces the deadlock before this change on virtually every run and passes in 100 attempts with the fix.
Fixes #27331