Describe the bug
In the MULTITHREADED (skip-merge) shuffle path, a ManagedBuffer returned by
MultithreadedShuffleBufferCatalog.getMergedBuffer / getMergedBatchBuffer does not hold a reference
on its backing partial-file handle until its first consumer action (retain(),
createInputStream(), convertToNetty(), or nioByteBuffer()). If unregisterShuffle for that
shuffle runs in the window between the buffer being handed out and that first action, the backing
handle is closed and the first lease acquisition (or read) fails.
This is a narrow residual of the retained-buffer lifetime fix (follow-up to #15018, PR #15064 ), which
protects retained buffers, active streams, and active file regions but intentionally does not cover
the hand-off window before any lease exists.
Impact
Low / benign and self-healing:
- Surfaces as a retriable fetch failure —
FetchFailedException on the local read path,
ChunkFetchFailure on the remote serving path — which Spark handles via stage retry.
- No data corruption (a closed/deleted handle yields an exception, never stale bytes), no
crash, and no resource leak.
- Requires an abnormal ordering: shuffle cleanup racing an in-flight fetch of the same shuffle
(e.g. straggler / speculative / retried reducer, or premature cleanup). Normal operation defers
unregisterShuffle until the shuffle's consumers are done, so the window is rarely hit.
Where
MultithreadedShuffleBufferCatalog.getMergedBuffer / getMergedBatchBuffer (construct the buffer)
MultiBatchManagedBuffer (acquires a lease only on first consumer action)
MultithreadedShuffleBufferCatalog.unregisterShuffle (requests the deferred close)
ShuffleHandleReference / ShuffleHandleLease (the reference-counting lifecycle)
Why it was left out of the main fix
Fully closing this window means the buffer must hold a reference for its own lifetime, from
construction until it is released. That in turn requires a guaranteed releaser for buffers that are
constructed and then abandoned without being consumed or released — otherwise the handle is pinned
forever. A naive "acquire a lease in the constructor" approach was tried and rejected because it
introduced exactly such a permanent handle/file leak on abandonment paths (e.g. when a multi-block
fetch fails partway after some buffers are already built). The main PR therefore keeps the buffer
inert at construction and accepts this graceful, retriable residual.
Proposed fix (options)
- Tolerant lookup (simplest). Make
getMergedBuffer / getMergedBatchBuffer translate a
"handle already closed" failure into the existing "no data found for block" failure, so callers see
one consistent, retriable block-gone outcome instead of an unchecked IllegalStateException.
- Own-lifetime lease with a guaranteed releaser. Have the buffer hold a reference from
construction, paired with a reliable release (e.g. a Cleaner/phantom-reference backstop, or a
sweep in unregisterShuffle / executor shutdown that force-closes handles whose only remaining
references are such non-consumer leases), so an abandoned buffer can never pin a handle.
Option 1 is low-risk and preserves current behavior with a cleaner failure type. Option 2 closes the
window entirely but must guarantee the releaser to avoid reintroducing the leak.
Acceptance criteria
- A
MultiBatchManagedBuffer that is constructed and then never consumed or released does not
pin its backing handle (no file / fd / host-memory leak).
- A block fetched concurrently with
unregisterShuffle for the same shuffle fails gracefully and
retriably (consistent block-gone failure), not with an unchecked exception escaping the read /
encode path.
- Existing skip-merge behavior (retained buffers, streams, and file regions stay readable across
unregisterShuffle; handles close deterministically when the last lease drops) is preserved.
References
Describe the bug
In the MULTITHREADED (skip-merge) shuffle path, a
ManagedBufferreturned byMultithreadedShuffleBufferCatalog.getMergedBuffer/getMergedBatchBufferdoes not hold a referenceon its backing partial-file handle until its first consumer action (
retain(),createInputStream(),convertToNetty(), ornioByteBuffer()). IfunregisterShufflefor thatshuffle runs in the window between the buffer being handed out and that first action, the backing
handle is closed and the first lease acquisition (or read) fails.
This is a narrow residual of the retained-buffer lifetime fix (follow-up to #15018, PR #15064 ), which
protects retained buffers, active streams, and active file regions but intentionally does not cover
the hand-off window before any lease exists.
Impact
Low / benign and self-healing:
FetchFailedExceptionon the local read path,ChunkFetchFailureon the remote serving path — which Spark handles via stage retry.crash, and no resource leak.
(e.g. straggler / speculative / retried reducer, or premature cleanup). Normal operation defers
unregisterShuffleuntil the shuffle's consumers are done, so the window is rarely hit.Where
MultithreadedShuffleBufferCatalog.getMergedBuffer/getMergedBatchBuffer(construct the buffer)MultiBatchManagedBuffer(acquires a lease only on first consumer action)MultithreadedShuffleBufferCatalog.unregisterShuffle(requests the deferred close)ShuffleHandleReference/ShuffleHandleLease(the reference-counting lifecycle)Why it was left out of the main fix
Fully closing this window means the buffer must hold a reference for its own lifetime, from
construction until it is released. That in turn requires a guaranteed releaser for buffers that are
constructed and then abandoned without being consumed or released — otherwise the handle is pinned
forever. A naive "acquire a lease in the constructor" approach was tried and rejected because it
introduced exactly such a permanent handle/file leak on abandonment paths (e.g. when a multi-block
fetch fails partway after some buffers are already built). The main PR therefore keeps the buffer
inert at construction and accepts this graceful, retriable residual.
Proposed fix (options)
getMergedBuffer/getMergedBatchBuffertranslate a"handle already closed" failure into the existing "no data found for block" failure, so callers see
one consistent, retriable block-gone outcome instead of an unchecked
IllegalStateException.construction, paired with a reliable release (e.g. a
Cleaner/phantom-reference backstop, or asweep in
unregisterShuffle/ executor shutdown that force-closes handles whose only remainingreferences are such non-consumer leases), so an abandoned buffer can never pin a handle.
Option 1 is low-risk and preserves current behavior with a cleaner failure type. Option 2 closes the
window entirely but must guarantee the releaser to avoid reintroducing the leak.
Acceptance criteria
MultiBatchManagedBufferthat is constructed and then never consumed or released does notpin its backing handle (no file / fd / host-memory leak).
unregisterShufflefor the same shuffle fails gracefully andretriably (consistent block-gone failure), not with an unchecked exception escaping the read /
encode path.
unregisterShuffle; handles close deterministically when the last lease drops) is preserved.References