ARM64 Dekker-pattern StoreLoad fences in java.util.concurrent + VirtualThread #53
Open
macarte wants to merge 1 commit into macarte/PR4-winarm64 from
On ARM64, volatile write (STLR/release) + volatile read (LDAR/acquire) to
different addresses does NOT provide StoreLoad ordering. This breaks
Dekker-like protocols where one side writes field A then reads field B,
while the other writes B then reads A — both sides can miss each other's
stores.
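
The pattern described above can be sketched as follows. This is a minimal
illustration, not code from the patch: the class DekkerSketch, its fields a
and b, and the helper countDoubleMisses are all hypothetical names. Each side
stores its own flag, issues VarHandle.fullFence(), and only then loads the
other side's flag, so at least one side must observe the other's store.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.CompletableFuture;

/** Hypothetical two-flag handshake; names are illustrative, not from the patch. */
final class DekkerSketch {
    private volatile int a;
    private volatile int b;

    /** Side 1: publish A, fence, then check B. Returns true if B's store was seen. */
    boolean writeAThenReadB() {
        a = 1;
        VarHandle.fullFence(); // orders the store to a before the load of b
        return b != 0;
    }

    /** Side 2: publish B, fence, then check A. Returns true if A's store was seen. */
    boolean writeBThenReadA() {
        b = 1;
        VarHandle.fullFence(); // orders the store to b before the load of a
        return a != 0;
    }

    /** Races both sides repeatedly and counts rounds in which both sides missed. */
    static int countDoubleMisses(int rounds) {
        int misses = 0;
        for (int i = 0; i < rounds; i++) {
            DekkerSketch d = new DekkerSketch();
            CompletableFuture<Boolean> left = CompletableFuture.supplyAsync(d::writeAThenReadB);
            CompletableFuture<Boolean> right = CompletableFuture.supplyAsync(d::writeBThenReadA);
            boolean leftSaw = left.join();
            boolean rightSaw = right.join();
            if (!leftSaw && !rightSaw) {
                misses++; // the outcome the fences are meant to rule out
            }
        }
        return misses;
    }
}
```

A stress run such as DekkerSketch.countDoubleMisses(500) should report zero
double misses. Whether the explicit fence is needed on top of the volatile
accesses depends on how the JVM maps them to instructions on the target
platform; the sketch only mirrors the shape of the fix.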
This adds a full fence (U.fullFence() or VarHandle.fullFence()) at every
identified Dekker-pattern site:
VirtualThread.java:
- afterYield() PARKING path: between setState(PARKED/TIMED_PARKED) and
reading parkPermit (Dekker with unpark)
- afterYield() BLOCKING path: between setState(BLOCKED) and reading
blockPermit (Dekker with unblock)
- afterYield() WAITING path: between setState(WAIT/TIMED_WAIT) and
reading notified (Dekker with notify); fences in both untimed and
timed sub-paths (adapted for tip's per-path inline checks)
- afterDone(): between setState(TERMINATED) and reading
notifyAllAfterTerminate (Dekker with beforeJoin)
- unpark(): between getAndSetParkPermit(true) and reading state
(Dekker with afterYield PARKING path)
- unblock(): between blockPermit=true and reading state (Dekker with
afterYield BLOCKING path)
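
As a rough model of the park/unpark pairing listed above, the sketch below
places a fence on both sides of the handshake. It is a simplified stand-in
and not the real VirtualThread code: the class ParkHandshakeSketch, the state
constants, and the use of AtomicInteger/AtomicBoolean are assumptions made
for illustration.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the park/unpark handshake; not the real VirtualThread. */
final class ParkHandshakeSketch {
    static final int RUNNING = 0, PARKED = 1, UNPARKED = 2; // placeholder states

    final AtomicInteger state = new AtomicInteger(RUNNING);
    final AtomicBoolean parkPermit = new AtomicBoolean();

    /**
     * Parking side: publish PARKED, fence, then look for a permit that raced in.
     * Returns true if this side must reschedule the thread itself.
     */
    boolean afterYieldParking() {
        state.set(PARKED);
        VarHandle.fullFence(); // store to state is ordered before the permit load
        return parkPermit.get() && state.compareAndSet(PARKED, UNPARKED);
    }

    /**
     * Unparking side: publish the permit, fence, then look at the state.
     * Returns true if this side must reschedule the parked thread.
     */
    boolean unpark() {
        if (parkPermit.getAndSet(true)) {
            return false; // permit was already available, nothing to do
        }
        VarHandle.fullFence(); // permit store is ordered before the state load
        return state.get() == PARKED && state.compareAndSet(PARKED, UNPARKED);
    }
}
```

Whichever side runs second sees the other's store, and the compare-and-set
on the state ensures that exactly one of them wins the PARKED to UNPARKED
transition.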
LinkedTransferQueue.java:
- xfer(): between cmpExItem CAS and reading waiter (Dekker with
await() which writes waiter then reads item)
SynchronousQueue.java:
- xferLifo(): between cmpExItem CAS and reading waiter (same Dekker
as LinkedTransferQueue)
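
The item/waiter handshake in the two queues can be modelled roughly as
below. The class NodeSketch and its method names are hypothetical, and the
fence is placed only on the fulfilling side because that is the only side
the list above mentions; the real node classes differ in detail.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical queue node; a simplified stand-in for the real node classes. */
final class NodeSketch {
    final AtomicReference<Object> item = new AtomicReference<>();
    volatile Thread waiter;

    /**
     * Fulfilling side (xfer-like): CAS the item, fence, then read the waiter.
     * Returns the thread to wake, or null if none is visible or the CAS lost.
     */
    Thread fulfil(Object value) {
        if (!item.compareAndSet(null, value)) {
            return null; // another thread already matched this node
        }
        VarHandle.fullFence(); // item store is ordered before the waiter load
        return waiter;
    }

    /**
     * Waiting side (await-like): publish the waiter, then re-check the item.
     * Returns true if the caller still has to park.
     */
    boolean prepareToWait(Thread self) {
        waiter = self;
        return item.get() == null;
    }
}
```

If the fulfilling side returns null because no waiter was visible yet, the
waiting side is expected to find the item already set and skip parking.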
AbstractQueuedSynchronizer.java:
- acquire(): between node.status=WAITING and re-reading state in
tryAcquire/tryAcquireShared (Dekker with release/releaseShared)
- release(): between tryRelease state update and reading node.status
in signalNext
- releaseShared(): same as release()
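
For the AbstractQueuedSynchronizer sites, the same shape appears between the
lock state and the queued node's status. The sketch below is a toy with a
single queued node; LockSketch, its fields, and its method names are
hypothetical and much simpler than the real acquire()/release() code.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical one-node lock; illustrates the fence placement only. */
final class LockSketch {
    static final int WAITING = 1; // placeholder status value

    private final AtomicInteger state = new AtomicInteger();      // 0 = free, 1 = held
    private final AtomicInteger nodeStatus = new AtomicInteger(); // status of the one queued node

    boolean tryAcquire() {
        return state.compareAndSet(0, 1);
    }

    /**
     * Acquiring side: mark the node WAITING, fence, then retry the acquire.
     * Returns true if the lock was taken, false if the caller has to park.
     */
    boolean enqueueThenRetry() {
        nodeStatus.set(WAITING);
        VarHandle.fullFence(); // status store is ordered before re-reading state
        if (tryAcquire()) {
            nodeStatus.set(0); // no longer waiting
            return true;
        }
        return false;
    }

    /**
     * Releasing side: free the lock, fence, then look for a waiting node.
     * Returns true if a waiter is visible and needs a signal.
     */
    boolean release() {
        state.set(0);
        VarHandle.fullFence(); // state store is ordered before reading node status
        return nodeStatus.get() == WAITING;
    }
}
```

Either the acquirer's retry sees the freed lock, or the releaser sees the
WAITING status and signals, so a waiter is not left parked with the lock free.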
These fences are correctness-critical on ARM64 and functionally redundant
on x86, where volatile stores and locked CAS instructions already act as
full barriers. They appear only on non-hot paths (state transitions, not
tight loops).