
ARM64 Dekker-pattern StoreLoad fences in java.util.concurrent + VirtualThread #53

Open: macarte wants to merge 1 commit into macarte/PR4-winarm64 from macarte/PR5-winarm64

macarte commented Mar 10, 2026

On ARM64, volatile write (STLR/release) + volatile read (LDAR/acquire) to different addresses does NOT provide StoreLoad ordering. This breaks Dekker-like protocols where one side writes field A then reads field B, while the other writes B then reads A — both sides can miss each other's stores.
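The sketch below is a minimal, self-contained illustration of the store-buffering (Dekker) shape described above. The class `DekkerLitmus` and its methods are hypothetical and are not code from this PR. It uses the public `VarHandle.fullFence()` in place of the JDK-internal `U.fullFence()`.

Each side writes its own field with a release store and then reads the other field with an acquire load. Per the `VarHandle` access-mode documentation, release only keeps earlier accesses from moving after the store, and acquire only keeps later accesses from moving before the load. So the store/load pair itself is not ordered. The fence between them is what rules out both sides reading 0.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/**
 * Hypothetical store-buffering (Dekker) litmus test. Each side writes its own
 * flag and then reads the other side's flag; the fullFence() between the write
 * and the read forbids the outcome r1 == 0 && r2 == 0.
 */
final class DekkerLitmus {
    int x, y;    // flags, accessed only through the VarHandles below
    int r1, r2;  // what each side observed

    private static final VarHandle X;
    private static final VarHandle Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(DekkerLitmus.class, "x", int.class);
            Y = l.findVarHandle(DekkerLitmus.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void sideA() {
        X.setRelease(this, 1);          // write field A
        VarHandle.fullFence();          // StoreLoad: the write above is ordered before the read below
        r1 = (int) Y.getAcquire(this);  // read field B
    }

    void sideB() {
        Y.setRelease(this, 1);          // write field B
        VarHandle.fullFence();          // StoreLoad, mirror image of sideA
        r2 = (int) X.getAcquire(this);  // read field A
    }

    /** Runs the test {@code trials} times and counts how often both sides missed each other. */
    static int countBothMissed(int trials) {
        int bothMissed = 0;
        for (int i = 0; i < trials; i++) {
            DekkerLitmus t = new DekkerLitmus();
            Thread a = new Thread(t::sideA);
            Thread b = new Thread(t::sideB);
            a.start();
            b.start();
            try {
                a.join();  // join() makes r1/r2 visible to this thread
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (t.r1 == 0 && t.r2 == 0) {
                bothMissed++;
            }
        }
        return bothMissed;
    }
}
```

Threads in this harness start one at a time, so the both-missed outcome is rare even without the fences. A zero count therefore shows that the fenced protocol holds. It does not show that an unfenced one fails.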

This adds U.fullFence() (jdk.internal.misc.Unsafe) / VarHandle.fullFence() at all identified Dekker-pattern sites:

VirtualThread.java:

  • afterYield() PARKING path: between setState(PARKED/TIMED_PARKED) and reading parkPermit (Dekker with unpark)
  • afterYield() BLOCKING path: between setState(BLOCKED) and reading blockPermit (Dekker with unblock)
  • afterYield() WAITING path: between setState(WAIT/TIMED_WAIT) and reading notified (Dekker with notify); fences in both untimed and timed sub-paths (adapted for tip's per-path inline checks)
  • afterDone(): between setState(TERMINATED) and reading notifyAllAfterTerminate (Dekker with beforeJoin)
  • unpark(): between getAndSetParkPermit(true) and reading state (Dekker with afterYield PARKING path)
  • unblock(): between blockPermit=true and reading state (Dekker with afterYield BLOCKING path)
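The park/unpark pair in the list above can be modelled the same way. `ParkPermitSketch`, its state constants, and the `resumes` counter are hypothetical simplifications, not the real VirtualThread fields. The sketch assumes only what the bullets state:

- the parker writes its state and then reads the permit;
- the unparker writes the permit and then reads the state.

With both fences, at least one side observes the other's write. The `compareAndSet` on state ensures that at most one side resumes the thread. So every trial should resume it exactly once, and a lost wake-up would show up as zero resumes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical park/unpark hand-shake shaped like the afterYield() PARKING path and unpark(). */
final class ParkPermitSketch {
    static final int RUNNING = 0, PARKED = 1, UNPARKED = 2;

    int state = RUNNING;   // accessed through STATE
    boolean parkPermit;    // accessed through PERMIT
    final AtomicInteger resumes = new AtomicInteger();

    private static final VarHandle STATE;
    private static final VarHandle PERMIT;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            STATE = l.findVarHandle(ParkPermitSketch.class, "state", int.class);
            PERMIT = l.findVarHandle(ParkPermitSketch.class, "parkPermit", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Parking side: publish PARKED, then check whether a permit already arrived. */
    void park() {
        STATE.setRelease(this, PARKED);
        VarHandle.fullFence();  // StoreLoad: state write before permit read
        boolean permit = (boolean) PERMIT.getAcquire(this);
        if (permit && STATE.compareAndSet(this, PARKED, UNPARKED)) {
            resumes.incrementAndGet();  // parker undoes its own park
        }
    }

    /** Unparking side: set the permit, then check whether the target is parked. */
    void unpark() {
        boolean hadPermit = (boolean) PERMIT.getAndSet(this, true);
        if (!hadPermit) {
            VarHandle.fullFence();  // StoreLoad: permit write before state read
            int s = (int) STATE.getAcquire(this);
            if (s == PARKED && STATE.compareAndSet(this, PARKED, UNPARKED)) {
                resumes.incrementAndGet();  // unparker resumes the parked thread
            }
        }
    }

    /** Races park() against unpark(); returns how many trials did not resume exactly once. */
    static int countBadTrials(int trials) {
        int bad = 0;
        for (int i = 0; i < trials; i++) {
            ParkPermitSketch p = new ParkPermitSketch();
            Thread parker = new Thread(p::park);
            Thread unparker = new Thread(p::unpark);
            parker.start();
            unparker.start();
            try {
                parker.join();
                unparker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (p.resumes.get() != 1) {
                bad++;
            }
        }
        return bad;
    }
}
```

The CAS is what makes "both sides saw each other" harmless. Only the side that wins the PARKED to UNPARKED transition resumes the thread.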

LinkedTransferQueue.java:

  • xfer(): between the cmpExItem CAS and reading waiter (Dekker with await(), which writes waiter and then reads item)
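The xfer()/await() pairing follows the same rule, with a CAS on the item in place of a plain store. `HandOffSketch` is a hypothetical single-slot model, not LinkedTransferQueue's real node layout:

- the producer matches the slot with `compareAndSet` and then reads `waiter`;
- the consumer publishes itself as `waiter` and then reads `item` before parking with `LockSupport.park`.

If both reads missed, the consumer would park with nobody left to unpark it. The fences exclude that lost wake-up.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical single-slot hand-off loosely modelled on the xfer()/await() pairing. */
final class HandOffSketch {
    Object item;    // null = unmatched; set once by the producer (accessed through ITEM)
    Thread waiter;  // consumer that may park (accessed through WAITER)

    private static final VarHandle ITEM;
    private static final VarHandle WAITER;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            ITEM = l.findVarHandle(HandOffSketch.class, "item", Object.class);
            WAITER = l.findVarHandle(HandOffSketch.class, "waiter", Thread.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Producer: match the slot, then wake the consumer if it has registered. */
    boolean offer(Object x) {
        if (!ITEM.compareAndSet(this, null, x)) {
            return false;  // slot already matched
        }
        VarHandle.fullFence();  // StoreLoad: item CAS before waiter read
        Thread w = (Thread) WAITER.getAcquire(this);
        if (w != null) {
            LockSupport.unpark(w);
        }
        return true;
    }

    /** Consumer: register as waiter, then park until the slot is matched. */
    Object take() {
        WAITER.setRelease(this, Thread.currentThread());
        VarHandle.fullFence();  // StoreLoad: waiter write before item read
        Object x;
        while ((x = ITEM.getAcquire(this)) == null) {
            LockSupport.park(this);  // re-check after every wake-up, spurious or real
        }
        return x;
    }

    /** Runs {@code trials} hand-offs; returns how many consumers did not finish in time. */
    static int countStuckConsumers(int trials) {
        int stuck = 0;
        for (int i = 0; i < trials; i++) {
            HandOffSketch h = new HandOffSketch();
            Object[] got = new Object[1];
            Thread consumer = new Thread(() -> got[0] = h.take());
            consumer.setDaemon(true);  // a lost wake-up must not keep the JVM alive
            Thread producer = new Thread(() -> h.offer("item"));
            consumer.start();
            producer.start();
            try {
                producer.join();
                consumer.join(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (consumer.isAlive() || !"item".equals(got[0])) {
                stuck++;
            }
        }
        return stuck;
    }
}
```

The consumer loops on `park` rather than parking once. `LockSupport.park` may return spuriously, so the item must be re-read after every return.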

SynchronousQueue.java:

  • xferLifo(): between cmpExItem CAS and reading waiter (same Dekker as LinkedTransferQueue)

AbstractQueuedSynchronizer.java:

  • acquire(): between node.status=WAITING and re-reading state in tryAcquire/tryAcquireShared (Dekker with release/releaseShared)
  • release(): between the state update in tryRelease and reading node.status in signalNext
  • releaseShared(): same as release()

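These fences are correctness-critical on ARM64 and functionally redundant on x86. x86 TSO by itself does permit StoreLoad reordering, but HotSpot already follows volatile stores with a locked instruction and implements CAS and getAndSet with locked instructions, each of which acts as a full fence. The fences appear only on non-hot paths (state transitions, not tight loops).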

macarte (author) commented Mar 10, 2026

results


  Failures (all on linux-x64 platforms, NOT related to the ARM64 changes):

  ┌───┬──────────────────┬──────────────────────────┬──────────────────────────────────────────────┬────────────────────────┐
  │ # │ Platform         │ Test Group               │ Test                                         │ Error                  │
  ├───┼──────────────────┼──────────────────────────┼──────────────────────────────────────────────┼────────────────────────┤
  │ 1 │ linux-x64        │ jdk/tier1 part 3         │ TestSharedCloseJFR.java                      │ Exit code 1            │
  ├───┼──────────────────┼──────────────────────────┼──────────────────────────────────────────────┼────────────────────────┤
  │ 2 │ linux-x64        │ hs/tier1 compiler part 3 │ TestReturnOopSetForJFRWriteCheckpoint.java   │ IR framework exception │
  ├───┼──────────────────┼──────────────────────────┼──────────────────────────────────────────────┼────────────────────────┤
  │ 3 │ linux-x64-static │ jdk/tier1 part 3         │ (same as #1)                                 │ (same)                 │
  ├───┼──────────────────┼──────────────────────────┼──────────────────────────────────────────────┼────────────────────────┤
  │ 4 │ linux-x64-static │ hs/tier1 compiler part 3 │ (same as #2)                                 │ (same)                 │
  └───┴──────────────────┴──────────────────────────┴──────────────────────────────────────────────┴────────────────────────┘

  These are the exact same pre-existing linux-x64 flaky tests from PR #51. The PR5 Dekker fences fixed all the windows-aarch64
  timeouts — PingPong, Skynet, MultipleProducersSingleConsumerLoops, ProducerConsumerLoops, and the JVMTI GetStackTrace*StressTest
  tests are all now passing.
