Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker … by macarte · Pull Request #50 · microsoft/openjdk-jdk

macarte · 2026-03-09T15:50:03Z

Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker protocol

The Dekker protocol between try_spin() (ST _succ -> LD _owner) and exit() requires a StoreLoad barrier on both sides. The exit() side already has one (release_clear_owner + OrderAccess::storeload), but the spinner side was missing the corresponding fence.

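On ARM64, a store followed by a load to a different address is only guaranteed StoreLoad ordering when the pair is a store-release (STLR) followed by a load-acquire (LDAR). Per the code comment in this change, Atomic::store/load under MSVC /volatile:iso compile to plain STR/LDR, which carry no such ordering. Without the explicit barrier, the CPU can reorder the _owner load before the _succ store, causing the exiter to miss the successor designation while the spinner misses the lock release, leading to missed wakeups and thread starvation.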

Insert OrderAccess::storeload() after set_successor(current) in both places in try_spin(): before the spin loop and at the end of each iteration.
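The fence placement described above can be sketched in portable C++. This is a minimal illustration, not the HotSpot code: `ToyMonitor`, `spinner_try_acquire`, and `exiter_release` are hypothetical names, the real try_spin() loop is more involved, and `std::atomic_thread_fence(std::memory_order_seq_cst)` is assumed here as a stand-in for OrderAccess::storeload().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the two ObjectMonitor fields in the protocol.
struct ToyMonitor {
  std::atomic<void*> owner{nullptr};  // plays the role of _owner
  std::atomic<void*> succ{nullptr};   // plays the role of _succ
};

// Spinner side: ST succ, StoreLoad fence, LD owner.
// Returns true if the lock was observed free and then acquired.
bool spinner_try_acquire(ToyMonitor& m, void* self) {
  m.succ.store(self, std::memory_order_relaxed);
  // Without this fence the owner load below may be satisfied before the
  // succ store above becomes visible to the exiting thread.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  if (m.owner.load(std::memory_order_relaxed) == nullptr) {
    void* expected = nullptr;
    if (m.owner.compare_exchange_strong(expected, self)) {
      m.succ.store(nullptr, std::memory_order_relaxed);  // no longer waiting
      return true;
    }
  }
  return false;  // succ stays set so the exiter can see a successor
}

// Exit side: ST owner, StoreLoad fence, LD succ.
// Returns true if a successor was seen, so no wakeup is needed.
bool exiter_release(ToyMonitor& m) {
  m.owner.store(nullptr, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.succ.load(std::memory_order_relaxed) != nullptr;
}
```

With a fence on both sides, the outcome where the spinner still sees the old owner and the exiter sees no successor is ruled out: at least one thread observes the other's store. Dropping either fence reintroduces that outcome on weakly ordered hardware.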


macarte · 2026-03-09T20:11:25Z

baseline results

Here are all 4 errored tests across the 2 failing windows-aarch64 groups on PR #50:

  ┌──────────────────────┬───────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
  │ Group                │ Test                                                                          │ Error               │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/PingPong.java#ltq                             │ Timed out 480s      │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/PingPong.java#sq                              │ Timed out 480s      │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/Skynet.java#default                           │ JVM timed out 6400s │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 2     │ java/util/concurrent/BlockingQueue/SingleProducerMultipleConsumerLoops.java   │ Timed out 480s      │
  └──────────────────────┴───────────────────────────────────────────────────────────────────────────────┴─────────────────────┘

  Same 4 tests as PR #48. The hs/tier1 serviceability JVMTI stress test (GetStackTraceNotSuspendedStressTest) that failed on PR #48
  passed this time — so that one was likely flaky. The core failures remain the virtual thread and j.u.c concurrency tests that need
  the Dekker fences from PRs 3 and 5.

macarte · 2026-03-10T17:04:34Z

manually running sanity checks: https://github.com/microsoft/openjdk-jdk/actions/runs/22914389984

swesonga · 2026-03-10T17:40:27Z

src/hotspot/share/runtime/objectMonitor.cpp

+  // the _owner load before the _succ store. On ARM64 with MSVC
+  // /volatile:iso, Atomic::store/load are plain STR/LDR with no
+  // barrier, so without this fence the Dekker protocol is broken and
+  // the exiter may not see our successor designation while we may not


Is this fence still needed now that Atomic::store/load are no longer plain STR/LDRs?


One reason this might not be needed is the explanation in https://github.com/openjdk/jdk/blob/9d4fbbe36d85d71ce850bb83bbfb1ce1d3e8dd23/src/hotspot/share/runtime/objectMonitor.cpp#L1586 - "the try_set_owner_from() below uses cmpxchg() so we get the fence down there." (this would be line 2492 in the right view of this file)

swesonga · 2026-03-10T17:44:02Z

src/hotspot/share/runtime/objectMonitor.cpp

+  // Here on the spinner's side, we need a StoreLoad barrier between
+  // setting _succ and reading _owner to prevent the CPU from reordering
+  // the _owner load before the _succ store. On ARM64 with MSVC
+  // /volatile:iso, Atomic::store/load are plain STR/LDR with no


This comment should probably be reworded to drop the statement that /volatile:iso is in use if we intend to switch to /volatile:ms.

macarte · 2026-03-10T21:37:25Z

manual run results:

Overall: 84 success, 1 failure out of 85 jobs

  The only failing job is windows-aarch64 / test (jdk/tier1 part 1) with 3 errors (1106 pass, 0 fail, 3 error, 38 skip):

  ┌───┬───────────────────────┬───────────────────┐
  │ # │ Test                  │ Error             │
  ├───┼───────────────────────┼───────────────────┤
  │ 1 │ PingPong.java#ltq     │ Timed out (480s)  │
  ├───┼───────────────────────┼───────────────────┤
  │ 2 │ PingPong.java#sq      │ Timed out (480s)  │
  ├───┼───────────────────────┼───────────────────┤
  │ 3 │ Skynet.java#default   │ Timed out (6400s) │
  └───┴───────────────────────┴───────────────────┘

  This is the same set of 3 errors from PR1 — the try_spin() StoreLoad barriers in PR2 didn't resolve these virtual thread stress
  test timeouts. Note that jdk/tier1 part 2 passed this time (no SingleProducerMultipleConsumerLoops failure), and hs/tier1
  serviceability also passed. The PingPong/Skynet timeouts were ultimately fixed by PR5's Dekker fences.

swesonga reviewed Mar 10, 2026

macarte wants to merge 1 commit into macarte/PR1-winarm64 from macarte/PR2-winarm64
