Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker …#50

Open
macarte wants to merge 1 commit into macarte/PR1-winarm64 from macarte/PR2-winarm64

@macarte macarte commented Mar 9, 2026

Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker protocol

The Dekker protocol between try_spin() (ST _succ -> LD _owner) and exit() requires a StoreLoad barrier on both sides. The exit() side already has one (release_clear_owner + OrderAccess::storeload), but the spinner side was missing the corresponding fence.

On ARM64, volatile store (STLR) followed by volatile load (LDAR) to different addresses does NOT imply StoreLoad ordering. Without the explicit barrier, the CPU can reorder the _owner load before the _succ store, causing the exiter to miss the successor designation while the spinner misses the lock release — leading to missed wakeups and thread starvation.

Insert OrderAccess::storeload() after set_successor(current) in both places in try_spin(): before the spin loop and at the end of each iteration.
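
The store-fence-load pattern described above can be sketched in portable C++. This is a minimal illustration, not the HotSpot code: `MonitorSketch`, `spinner_side`, and `exiter_side` are hypothetical names, and `std::atomic_thread_fence(std::memory_order_seq_cst)` is assumed here as a stand-in for `OrderAccess::storeload()`.

```cpp
#include <atomic>

// Hypothetical stand-in for the two ObjectMonitor fields involved.
struct MonitorSketch {
    std::atomic<int> owner{1};  // 1 = lock held, 0 = lock free
    std::atomic<int> succ{0};   // 1 = a spinner has named itself successor
};

// Spinner side: ST succ, StoreLoad fence, LD owner.
// Returns true if the spinner observed the lock as free.
inline bool spinner_side(MonitorSketch& m) {
    m.succ.store(1, std::memory_order_relaxed);
    // Without this fence the owner load may be satisfied before the
    // succ store becomes visible to the exiting thread.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return m.owner.load(std::memory_order_relaxed) == 0;
}

// Exiter side: ST owner (release), StoreLoad fence, LD succ.
// Returns true if the exiter observed a designated successor.
inline bool exiter_side(MonitorSketch& m) {
    m.owner.store(0, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return m.succ.load(std::memory_order_relaxed) == 1;
}
```

When the two functions run concurrently on different threads, the pair of sequentially consistent fences means at least one of them returns true: either the spinner sees the released lock or the exiter sees the successor. Dropping either fence permits the outcome in which both return false, which corresponds to the missed wakeup described above.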

macarte commented Mar 9, 2026

baseline results

Here are all 4 errored tests across the 2 failing windows-aarch64 groups on PR #50:

  ┌──────────────────────┬───────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
  │ Group                │ Test                                                                          │ Error               │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/PingPong.java#ltq                             │ Timed out 480s      │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/PingPong.java#sq                              │ Timed out 480s      │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 1     │ java/lang/Thread/virtual/stress/Skynet.java#default                           │ JVM timed out 6400s │
  ├──────────────────────┼───────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
  │ jdk/tier1 part 2     │ java/util/concurrent/BlockingQueue/SingleProducerMultipleConsumerLoops.java   │ Timed out 480s      │
  └──────────────────────┴───────────────────────────────────────────────────────────────────────────────┴─────────────────────┘

  Same 4 tests as PR #48. The hs/tier1 serviceability JVMTI stress test (GetStackTraceNotSuspendedStressTest) that failed on PR #48
  passed this time — so that one was likely flaky. The core failures remain the virtual thread and j.u.c concurrency tests that need
  the Dekker fences from PRs 3 and 5.

macarte commented Mar 10, 2026

manually running sanity checks: https://github.com/microsoft/openjdk-jdk/actions/runs/22914389984

// the _owner load before the _succ store. On ARM64 with MSVC
// /volatile:iso, Atomic::store/load are plain STR/LDR with no
// barrier, so without this fence the Dekker protocol is broken and
// the exiter may not see our successor designation while we may not

Member

Is this fence still needed now that Atomic::store/load are no longer plain STR/LDRs?

One reason this might not be needed is the explanation in https://github.com/openjdk/jdk/blob/9d4fbbe36d85d71ce850bb83bbfb1ce1d3e8dd23/src/hotspot/share/runtime/objectMonitor.cpp#L1586: "the try_set_owner_from() below uses cmpxchg() so we get the fence down there." (This would be line 2492 in the right-hand view of this file.)

// Here on the spinner's side, we need a StoreLoad barrier between
// setting _succ and reading _owner to prevent the CPU from reordering
// the _owner load before the _succ store. On ARM64 with MSVC
// /volatile:iso, Atomic::store/load are plain STR/LDR with no
Member

This comment should probably be reworded to remove the statement that /volatile:iso is in use if we intend to switch to /volatile:ms.

macarte commented Mar 10, 2026

manual run results:

Overall: 84 success, 1 failure out of 85 jobs

  The only failing job is windows-aarch64 / test (jdk/tier1 part 1) with 3 errors (1106 pass, 0 fail, 3 error, 38 skip):

  ┌───┬───────────────────────┬───────────────────┐
  │ # │ Test                  │ Error             │
  ├───┼───────────────────────┼───────────────────┤
  │ 1 │ PingPong.java#ltq     │ Timed out (480s)  │
  ├───┼───────────────────────┼───────────────────┤
  │ 2 │ PingPong.java#sq      │ Timed out (480s)  │
  ├───┼───────────────────────┼───────────────────┤
  │ 3 │ Skynet.java#default   │ Timed out (6400s) │
  └───┴───────────────────────┴───────────────────┘

  This is the same set of 3 errors from PR1 — the try_spin() StoreLoad barriers in PR2 didn't resolve these virtual thread stress
  test timeouts. Note that jdk/tier1 part 2 passed this time (no SingleProducerMultipleConsumerLoops failure), and hs/tier1
  serviceability also passed. The PingPong/Skynet timeouts were ultimately fixed by PR5's Dekker fences.
