Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker …#50
Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker …#50macarte wants to merge 1 commit intomacarte/PR1-winarm64from
Conversation
…protocol The Dekker protocol between try_spin() (ST _succ -> LD _owner) and exit() requires a StoreLoad barrier on both sides. The exit() side already has one (release_clear_owner + OrderAccess::storeload), but the spinner side was missing the corresponding fence. On ARM64, volatile store (STLR) followed by volatile load (LDAR) to different addresses does NOT imply StoreLoad ordering. Without the explicit barrier, the CPU can reorder the _owner load before the _succ store, causing the exiter to miss the successor designation while the spinner misses the lock release — leading to missed wakeups and thread starvation. Insert OrderAccess::storeload() after set_successor(current) in both places in try_spin(): before the spin loop and at the end of each iteration.
|
baseline results |
|
manually running sanity checks: https://github.com/microsoft/openjdk-jdk/actions/runs/22914389984 |
| // the _owner load before the _succ store. On ARM64 with MSVC | ||
| // /volatile:iso, Atomic::store/load are plain STR/LDR with no | ||
| // barrier, so without this fence the Dekker protocol is broken and | ||
| // the exiter may not see our successor designation while we may not |
There was a problem hiding this comment.
Is this fence still needed now that Atomic::store/load are no longer plain STR/LDRs?
There was a problem hiding this comment.
Is this fence still needed now that Atomic::store/load are no longer plain STR/LDRs?
One reason this might not be needed is the explanation in https://github.com/openjdk/jdk/blob/9d4fbbe36d85d71ce850bb83bbfb1ce1d3e8dd23/src/hotspot/share/runtime/objectMonitor.cpp#L1586 - "the try_set_owner_from() below uses cmpxchg() so we get the fence down there." (this would be line 2492 in the right view of this file)
| // Here on the spinner's side, we need a StoreLoad barrier between | ||
| // setting _succ and reading _owner to prevent the CPU from reordering | ||
| // the _owner load before the _succ store. On ARM64 with MSVC | ||
| // /volatile:iso, Atomic::store/load are plain STR/LDR with no |
There was a problem hiding this comment.
This comment should probably be reworded to remove the statement that /volatile:iso is in use if we intend to switch to /volatile:ms
|
manual run results: |
Add StoreLoad barriers to ObjectMonitor::try_spin() for ARM64 Dekker protocol
The Dekker protocol between try_spin() (ST _succ -> LD _owner) and exit() requires a StoreLoad barrier on both sides. The exit() side already has one (release_clear_owner + OrderAccess::storeload), but the spinner side was missing the corresponding fence.
On ARM64, volatile store (STLR) followed by volatile load (LDAR) to different addresses does NOT imply StoreLoad ordering. Without the explicit barrier, the CPU can reorder the _owner load before the _succ store, causing the exiter to miss the successor designation while the spinner misses the lock release — leading to missed wakeups and thread starvation.
Insert OrderAccess::storeload() after set_successor(current) in both places in try_spin(): before the spin loop and at the end of each iteration.