Windows AArch64 MSVC /volatile:iso memory ordering — HotSpot C++ runtime by macarte · Pull Request #48 · microsoft/openjdk-jdk

macarte · 2026-03-06T22:04:08Z

MSVC's /volatile:iso (default on ARM64) makes volatile reads/writes plain LDR/STR with no acquire/release barriers. HotSpot's C++ runtime was written assuming volatile provides acquire/release semantics.
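To make the hazard concrete, here is a minimal sketch of the publish/consume pattern that code written against acquire/release-style volatile relies on. It uses std::atomic so the ordering does not depend on a compiler flag. The names `Mailbox`, `publish`, and `try_consume` are hypothetical and are not taken from HotSpot; the instruction names in the comments are what compilers typically emit on AArch64, not a guarantee.

```cpp
#include <atomic>

// Hypothetical mailbox: a writer fills `payload`, then sets `ready`.
struct Mailbox {
  int payload = 0;                 // plain data, written before the flag
  std::atomic<bool> ready{false};  // publication flag
};

// Writer side: the release store keeps the payload write from being
// reordered after the flag write (typically STLR on AArch64).
inline void publish(Mailbox& box, int value) {
  box.payload = value;
  box.ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load keeps the payload read from being
// reordered before the flag read (typically LDAR on AArch64).
// Returns false if nothing has been published yet.
inline bool try_consume(Mailbox& box, int& out) {
  if (!box.ready.load(std::memory_order_acquire)) {
    return false;
  }
  out = box.payload;
  return true;
}
```

If `ready` were a plain `volatile bool` compiled under /volatile:iso, both accesses would be ordinary LDR/STR and the reader could observe `ready == true` while still reading a stale `payload`.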

Changes:

1. flags-cflags.m4: Add /volatile:ms to JVM_CFLAGS for Windows AArch64 to restore acquire/release semantics for volatile accesses.
2. orderAccess_windows_aarch64.hpp: Replace std::atomic_thread_fence() with __dmb() intrinsics for READ_MEM_BARRIER (dmb ishld), WRITE_MEM_BARRIER and FULL_MEM_BARRIER (dmb ish). The __dmb() intrinsic acts as both a hardware barrier and compiler barrier for volatile/non-atomic accesses, which std::atomic_thread_fence() does not guarantee under /volatile:iso.
3. atomicAccess_windows_aarch64.hpp: Override PlatformLoad/PlatformStore with __ldar/__stlr intrinsics (defense-in-depth for Atomic::load/store). Add PlatformOrderedLoad/PlatformOrderedStore specializations using __ldar/__stlr to avoid redundant dmb in load_acquire/release_store paths, matching the Linux AArch64 approach.

Rationale: This is the foundational fix. Because MSVC's /volatile:iso (default on ARM64) compiles volatile reads/writes to plain LDR/STR with no acquire/release semantics, it breaks HotSpot's assumption of volatile ≈ acquire/release throughout the C++ runtime. The /volatile:ms flag restores those semantics for all JVM code. The explicit LDAR/STLR intrinsics in atomicAccess_windows_aarch64.hpp produce identical codegen to what /volatile:ms already generates for plain volatile dereferences, but are retained as defense-in-depth: they guarantee correct acquire/release semantics for Atomic::load()/Atomic::store() regardless of the compiler flag setting. The PlatformOrderedLoad/PlatformOrderedStore overrides additionally avoid a redundant dmb that the generic fallback would emit before the LDAR/STLR. The __dmb() barriers in orderAccess_windows_aarch64.hpp replace the previous std::atomic_thread_fence() calls with the ARM-specific intrinsics.
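The "redundant dmb" point can be illustrated in portable C++. The sketch below contrasts a fence-based acquire load and release store with variants that attach the ordering directly to the access. The function names are hypothetical; per the description, the actual PlatformOrderedLoad/PlatformOrderedStore specializations use the __ldar/__stlr intrinsics rather than std::atomic, and the instruction sequences noted in the comments are what compilers typically emit on AArch64, not a guarantee.

```cpp
#include <atomic>
#include <cstdint>

// Fence-based acquire load: a relaxed access plus a standalone fence.
// On AArch64 this typically becomes LDR followed by DMB ISHLD.
inline std::int64_t load_acquire_with_fence(const std::atomic<std::int64_t>& src) {
  std::int64_t v = src.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  return v;
}

// Direct acquire load: typically a single LDAR, with no separate DMB.
inline std::int64_t load_acquire_direct(const std::atomic<std::int64_t>& src) {
  return src.load(std::memory_order_acquire);
}

// Fence-based release store: typically DMB ISH followed by STR.
inline void release_store_with_fence(std::atomic<std::int64_t>& dst, std::int64_t v) {
  std::atomic_thread_fence(std::memory_order_release);
  dst.store(v, std::memory_order_relaxed);
}

// Direct release store: typically a single STLR, with no separate DMB.
inline void release_store_direct(std::atomic<std::int64_t>& dst, std::int64_t v) {
  dst.store(v, std::memory_order_release);
}
```

Both pairs give the required ordering for a release/acquire handoff; the direct forms simply avoid paying for a full data memory barrier on every ordered access.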

Tests unblocked: A broad set of intermittent failures across ObjectMonitor, ParkEvent, and other lock-free algorithms in the HotSpot runtime.

swesonga · 2026-03-09T16:40:58Z

make/autoconf/flags-cflags.m4

+      # and GCC/Clang AArch64). Use /volatile:ms to restore those semantics and
+      # prevent memory ordering bugs in ObjectMonitor, ParkEvent, and other
+      # lock-free algorithms that use plain volatile fields.
+      $1_CFLAGS_CPU_JVM="-volatile:ms"


Is this change necessary given the atomics changes in atomicAccess_windows_aarch64.hpp?

I ask because https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-170 makes it sound like the compiler expects /volatile:iso to be used along with explicit synchronization primitives where the code needs it.

So while the atomics changes address many issues, there are still too many weak-memory issues in the JVM C++ source itself, in code that doesn't use atomics. Setting the flag addresses these issues for all compiled JVM code.

To be clearer: the issues in the JVM are only a problem when those code paths run during virtual thread execution or support virtual threads.

macarte · 2026-03-09T17:05:25Z

results

	
	  ┌──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┐
	  │ Group            │ Test                                                                                            │ Error      │
	  ├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
	  │ jdk/tier1 part 1 │ java/lang/Thread/virtual/stress/PingPong.java#ltq                                               │ Timed out  │
	  │                  │                                                                                                 │ 480s       │
	  ├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
	  │ jdk/tier1 part 1 │ java/lang/Thread/virtual/stress/PingPong.java#sq                                                │ Timed out  │
	  │                  │                                                                                                 │ 480s       │
	  ├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
	  │ jdk/tier1 part 1 │ java/lang/Thread/virtual/stress/Skynet.java#default                                             │ JVM timed  │
	  │                  │                                                                                                 │ out 6400s  │
	  ├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
	  │ jdk/tier1 part 2 │ java/util/concurrent/BlockingQueue/SingleProducerMultipleConsumerLoops.java                     │ Timed out  │
	  │                  │                                                                                                 │ 480s       │
	  ├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
	  │ hs/tier1         │ serviceability/jvmti/stress/StackTrace/NotSuspended/GetStackTraceNotSuspendedStressTest.java    │ JVM timed  │
	  │ serviceability   │                                                                                                 │ out 480s   │
	  └──────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┘

macarte · 2026-03-09T17:06:55Z

Note: the GHA results do not list an intermittent failure in Starvation.java; it occurs more frequently on machines with more than 64 processors.

swesonga · 2026-03-09T20:51:51Z

src/hotspot/os_cpu/windows_aarch64/atomicAccess_windows_aarch64.hpp

+// The generic PlatformLoad and PlatformStore use plain volatile dereferences.
+// With /volatile:ms (set in flags-cflags.m4 for AArch64), MSVC already compiles
+// those to LDAR/STLR, so these overrides produce identical codegen. They are
+// retained as defense-in-depth: they guarantee acquire/release semantics for


@mo-beck, was the use of -volatile:iso (by not specifying this flag) when porting to windows-aarch64 an intentional choice, or just the natural outcome of not specifying the flag? I'm curious because I would think that, in the long term, it would be better to keep these Platform overloads in place while retaining -volatile:iso, so that other locations where synchronization is required but wasn't implemented are actually caught. Thoughts?

swesonga · 2026-03-09T22:10:54Z

src/hotspot/os_cpu/windows_aarch64/orderAccess_windows_aarch64.hpp

-#define FULL_MEM_BARRIER atomic_thread_fence(std::memory_order_seq_cst);
+#define READ_MEM_BARRIER  __dmb(_ARM64_BARRIER_ISHLD)
+#define WRITE_MEM_BARRIER __dmb(_ARM64_BARRIER_ISH)
+#define FULL_MEM_BARRIER  __dmb(_ARM64_BARRIER_ISH)


Shouldn't this be the _ARM64_BARRIER_SY barrier type?

I'm specifically referring to FULL_MEM_BARRIER, based on the explanation at https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions

I'll check on this more (thanks for bringing it up). An initial analysis by Copilot:

The ISH (Inner Shareable) domain is correct and matches the rest of HotSpot's AArch64 backend. Here's the reasoning:

- ISH covers all CPUs sharing the same OS/hypervisor, i.e., all cores that could run Java threads. This is the standard domain for thread-to-thread synchronization.
- SY extends to the full system including external devices (DMA engines, GPUs). It's only needed for device MMIO synchronization, not for inter-thread memory ordering.
- Linux AArch64's orderAccess uses __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE) which GCC/Clang compile to dmb ishld/dmb ish, the same ISH domain.
- HotSpot's own Membar_mask_bits enum maps StoreLoad and AnyAny to ISH, LoadLoad/LoadStore to ISHLD. The only SY usage in the entire AArch64 backend is isb() (instruction barrier).

Using _ARM64_BARRIER_SY would be functionally correct but unnecessarily strong, potentially causing performance degradation on systems with outer-shareable observers.
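As a rough illustration of the barrier mapping under discussion, the sketch below wraps the three barrier kinds in inline functions. The MSVC ARM64 branch mirrors the __dmb() arguments from the diff above; the fallback branch uses standard fences so the snippet also builds with other compilers. The function names and the `announce_and_check` helper are hypothetical and are not HotSpot code.

```cpp
#include <atomic>

#if defined(_MSC_VER) && defined(_M_ARM64)
#include <intrin.h>
// MSVC on ARM64: the same barrier options as in the diff (Inner Shareable).
inline void read_barrier()  { __dmb(_ARM64_BARRIER_ISHLD); }
inline void write_barrier() { __dmb(_ARM64_BARRIER_ISH); }
inline void full_barrier()  { __dmb(_ARM64_BARRIER_ISH); }
#else
// Portable fallback: standard fences, which GCC/Clang lower to
// Inner Shareable DMB variants on AArch64.
inline void read_barrier()  { std::atomic_thread_fence(std::memory_order_acquire); }
inline void write_barrier() { std::atomic_thread_fence(std::memory_order_release); }
inline void full_barrier()  { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Dekker-style handshake: set our own flag, then check the other side's.
// The store must be ordered before the load (StoreLoad), which is the one
// case that needs the full barrier rather than a read or write barrier.
inline bool announce_and_check(std::atomic<int>& mine,
                               const std::atomic<int>& theirs) {
  mine.store(1, std::memory_order_relaxed);
  full_barrier();
  return theirs.load(std::memory_order_relaxed) != 0;
}
```

All three wrappers order accesses between threads only, which is why the Inner Shareable domain is sufficient here; nothing in the sketch synchronizes with devices.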

Saint, following your and Monica's input, I ran the tests on jdk25: microsoft/openjdk-jdk25u#27

There I reverted this change and also the AtomicAccess::PlatformLoad/AtomicAccess::PlatformStore pair (I kept the AtomicAccess::PlatformOrderedLoad/AtomicAccess::PlatformOrderedStore pair).

The winarm64 GHA tests are passing; I will make the same change here on tip.

macarte · 2026-03-10T19:28:52Z

src/hotspot/os_cpu/windows_aarch64/orderAccess_windows_aarch64.hpp

-#define FULL_MEM_BARRIER atomic_thread_fence(std::memory_order_seq_cst);
+#define READ_MEM_BARRIER  __dmb(_ARM64_BARRIER_ISHLD)
+#define WRITE_MEM_BARRIER __dmb(_ARM64_BARRIER_ISH)
+#define FULL_MEM_BARRIER  __dmb(_ARM64_BARRIER_ISH)


Revalidate that dmb is needed (it's a pause vs. sequencing); this was targeting the same issue that /volatile:ms ended up solving.

macarte wants to merge 1 commit into macarte/baselinePR-TestTrampolineFix from macarte/PR1-winarm64
