
ARM64 currentCarrierThread intrinsic MO_ACQUIRE + @DontInline on Continuation.yield()#51

Open
macarte wants to merge 1 commit into macarte/PR2-winarm64 from macarte/PR3-winarm64

macarte commented Mar 9, 2026

ARM64 currentCarrierThread intrinsic MO_ACQUIRE + @DontInline on Continuation.yield()

On ARM64, after a virtual thread migrates between carriers, the OopHandle dereference for currentCarrierThread can be reordered with subsequent dependent loads (e.g. Thread.cont), observing stale data from the previous carrier. Add MO_ACQUIRE semantics to the OopHandle load in both the C1 (do_JavaThreadField) and C2 (current_thread_helper) intrinsics to ensure ordering.
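
As a generic illustration of the ordering an acquire load provides (standard C++ atomics, not HotSpot code), the sketch below publishes a pointer with a release store and reads it back with an acquire load, so a reader that sees the new pointer also sees the field written before publication. `Carrier`, `cont`, `current_carrier`, `publish`, `read_cont_or`, and `demo_handoff` are hypothetical names chosen to mirror the description; the PR itself applies MO_ACQUIRE inside the C1 and C2 intrinsics rather than through `std::atomic`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a carrier thread object; not a HotSpot type.
struct Carrier {
    int cont = 0;  // plays the role of a dependent field such as Thread.cont
};

// Hypothetical slot playing the role of the currentCarrierThread handle.
std::atomic<Carrier*> current_carrier{nullptr};

// Writer side: initialise the carrier's state first, then publish the pointer.
// The release store keeps the write to c->cont ordered before the publication.
void publish(Carrier* c, int cont_value) {
    c->cont = cont_value;
    current_carrier.store(c, std::memory_order_release);
}

// Reader side: the acquire load keeps the later read of c->cont ordered after
// the pointer load, so a reader that sees the new pointer also sees its cont.
int read_cont_or(int fallback) {
    Carrier* c = current_carrier.load(std::memory_order_acquire);
    return c != nullptr ? c->cont : fallback;
}

// Single-threaded walk-through of the two functions. It cannot exhibit the
// reordering itself, which needs concurrent threads on a weakly ordered CPU.
int demo_handoff() {
    static Carrier carrier;
    current_carrier.store(nullptr, std::memory_order_relaxed);
    int before = read_cont_or(-1);  // nothing published yet
    publish(&carrier, 42);
    int after = read_cont_or(-1);   // sees the published carrier
    assert(before == -1);
    return after;
}
```

Pairing the acquire load with a release store on the writer side is the standard C++ idiom; whether the carrier hand-off in HotSpot has an equivalent release on the mount path is not stated in this PR.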

In C1, the barrier set's load_at_resolved did not honor MO_ACQUIRE when deciding whether to emit membar_acquire. Extend the check so that both the MO_SEQ_CST and MO_ACQUIRE decorators trigger the barrier.
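
The decorator check can be pictured as a bitmask test. This is a minimal sketch, assuming the previous check looked only at MO_SEQ_CST (inferred from the description, not copied from the diff). The bit values, the `sketch` namespace, and both function names are hypothetical and do not match HotSpot's real DecoratorSet definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical bit values; HotSpot's real DecoratorSet constants differ.
using DecoratorSet = std::uint64_t;
constexpr DecoratorSet MO_RELAXED = 1u << 0;
constexpr DecoratorSet MO_ACQUIRE = 1u << 1;
constexpr DecoratorSet MO_SEQ_CST = 1u << 2;

// Assumed shape of the check before the change: only MO_SEQ_CST
// causes an acquire barrier to be emitted after the load.
constexpr bool emits_acquire_before_fix(DecoratorSet d) {
    return (d & MO_SEQ_CST) != 0;
}

// Shape of the check after the change: either decorator triggers it.
constexpr bool emits_acquire_after_fix(DecoratorSet d) {
    return (d & (MO_SEQ_CST | MO_ACQUIRE)) != 0;
}

// Runtime self-check summarising the behavioural difference.
inline bool acquire_load_gains_barrier() {
    assert(emits_acquire_before_fix(MO_SEQ_CST));
    assert(emits_acquire_after_fix(MO_SEQ_CST));
    return !emits_acquire_before_fix(MO_ACQUIRE) &&
           emits_acquire_after_fix(MO_ACQUIRE);
}

}  // namespace sketch
```

Under this model a relaxed load still emits no barrier after the change, so only loads explicitly tagged MO_ACQUIRE are affected.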

Add @DontInline on Continuation.yield() and a @ChangesCurrentThread inlining guard in C1's should_not_inline() to prevent the compiler from inlining across carrier-change boundaries, which could allow the currentCarrierThread value to be cached across a yield point.
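
The hazard of caching the carrier across a yield can be modelled deterministically. The sketch below does not rely on any compiler optimisation. It writes out by hand what code looks like when the carrier is read once and reused after the yield, next to code that re-reads it. All type and function names (`CarrierModel`, `VirtualThreadModel`, `yield_and_remount`, and the two readers) are hypothetical and are not JDK or HotSpot identifiers.

```cpp
#include <cassert>

// Hypothetical model types; none of these are HotSpot or JDK classes.
struct CarrierModel { int id; };
struct VirtualThreadModel { const CarrierModel* carrier; };

// Models a yield that remounts the virtual thread on a different carrier.
void yield_and_remount(VirtualThreadModel& vt, const CarrierModel& next) {
    vt.carrier = &next;
}

// Models code in which the carrier was read once and reused after the
// yield, which is the effect the inlining guards are meant to rule out.
int carrier_id_cached_across_yield(VirtualThreadModel& vt,
                                   const CarrierModel& next) {
    const CarrierModel* cached = vt.carrier;  // read before the yield
    yield_and_remount(vt, next);
    return cached->id;                        // stale: still the old carrier
}

// Models code that re-reads the carrier after the yield point.
int carrier_id_reloaded_after_yield(VirtualThreadModel& vt,
                                    const CarrierModel& next) {
    yield_and_remount(vt, next);
    return vt.carrier->id;                    // fresh: the new carrier
}

// Runs both variants from the same starting state and reports whether
// they disagree, as they do whenever the carrier actually changes.
bool cached_and_reloaded_disagree() {
    CarrierModel a{1};
    CarrierModel b{2};
    VirtualThreadModel vt{&a};
    int stale = carrier_id_cached_across_yield(vt, b);
    vt.carrier = &a;  // reset to the starting carrier
    int fresh = carrier_id_reloaded_after_yield(vt, b);
    assert(stale == 1);
    return stale != fresh;
}
```

In this model the stale variant corresponds to a compiler that has inlined across the carrier-change boundary and kept the earlier value; keeping the yield as an opaque call forces the second shape.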

This is a shared ARM64 correctness fix (not Windows-specific).

macarte changed the title from "PR3" to "ARM64 currentCarrierThread intrinsic MO_ACQUIRE + @DontInline on Continuation.yield()" Mar 9, 2026

macarte commented Mar 9, 2026

Results (not sure why some jobs were cancelled):

  ┌───┬──────────────────────┬───────────────────────────────────┬───────────────────────────────────────────────┬──────────────────┐
  │ # │ Platform             │ Test Group                        │ Test                                          │ Error            │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 1 │ windows-aarch64      │ hs/tier1 serviceability (2 err /  │ GetStackTraceNotSuspendedStressTest.java      │ Timeout (480s)   │
  │   │                      │ 407 total)                        │                                               │                  │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 2 │ windows-aarch64      │ hs/tier1 serviceability           │ GetStackTraceSuspendedStressTest.java         │ Timeout (480s)   │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 3 │ windows-aarch64      │ jdk/tier1 part 2 (2 err / 1014    │ MultipleProducersSingleConsumerLoops.java     │ Timeout (480s)   │
  │   │                      │ total)                            │                                               │                  │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 4 │ windows-aarch64      │ jdk/tier1 part 2                  │ ProducerConsumerLoops.java                    │ Timeout (480s)   │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 5 │ linux-x64            │ hs/tier1 compiler part 3 (1 fail  │ TestReturnOopSetForJFRWriteCheckpoint.java    │ IR framework     │
  │   │                      │ / 786 total)                      │                                               │ exception        │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 6 │ linux-x64            │ jdk/tier1 part 3                  │ TestSharedCloseJFR.java                       │ Exit code 1      │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 7 │ linux-x64-static     │ hs/tier1 compiler part 3          │ TestReturnOopSetForJFRWriteCheckpoint.java    │ IR framework     │
  │   │                      │                                   │                                               │ exception        │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 8 │ linux-x64-static     │ jdk/tier1 part 3                  │ TestSharedCloseJFR.java                       │ Exit code 1      │
  ├───┼──────────────────────┼───────────────────────────────────┼───────────────────────────────────────────────┼──────────────────┤
  │ 9 │ macos-aarch64        │ jdk/tier1 part 3                  │ TestSharedCloseJFR.java                       │ Exit code 1      │
  └───┴──────────────────────┴───────────────────────────────────┴───────────────────────────────────────────────┴──────────────────┘

  Still running: windows-aarch64 jdk/tier1 part 1, hs/tier1 compiler part 3, plus most windows-x64 and some macos-aarch64 jobs.

  Key observations:

   - Tests #1–4 are windows-aarch64 specific. These are the Dekker/ordering timeouts we've seen in prior PRs (PR #48, #49, #50). New this run: two JVMTI stress tests and two j.u.c. BlockingQueue tests (replacing SingleProducerMultipleConsumerLoops from before).
   - Tests #5–9 are cross-platform (linux and macos, not windows) and likely pre-existing/flaky. TestReturnOopSetForJFRWriteCheckpoint and TestSharedCloseJFR are not related to the ARM64 changes.


macarte commented Mar 11, 2026

manual rerun results

 PR3 Rerun (run 22925466417): 10 failures across 85 jobs

  Windows-aarch64 failures (3 jobs):

  ┌───────────────────────────────────┬──────────────────────────────────────────────┬───────────────────────────┐
  │ Job                               │ Test                                         │ Error                     │
  ├───────────────────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
  │ jdk/tier1 part 1 (3 errors)       │ PingPong.java#ltq                            │ Timeout 480s              │
  ├───────────────────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
  │                                   │ PingPong.java#sq                             │ Timeout 480s              │
  ├───────────────────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
  │                                   │ Skynet.java#default                          │ Timeout 6400s             │
  ├───────────────────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
  │ jdk/tier1 part 2 (1 error)        │ Likely SingleProducerMultipleConsumerLoops   │ Timeout (beyond log tail) │
  ├───────────────────────────────────┼──────────────────────────────────────────────┼───────────────────────────┤
  │ hs/tier1 compiler part 3 (1 fail) │ TestSpeculationFailedHigherEqual.java        │ Failed                    │
  └───────────────────────────────────┴──────────────────────────────────────────────┴───────────────────────────┘

  Cross-platform failures (7 jobs, all pre-existing flaky):

  ┌─────────────────────────────────────────┬─────────────────────────────────────────────────────────┐
  │ Test                                    │ Platforms                                               │
  ├─────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
  │ TestSpeculationFailedHigherEqual.java   │ linux-x64, linux-x64-static, macos-aarch64, windows-x64 │
  ├─────────────────────────────────────────┼─────────────────────────────────────────────────────────┤
  │ TestSharedCloseJFR.java                 │ linux-x64, linux-x64-static, macos-aarch64              │
  └─────────────────────────────────────────┴─────────────────────────────────────────────────────────┘

  Key observations:

   - The win-aarch64 virtual thread timeouts (PingPong, Skynet, SPMC) are the same as in PR2. This is expected, since PR3 doesn't add Dekker fences.
   - TestSpeculationFailedHigherEqual.java is new and fails on all 5 platforms, which suggests a tip-wide regression unrelated to the changes in this PR.
   - TestSharedCloseJFR.java is the same pre-existing flaky test we've seen throughout.
