ForkJoinPool carrier starvation compensation for pinned virtual threads#52

Open
macarte wants to merge 1 commit into macarte/PR3-winarm64 from macarte/PR4-winarm64

macarte commented Mar 9, 2026

When a pinned virtual thread's carrier blocks on a contended monitor, the ForkJoinPool has no visibility that its worker is blocked. If all carriers end up pinned on monitors, the virtual thread holding the contended lock can never get a carrier to run on, and the scheduler deadlocks.
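For comparison, the public `ForkJoinPool.managedBlock` API gives ordinary user code the same kind of compensation: the pool is told that a worker is about to block and can activate an idle worker or add a spare before the wait starts. A minimal sketch, assuming a one-worker pool as a stand-in for a fully pinned set of carriers; `ManagedBlockDemo` and `LatchBlocker` are hypothetical names, and this does not exercise the monitor path added in this PR:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;

final class ManagedBlockDemo {

    // Hypothetical adapter: lets a CountDownLatch wait participate in FJP compensation.
    static final class LatchBlocker implements ForkJoinPool.ManagedBlocker {
        private final CountDownLatch latch;

        LatchBlocker(CountDownLatch latch) { this.latch = latch; }

        @Override
        public boolean block() throws InterruptedException {
            latch.await();
            return true; // no further blocking needed
        }

        @Override
        public boolean isReleasable() {
            return latch.getCount() == 0;
        }
    }

    /** Returns true if the waiter finished, i.e. the releaser found a thread to run on. */
    static boolean runDemo() {
        ForkJoinPool pool = new ForkJoinPool(1); // a single worker, like one carrier
        try {
            CountDownLatch latch = new CountDownLatch(1);
            // managedBlock tells the pool this worker is about to block, so the pool can
            // activate an idle worker or create a spare one before the wait starts.
            ForkJoinTask<Boolean> waiter = pool.submit(() -> {
                ForkJoinPool.managedBlock(new LatchBlocker(latch));
                return true;
            });
            // The releaser plays the role of the lock holder: it also needs a worker.
            pool.submit(latch::countDown);
            return waiter.get(10, TimeUnit.SECONDS);
        } catch (Exception e) { // timeout, interruption, or task failure
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter completed: " + runDemo());
    }
}
```

A monitor enter on a pinned carrier has no equivalent of the `managedBlock` call, which is the gap this PR fills from the VM side.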

This PR adds ForkJoinPool (FJP) compensation that activates an idle worker or creates a spare carrier when a pinned carrier is about to block on a monitor:

objectMonitor.cpp:

  • compensate_pinned_carrier(): calls CarrierThread.beginMonitorBlock() before the carrier blocks, saving and restoring the pending-monitor state to handle nested monitor contention during worker-thread creation
  • end_compensate_pinned_carrier(): calls CarrierThread.endBlocking() after the carrier acquires the monitor
  • New includes: symbolTable.hpp, javaCalls.hpp
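The save/restore in compensate_pinned_carrier() can be pictured with a small Java model. Assumptions: a `ThreadLocal` stands in for the VM's per-thread pending-monitor field, and `contendedEnter`/`compensatePinnedCarrier` are hypothetical names, not HotSpot functions. The point is only that the up-call into Java may itself contend on another monitor and overwrite the outer one:

```java
import java.util.function.LongSupplier;

// Hypothetical model; none of these names are HotSpot's actual API.
final class PendingMonitorModel {
    // Stand-in for the VM's per-thread "monitor I am about to block on" field.
    private static final ThreadLocal<Object> pendingMonitor = new ThreadLocal<>();

    static Object pending() { return pendingMonitor.get(); }

    /** Models a contended enter: publish the monitor as pending, run the pre-block hook, then "acquire". */
    static void contendedEnter(Object monitor, Runnable beforeBlock) {
        pendingMonitor.set(monitor);
        beforeBlock.run();
        // Real code would park here until the monitor is acquired.
        pendingMonitor.set(null);
    }

    /** Calls the compensation hook without letting nested contention clobber the outer pending monitor. */
    static long compensatePinnedCarrier(LongSupplier beginMonitorBlock) {
        Object saved = pendingMonitor.get();
        try {
            return beginMonitorBlock.getAsLong(); // may itself contend on other monitors
        } finally {
            pendingMonitor.set(saved);
        }
    }

    /** Returns the pending monitor seen just before the outer carrier would block. */
    static Object observedBeforeBlocking(Object outer, Object inner) {
        Object[] seen = new Object[1];
        contendedEnter(outer, () -> {
            compensatePinnedCarrier(() -> {
                // Nested contention, e.g. while a spare worker thread is being created.
                contendedEnter(inner, () -> { });
                return 1L; // stand-in for the RC_UNIT restoration token
            });
            seen[0] = pending();
        });
        return seen[0];
    }

    public static void main(String[] args) {
        Object m = new Object(), n = new Object();
        System.out.println(observedBeforeBlocking(m, n) == m); // true: outer monitor restored
    }
}
```

Without the save/restore, the nested enter would leave the pending monitor as null just before the outer carrier blocks.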

ForkJoinPool.java:

  • tryCompensateForMonitor(): variant of tryCompensate that omits the passive RC-only path (branch 2) and explicitly decrements RC so signalWork sees the pool as under-active while carriers are blocked
  • beginMonitorCompensatedBlock(): spin-loop wrapper returning the RC_UNIT restoration value for endCompensatedBlock
  • endCompensatedBlock: updated javadoc for monitor compensation path
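A simplified model of the branch choice, paraphrased from tryCompensate in recent JDK sources (the real code reads these counts out of the packed ctl word, retries on CAS failure, and handles termination and saturation). The `Action` enum and the flattened parameters are hypothetical, and `minActive = 2` in the usage is an assumed value. It shows why omitting branch 2 matters: that branch only lowers RC and adds no runnable thread, so a carrier blocked through it is not replaced.

```java
// Simplified, hypothetical model of the branch choice; not the JDK's code.
final class CompensationModel {
    enum Action { ACTIVATE_IDLE, REDUCE_ACTIVE_ONLY, CREATE_SPARE, NONE }

    // Roughly the branch order of ForkJoinPool.tryCompensate (paraphrased).
    static Action tryCompensate(int active, int total, boolean hasIdle,
                                int parallelism, int minActive, int maxTotal) {
        if (hasIdle)
            return Action.ACTIVATE_IDLE;          // branch 1: wake a parked worker
        if (active > minActive && total >= parallelism)
            return Action.REDUCE_ACTIVE_ONLY;     // branch 2: only lower RC, no new thread
        if (total < maxTotal)
            return Action.CREATE_SPARE;           // branch 3: add a spare worker
        return Action.NONE;                       // saturated
    }

    // Monitor variant as described in this PR: branch 2 is omitted.
    static Action tryCompensateForMonitor(int total, boolean hasIdle, int maxTotal) {
        if (hasIdle)
            return Action.ACTIVATE_IDLE;
        if (total < maxTotal)
            return Action.CREATE_SPARE;
        return Action.NONE;
    }

    public static void main(String[] args) {
        // 4 carriers, all running, none idle; minActive = 2 and maxTotal = 256 are assumed values.
        System.out.println(tryCompensate(4, 4, false, 4, 2, 256));  // REDUCE_ACTIVE_ONLY
        System.out.println(tryCompensateForMonitor(4, false, 256)); // CREATE_SPARE
    }
}
```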

JavaUtilConcurrentFJPAccess.java:

  • Added beginMonitorCompensatedBlock to the shared secret interface
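The bridge follows the JDK's shared-secrets pattern: the owning class registers an implementation of an access interface from its static initializer, and callers in other packages fetch it through a holder. A toy version, assuming hypothetical `ToyPool`/`ToySecrets`/`ToyFJPAccess` types and a toy ctl word with RC in its top 16 bits, which also shows the RC_UNIT token round trip:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; only the shape of the pattern is the point.
final class ToyPool {
    static final int RC_SHIFT = 48;
    static final long RC_UNIT = 1L << RC_SHIFT;   // one active worker in the toy ctl

    private final AtomicLong ctl;

    ToyPool(int active) { ctl = new AtomicLong((long) active << RC_SHIFT); }

    int activeCount() { return (int) (ctl.get() >>> RC_SHIFT); }

    static {
        // The owning class hands out an accessor for its non-public operations.
        ToySecrets.setFJPAccess(new ToyFJPAccess() {
            @Override
            public long beginMonitorCompensatedBlock(ToyPool pool) {
                return pool.beginMonitorCompensatedBlock();
            }

            @Override
            public void endCompensatedBlock(ToyPool pool, long post) {
                pool.endCompensatedBlock(post);
            }
        });
    }

    static void ensureInitialized() { } // invoking any static method triggers class init

    // CAS retry loop: count the blocked carrier out of RC, return the amount to add back.
    // Assumes at least one active worker (no underflow handling in this toy).
    private long beginMonitorCompensatedBlock() {
        for (;;) {
            long c = ctl.get();
            if (ctl.compareAndSet(c, c - RC_UNIT))
                return RC_UNIT;
        }
    }

    private void endCompensatedBlock(long post) {
        ctl.getAndAdd(post); // restore RC once the monitor has been acquired
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(4);
        ToyFJPAccess access = ToySecrets.getFJPAccess();
        long post = access.beginMonitorCompensatedBlock(pool);
        System.out.println(pool.activeCount()); // 3 while blocked
        access.endCompensatedBlock(pool, post);
        System.out.println(pool.activeCount()); // 4 after the block ends
    }
}

interface ToyFJPAccess {
    long beginMonitorCompensatedBlock(ToyPool pool);
    void endCompensatedBlock(ToyPool pool, long post);
}

final class ToySecrets {
    private static volatile ToyFJPAccess fjpAccess;

    static void setFJPAccess(ToyFJPAccess access) { fjpAccess = access; }

    static ToyFJPAccess getFJPAccess() {
        ToyFJPAccess access = fjpAccess;
        if (access == null) {
            ToyPool.ensureInitialized(); // running ToyPool's static initializer registers it
            access = fjpAccess;
        }
        return access;
    }
}
```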

CarrierThread.java:

  • beginMonitorBlock(): re-entrancy-guarded compensation that pins the continuation during FJP compensation to prevent preemption
  • ForkJoinPools.beginMonitorCompensatedBlock(): static bridge method
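The guard-and-pin sequence in beginMonitorBlock() can be sketched as follows, assuming a hypothetical `ToyCarrier` in which a counter stands in for `Continuation.pin()`/`unpin()` and plain callbacks stand in for the FJP bridge. The simulated compensation re-enters `beginMonitorBlock()`, as nested monitor contention during worker creation would, and the guard turns that into a no-op:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical model of beginMonitorBlock()/endBlocking(); pinCount stands in for
// Continuation.pin()/unpin().
final class ToyCarrier {
    private final LongSupplier beginCompensatedBlock; // stand-in for the FJP bridge call
    private final LongConsumer endCompensatedBlock;
    private boolean compensating;   // re-entrancy guard
    private long compensateValue;   // restoration token from begin, handed back at end
    private int pinCount;

    ToyCarrier(LongSupplier begin, LongConsumer end) {
        this.beginCompensatedBlock = begin;
        this.endCompensatedBlock = end;
    }

    boolean isPinned() { return pinCount > 0; }

    /** Returns false if compensation is already in progress on this carrier. */
    boolean beginMonitorBlock() {
        if (compensating) return false;
        compensating = true;            // set before the call so nested contention sees it
        pinCount++;                     // no preemption while pool state is being adjusted
        try {
            compensateValue = beginCompensatedBlock.getAsLong();
            return true;
        } catch (RuntimeException | Error e) {
            compensating = false;       // nothing to restore later
            throw e;
        } finally {
            pinCount--;
        }
    }

    /** Called once the monitor is acquired; hands the token back to the pool. */
    void endBlocking() {
        if (compensating) {
            endCompensatedBlock.accept(compensateValue);
            compensateValue = 0;
            compensating = false;
        }
    }

    record Result(boolean outer, boolean nested, boolean pinnedDuringBegin,
                  long activeWhileBlocked, long activeAfter) { }

    static Result simulate() {
        AtomicLong active = new AtomicLong(4);   // toy "RC" for a pool of 4 carriers
        boolean[] nested = new boolean[1];
        boolean[] pinned = new boolean[1];
        ToyCarrier[] self = new ToyCarrier[1];
        self[0] = new ToyCarrier(() -> {
            pinned[0] = self[0].isPinned();
            // Nested monitor contention while the pool creates a worker re-enters here.
            nested[0] = self[0].beginMonitorBlock();
            active.decrementAndGet();            // blocked carrier stops counting as active
            return 1L;                           // token: one unit to add back
        }, active::addAndGet);
        boolean outer = self[0].beginMonitorBlock();
        long during = active.get();
        self[0].endBlocking();
        return new Result(outer, nested[0], pinned[0], during, active.get());
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```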

macarte commented Mar 11, 2026

Manual rerun results

Here's the PR4 rerun summary (run 22935801854): 10 of 85 jobs failed.

  Windows-aarch64 failures (3 jobs, 5 errors):

  ┌──────────────────────────┬─────────────────────────────────────────┬──────────────────────────────────────┐
  │ Job                      │ Test                                    │ Error                                │
  ├──────────────────────────┼─────────────────────────────────────────┼──────────────────────────────────────┤
  │ jdk/tier1 part 1         │ PingPong.java#ltq                       │ Timeout 480s (stuck at 2447/500000)  │
  ├──────────────────────────┼─────────────────────────────────────────┼──────────────────────────────────────┤
  │                          │ PingPong.java#sq                        │ Timeout 480s (stuck at 10875/500000) │
  ├──────────────────────────┼─────────────────────────────────────────┼──────────────────────────────────────┤
  │                          │ Skynet.java#default                     │ Timeout 6400s                        │
  ├──────────────────────────┼─────────────────────────────────────────┼──────────────────────────────────────┤
  │ hs/tier1 serviceability  │ GetStackTraceSuspendedStressTest.java   │ Timeout 480s                         │
  ├──────────────────────────┼─────────────────────────────────────────┼──────────────────────────────────────┤
  │ hs/tier1 compiler part 3 │ TestSpeculationFailedHigherEqual.java   │ Failed                               │
  └──────────────────────────┴─────────────────────────────────────────┴──────────────────────────────────────┘

  Cross-platform failures (7 jobs — all pre-existing):

  ┌─────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
  │ Test                                    │ Platforms                                                                   │
  ├─────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ TestSpeculationFailedHigherEqual.java   │ All 5 platforms (linux-x64, linux-x64-static, macos-aarch64, windows-x64,   │
  │                                         │ windows-aarch64)                                                            │
  ├─────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ TestSharedCloseJFR.java                 │ linux-x64, linux-x64-static, macos-aarch64                                  │
  └─────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

  Key observations:

   - Same win-aarch64 virtual-thread timeouts as PR2/PR3; expected, since PR4 doesn't add the Dekker fences (those come in PR5)
   - TestSpeculationFailedHigherEqual.java continues failing on all platforms; this is a tip-wide regression, not related to your changes
   - TestSharedCloseJFR.java is the same persistently flaky test
   - jdk/tier1 part 2 passed this time (no SPMC timeout)
   - The GetStackTraceSuspendedStressTest timeout reappeared (it was also seen in PR #51)
