[GR-67169] Address JFR follow-up review comments#13440
[GR-67169] Address JFR follow-up review comments#13440graalvmbot wants to merge 4 commits intomasterfrom
Conversation
|
@roberttoyonaga Hi Robert! This should address some of the follow-up review comments I received for the emergency dump changes. It now also narrows the semantics to do the dump only if the OOM exception is not caught. |
roberttoyonaga
left a comment
There was a problem hiding this comment.
Thanks @thomaswue! I've left a few comments below.
It now also narrows the semantics to do the dump only if the OOM exception is not caught.
This requires the user to set ReportFatalErrorOnOutOfMemoryError or ExitOnOutOfMemoryError. Does it make sense to attempt an emergency dump if an uncaught OOME reaches JavaThreads.dispatchUncaughtException?
| } | ||
|
|
||
| @Uninterruptible(reason = "Locking without transition requires that the whole critical section is uninterruptible.") | ||
| public int writePreviousEpoch(JfrChunkWriter writer) { |
There was a problem hiding this comment.
Do we need to duplicate int write(JfrChunkWriter writer, boolean flushpoint) here?
If so, maybe we can change write to writeCurrentEpoch(JfrChunkWriter writer)
| epochData.buffer = JfrBufferAccess.allocate(JfrBufferType.C_HEAP); | ||
| } | ||
| @Uninterruptible(reason = "Locking without transition and result is only valid until epoch changes.", callerMustBe = true) | ||
| private long getSymbolIdFromTemporaryBuffer(Pointer buffer, UnsignedWord length, boolean previousEpoch) { |
There was a problem hiding this comment.
Nice optimization to avoid needing to malloc for duplicates!
| /* | ||
| * Threads only need to be registered once per epoch. However, the cached epoch id alone is | ||
| * not sufficient because repository resets at stop/emergency cleanup can wipe the current | ||
| * epoch state without clearing the virtual thread's cached id. | ||
| */ |
There was a problem hiding this comment.
This is a good catch. If JFR is stopped then started again, the epoch ID and jfrEpochId of surviving threads may still match - incorrectly bypassing re-registration.
I don't think "emergency cleanup" is a problem anymore since you are removing that code in this PR.
However, the purpose of isVirtualThreadAlreadyRegistered is to have a fast path that avoids taking the lock if the vthread is already registered. Hotspot does this too. Introduced in #9570. This correction negates that benefit since we need to take the lock.
Maybe a solution is to bump the epoch with JfrTraceIdEpoch.getInstance().changeEpoch() in the JfrBeginRecordingOperation VM operation.
| } | ||
|
|
||
| if (SubstrateGCOptions.ExitOnOutOfMemoryError.getValue()) { | ||
| if (LibC.isSupported()) { |
There was a problem hiding this comment.
Why remove dumpJfrOnOutOfMemoryError() from here now?
There was a problem hiding this comment.
This is to match HotSpot behavior. Our tests and research indicate that HotSpot does not do that dump when -XX:+ExitOnOutOfMemoryError is enabled. There is however indeed a general design question how close we want to match HotSpot's behavior. Or if we want to provide additional functionality.
There was a problem hiding this comment.
I'm fine with matching hotspot as closely as possible. Hotspot also does a JFR emergency dump upon VM crashes in VMError::report_and_die. I wonder if it's possible to do an emergency dump from VMError.shouldNotReachHere to achieve something similar. Although, it may not be possible since emergency dumps require a safepoint and we don't know what state the VM is in.
On the other hand, if we are okay not exactly matching Hotspot, I do think it could be useful generating an emergency dump if ExitOnOutOfMemoryError or if an OOME error goes uncaught by the application code. The user has already included JFR in the image build and JFR would already need to be running anyway.
There was a problem hiding this comment.
Yes, I think we might want to offer more options than HotSpot, at least if it does not make the implementation overly complex. @christianhaeubl What is your opinion on the longer term plans?
There was a problem hiding this comment.
We can offer additional options. However by default, I think Native Image should behave in the same way as HotSpot as differences can cause problems for people that want to adopt Native Image.
I wonder if it's possible to do an emergency dump from VMError.shouldNotReachHere to achieve something similar
It should be possible, but it will definitely require some work as we would want to skip JFR emergency dumping in some cases (e.g., the crash corrupted some VM-internal data structure, a recursive crash happens during JFR emergency dumping, the process is out-of-memory on the OS-level). As HotSpot has pretty much the same problem, I assume that they already implemented some heuristic.
There was a problem hiding this comment.
However by default, I think Native Image should behave in the same way as HotSpot as differences can cause problems for people that want to adopt Native Image
Okay, let's match Hotspot and not dump on ExitOnOutOfMemoryError or uncaught OOME.
As HotSpot has pretty much the same problem, I assume that they already implemented some heuristic.
Yes, that's right. Hotspot does a bunch of things to try to increase the likelihood of the dump succeeding on VM crash, but acknowledges it's still only a best effort. We can decide later whether to do this in a future enhancement.
roberttoyonaga
left a comment
There was a problem hiding this comment.
Thanks @thomaswue! Please see my comment above about matching Hotspot.
I do think it would be useful to users if we extending the conditions in which we dump, but I understand the value in matching hotspot as closely as possible. Overall, I am okay with either decision.
| } | ||
|
|
||
| @Uninterruptible(reason = "Locking without transition and result is only valid until epoch changes.", callerMustBe = true) | ||
| public long getSymbolId(String imageHeapString, boolean previousEpoch, boolean replaceDotWithSlash) { |
There was a problem hiding this comment.
I notice that getSymbolId(String imageHeapString) is only called from JfrMethodRepository now. JfrTypeRepository only calls getSymbolId(Pointer buffer). Is this because, with Crema, not all type Strings will be on the image heap, but method names will be?
There was a problem hiding this comment.
Actually, I think it doesn't matter since the JfrSymbolRepository no longer pins any objects in RawStrucutres. The behaviour is the same regardless of whether the String is on the image heap (we always convert to a native utf-8 buffer instead of pinning a String object). So I think that the parameter name imageHeapString can be misleading. And we should no longer have the assert on line 95.
Additionally, if this is true, we can probably optimize things by calling JfrSymbolRepository.getSymbolId(String imageHeapString, boolean previousEpoch, boolean replaceDotWithSlash) directly from JfrTypeRepository instead of converting to a buffer there first. This would help us bypass malloc on the hot path and reduce code duplication. ie I think we can get rid of JfrTypeRepository.getSymbolId(String symbol, boolean previousEpoch, boolean replaceDotWithSlash)
No description provided.