You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think many people use Aeron in latency-sensitive applications. And it's constant issue to profile rare events which contribute to a tail latency a lot but doesn't affect average profile significantly.
I have an idea how to address this issue. Let's imagine Aeron's event loop writes current timestamp (heartbeat) to some memory-mapped file (heartbeat file) at the beginnig of every poll. Next, it's possible to hack some profiler to read this timestamp (via memory mapped file as well) and record samples if only current timestamp (sample's timestamp) is significantly greater than a timestamp of poll start. If poll has started recently, just skip such sample. In general, it's similar for 'time-to-safepoint' profiling.
I've implemented such thing for async-profiler (not upstreamed this yet).
Do you think such approach helps profiling tail latency issues? What do you think if I contribute such think (heartbeat agent) for Agrona/Aeron? I'm finishing PR to async-profiler and feedback from Aeron is highly valuable as well.
The text was updated successfully, but these errors were encountered:
@RainM Are you aware about the duty cycle stall tracking that is implemented for all Aeron components? There are two counters per component: one tracks the max duty cycle time ever recorded whereas the other one contains the number of times a threshold has been breached (e.g. MediaDriver.Context#senderCycleThresholdNs() ).
Hello,
I think many people use Aeron in latency-sensitive applications. And it's constant issue to profile rare events which contribute to a tail latency a lot but doesn't affect average profile significantly.
I have an idea how to address this issue. Let's imagine Aeron's event loop writes current timestamp (heartbeat) to some memory-mapped file (heartbeat file) at the beginnig of every poll. Next, it's possible to hack some profiler to read this timestamp (via memory mapped file as well) and record samples if only current timestamp (sample's timestamp) is significantly greater than a timestamp of poll start. If poll has started recently, just skip such sample. In general, it's similar for 'time-to-safepoint' profiling.
I've implemented such thing for async-profiler (not upstreamed this yet).
Do you think such approach helps profiling tail latency issues? What do you think if I contribute such think (heartbeat agent) for Agrona/Aeron? I'm finishing PR to async-profiler and feedback from Aeron is highly valuable as well.
The text was updated successfully, but these errors were encountered: