Skip to content

Conversation

noahsmartin
Copy link
Contributor

@noahsmartin noahsmartin commented Sep 22, 2025

This introduces a new HangTracker class that observers the run loop and notifies anything registered with it when the runloop is late. The idea is to eventually replace all uses of CADisplayLink with this. They are not exactly the same, display link works on vsync events and a runloop observer hooks directly into the runloop lifecycle.

In this PR I use the new HangTracker in the watchdog integration. This seemed like a pretty small use of display links and was a good opportunity to have a first crack at removing them. I also don't think this change requires any kind of new opt-in options, since it's all internal to how the integration works.

The SDK default behavior before this change is to enable the watchdog integration, which means by default the SDK spins the CPU in the background constantly waking up to process these vsync events. Since we are in the final stretch of quality quarter I wanted to get rid of this behavior and replace it with a passive observer, like this.

A few interesting implementation details:

  • The hang tracker turns itself on and off by removing the runloop observer when nothing is subscribing to changes anymore.
  • The majority of the code is Swift only, with a small ObjcBridge class wrapping the main implementation
  • I used protocol based decency injection rather than subclassing, since this code can now be fully "swifty"

I also manually tested in the swift sample app by making sure when the debugger wasn't attached watchdog events still get sent when I force-quite the app with Xcode. Also verified that if I simulate a hang with the sample app and then force quit while the hang is in progress, watchdog events are not sent.

Closes #6252

@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch from db41d1f to c7cfc40 Compare September 22, 2025 19:19
Copy link

codecov bot commented Sep 22, 2025

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
3980 2 3978 11
View the top 3 failed test(s) by shortest run time
SentryTests.SentryANRTrackerV2Tests::testFullyBlockingFollowedByFullyBlocking_BothReported
Stack Traces | 0s run time
.../Integrations/ANR/SentryANRTrackerV2Tests.swift:276 - Asynchronous wait failed: Exceeded timeout of 10 seconds, with unfulfilled expectations: "Test Delegate ANR Detection".
SentryTests.SentrySpotlightTransportTests::testShouldRemoveAttachmentsFromEventEnvelope
Stack Traces | 0s run time
.../SentryTests/Networking/SentrySpotlightTransportTests.swift:106 - XCTAssertTrue failed - Expected body size to be in range of 1098...1138, but was 1139
iOS_Swift_UITests.UserFeedbackUITests::testAddingScreenshots
Stack Traces | 0s run time
.../iOS-Swift/iOS-Swift-UITests/UserFeedbackUITests.swift:670 - XCTUnwrap failed: expected non-nil value of type "String"
View the full list of 1 ❄️ flaky test(s)
SentryTests.SentrySDKInternalTests::testResumeAndPauseAppHangTracking

Flake rate in main: 14.00% (Passed 43 times, Failed 7 times)

Stack Traces | 0s run time
.../Tests/SentryTests/SentrySDKInternalTests.swift:632 - XCTAssertEqual failed: ("1") is not equal to ("0") - The SDK should capture an AppHang after resuming the tracking, but it didn't.

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch 3 times, most recently from 29cd58b to 418cf15 Compare September 22, 2025 20:07
Copy link
Contributor

github-actions bot commented Sep 22, 2025

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1236.58 ms 1265.17 ms 28.59 ms
Size 23.75 KiB 996.49 KiB 972.75 KiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
1fecbb8 1242.78 ms 1265.40 ms 22.62 ms
d05d866 1211.78 ms 1230.96 ms 19.18 ms
32e7197 1226.91 ms 1245.48 ms 18.56 ms
0e9c5ae 1226.10 ms 1254.14 ms 28.04 ms
397b9c9 1230.23 ms 1249.29 ms 19.06 ms
718c372 1220.09 ms 1235.15 ms 15.06 ms
45482a6 1225.88 ms 1254.27 ms 28.39 ms
f97a070 1218.88 ms 1253.12 ms 34.24 ms
5846fc5 1223.25 ms 1248.79 ms 25.54 ms
43597ba 1214.88 ms 1243.52 ms 28.65 ms

App size

Revision Plain With Sentry Diff
1fecbb8 23.75 KiB 969.28 KiB 945.53 KiB
d05d866 23.75 KiB 878.60 KiB 854.85 KiB
32e7197 23.75 KiB 866.69 KiB 842.94 KiB
0e9c5ae 23.75 KiB 969.29 KiB 945.54 KiB
397b9c9 23.75 KiB 959.44 KiB 935.70 KiB
718c372 23.75 KiB 920.65 KiB 896.90 KiB
45482a6 23.75 KiB 919.91 KiB 896.16 KiB
f97a070 23.75 KiB 858.68 KiB 834.93 KiB
5846fc5 23.75 KiB 912.77 KiB 889.02 KiB
43597ba 23.75 KiB 880.32 KiB 856.58 KiB

Previous results on branch: hangTrackerAbstraction

Startup times

Revision Plain With Sentry Diff
ac854e2 1220.81 ms 1250.20 ms 29.39 ms
1e7ad41 1245.79 ms 1280.67 ms 34.88 ms
547ce2a 1226.90 ms 1257.82 ms 30.92 ms
fee2eea 1225.57 ms 1243.82 ms 18.25 ms

App size

Revision Plain With Sentry Diff
ac854e2 23.75 KiB 995.72 KiB 971.97 KiB
1e7ad41 23.75 KiB 994.33 KiB 970.58 KiB
547ce2a 23.75 KiB 995.83 KiB 972.08 KiB
fee2eea 23.75 KiB 996.53 KiB 972.78 KiB

@noahsmartin noahsmartin changed the title Hang tracker abstraction ref: Use RunLoop observer for watchdog detection Sep 23, 2025
@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch 7 times, most recently from d2065b3 to ec5403e Compare September 23, 2025 23:06
Comment on lines +76 to +88
func removeLateRunLoopObserver(id: UUID) {
queue.async { [weak self] in
guard let self else { return }
lateRunLoop.removeValue(forKey: id)
if lateRunLoop.isEmpty {
DispatchQueue.main.async { [weak self] in
if self?.finishedRunLoop.isEmpty ?? false {
self?.stop()
}
}
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: Inconsistent threading in observer management creates race conditions. addLateRunLoopObserver uses async dispatch while addFinishedRunLoopObserver is synchronous, leading to potential lifecycle issues.
  • Description: The HangTracker class exhibits inconsistent threading patterns for managing observers. The addLateRunLoopObserver method uses a background queue and then dispatches back to the main queue, while addFinishedRunLoopObserver modifies state directly on the main thread. This inconsistency, coupled with complex removal logic involving cross-queue state checks (e.g., lateRunLoop.isEmpty, finishedRunLoop.isEmpty), creates race conditions. These races can lead to the runloop observer not being stopped when all observers are removed, causing resource leaks, or being stopped prematurely, which would disable hang detection.

  • Suggested fix: Ensure all observer management operations are synchronized. Either perform all modifications to lateRunLoop and finishedRunLoop on the main thread directly, or consistently dispatch all add/remove operations to the private queue. This will eliminate race conditions caused by cross-queue state checks and inconsistent execution contexts.
    severity: 0.7, confidence: 0.9

Did we get this right? 👍 / 👎 to inform future reviews.

}
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Race Conditions in HangTracker's Thread Safety

Race conditions exist in HangTracker due to inconsistent thread safety. add/removeFinishedRunLoopObserver access finishedRunLoop directly, which is unsafe as the dictionary is also accessed on the main thread. Additionally, the remove methods' logic for stopping the tracker involves multiple asynchronous dispatches to check finishedRunLoop and lateRunLoop emptiness, creating a race that can lead to incorrect start/stop decisions.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ref: Use RunLoop observer for watchdog detection
1 participant