ANRs in SyncService.start/stop
#4150
Labels
A-Performance
O-Occasional
S-Major
T-Defect
We recently became aware of an abnormal growth in ANR (Application Not Responding) issues, and the most common one seems to be the one with the following stack trace:
Stacktrace
This points to a SyncService.stop() call being shortly followed by a SyncService.start() call. This only happens at 3 points in the app:

LoggedInFlowNode won't start the sync service when needed and we need to manually do it here. When the call is finished and the screen removed, the service will be stopped.

The implementation of the flows above isn't perfect, and we may be calling SyncService.start/stop more often than needed, which may be one of the root causes of this issue. Centralising all these calls in a single component would probably make more sense.

The calls to these methods are all made in separate coroutines, but all of them run on the Main dispatcher, which means that if the underlying thread gets blocked somehow, the whole app freezes, since it's the same thread that runs the UI. This shouldn't happen, because UniFFI seems to translate async methods to suspending functions, but it seems to be happening anyway.

I wasn't able to reproduce this issue without a little help: I had to artificially add some delay inside SyncService.start() in the SDK so the app froze when I foregrounded/backgrounded it several times, but I imagine the real environment would behave similarly. With these changes forcing more frequent app freezes, the issue could be mitigated by running the SyncService.stop() calls on a background dispatcher (surprisingly, doing the same for SyncService.start() had no effect AFAICT). I'm not sure whether this means we now have a blocked background thread, or whether running in the background means the apparent deadlock we saw before just can't happen.

As for what might have caused this, the issue apparently started at v0.7.6, and the related SDK changes are these, if I'm not mistaken. Nothing seems really related to this other than this PR in EXA, but that doesn't seem to be the cause, so maybe it's something coming from a UniFFI internal change.
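To make the "single component" idea and the background-dispatcher mitigation more concrete, here is a rough sketch of what a centralised owner of the start/stop calls could look like. None of this is the app's actual code: the SyncService interface below is a simplified stand-in, and SyncController, acquire, release and the holder names are hypothetical. It assumes kotlinx.coroutines is available and that running these calls on Dispatchers.IO is acceptable. In the experiment described above, only moving stop() off the Main dispatcher made a difference; the sketch moves both for symmetry. It would also only mitigate the freeze, not explain the root cause.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withContext

// Simplified stand-in for the SDK-backed service; the real interface may differ.
interface SyncService {
    suspend fun start()
    suspend fun stop()
}

// Hypothetical single owner of every start/stop call in the app.
class SyncController(
    private val syncService: SyncService,
    // Assumption: a background dispatcher is fine for these calls.
    private val dispatcher: CoroutineDispatcher = Dispatchers.IO,
) {
    // Serialises transitions, so a start never overlaps a stop that is still running.
    private val mutex = Mutex()

    // Components that currently need sync (e.g. "foreground", "call").
    private val holders = mutableSetOf<String>()

    suspend fun acquire(holder: String) {
        mutex.withLock {
            val wasIdle = holders.isEmpty()
            // Only the first holder triggers a real start.
            if (holders.add(holder) && wasIdle) {
                withContext(dispatcher) { syncService.start() }
            }
        }
    }

    suspend fun release(holder: String) {
        mutex.withLock {
            // Only the last holder triggers a real stop.
            if (holders.remove(holder) && holders.isEmpty()) {
                withContext(dispatcher) { syncService.stop() }
            }
        }
    }
}

// Fake service that records calls, used only to exercise the sketch.
class FakeSyncService : SyncService {
    val calls = mutableListOf<String>()
    override suspend fun start() { calls += "start" }
    override suspend fun stop() { calls += "stop" }
}

fun main(): Unit = runBlocking {
    val fake = FakeSyncService()
    val controller = SyncController(fake)

    controller.acquire("foreground")
    controller.acquire("call")       // already running, no second start
    controller.release("foreground") // the call screen still needs sync
    controller.release("call")       // last holder gone, stop once

    println(fake.calls) // prints [start, stop]
}
```

With something like this, the three call sites would only declare whether they need sync, and redundant start/stop pairs (such as backgrounding the app while a call screen is still visible) would collapse into no-ops instead of reaching the SDK.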