You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Internal monitor impl not using coop mutex causing deadlocks on Android. (#112374)
Backport of #112358 to release/8.0
On Android we have seen ANR issues, like the one described in
#111485. After investigating
several different dumps including all threads it turns out that we
could end up in a deadlock when init a monitor since that code path
didn't use a coop mutex and owner of lock could end up in GC code
while holding that lock, leading to deadlock if another thread was
about to lock the same monitor init lock. In several dumps we see
the following two threads:
Thread 1:
syscall+28
__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+14
NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+384
sgen_gc_lock+105
mono_gc_wait_for_bridge_processing_internal+70
sgen_gchandle_get_target+288
alloc_mon+358
ves_icall_System_Threading_Monitor_Monitor_wait+452
ves_icall_System_Threading_Monitor_Monitor_wait_raw+583
Thread 2:
syscall+28
__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144
NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+652
alloc_mon+105
ves_icall_System_Threading_Monitor_Monitor_wait+452
ves_icall_System_Threading_Monitor_Monitor_wait_raw+583
So in this scenario Thread 1 holds monitor_mutex that is not a coop
mutex and end up trying to take GC lock, since it calls,
mono_gc_wait_for_bridge_processing_internal, but since a GC is already
started (waiting on STW to complete), Thread 1 will block holding
monitor_mutex.
Thread 2 will try to lock monitor_mutex as well, and since its not a
coop mutex it will block on OS __futex_wait_ex without changing
Mono thread state to blocking, preventing the STW from processing.
Fix is to switch to coop aware implementation of monitor_mutex.
Normally this should have been resolved on Android since we run
hybrid suspend meaning we should be able to run a signal handler on the
blocking thread that would suspend it meaning that STW would continue,
but for some reason the signal can't have been executed in this case putting the app under coop suspend limitations.
This fix will take care of the deadlock, but if there are issues running
Signals on Android, then threads not attached to runtime using
coop attach methods could end up in similar situations
blocking STW.
Co-authored-by: lateralusX <[email protected]>
0 commit comments