Skip to content

Commit 19040ed

Browse files
Internal monitor impl not using coop mutex causing deadlocks on Android. (#112373)
Backport of #112358 to release/9.0 On Android we have seen ANR issues, like the one described in #111485. After investigating several different dumps including all threads it turns out that we could end up in a deadlock when init a monitor since that code path didn't use a coop mutex and owner of lock could end up in GC code while holding that lock, leading to deadlock if another thread was about to lock the same monitor init lock. In several dumps we see the following two threads: Thread 1: syscall+28 __futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+14 NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+384 sgen_gc_lock+105 mono_gc_wait_for_bridge_processing_internal+70 sgen_gchandle_get_target+288 alloc_mon+358 ves_icall_System_Threading_Monitor_Monitor_wait+452 ves_icall_System_Threading_Monitor_Monitor_wait_raw+583 Thread 2: syscall+28 __futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144 NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+652 alloc_mon+105 ves_icall_System_Threading_Monitor_Monitor_wait+452 ves_icall_System_Threading_Monitor_Monitor_wait_raw+583 So in this scenario Thread 1 holds monitor_mutex that is not a coop mutex and end up trying to take GC lock, since it calls, mono_gc_wait_for_bridge_processing_internal, but since a GC is already started (waiting on STW to complete), Thread 1 will block holding monitor_mutex. Thread 2 will try to lock monitor_mutex as well, and since its not a coop mutex it will block on OS __futex_wait_ex without changing Mono thread state to blocking, preventing the STW from processing. Fix is to switch to coop aware implementation of monitor_mutex. Normally this should have been resolved on Android since we run hybrid suspend meaning we should be able to run a signal handler on the blocking thread that would suspend it meaning that STW would continue, but for some reason the signal can't have been executed in this case putting the app under coop suspend limitations. This fix will take care of the deadlock, but if there are issues running Signals on Android, then threads not attached to runtime using coop attach methods could end up in similar situations blocking STW. Co-authored-by: lateralusX <[email protected]>
1 parent dd88d29 commit 19040ed

File tree

1 file changed

+4
-4
lines changed

1 file changed

+4
-4
lines changed

src/mono/mono/metadata/monitor.c

+4-4
Original file line numberDiff line numberDiff line change
@@ -82,9 +82,9 @@ struct _MonitorArray {
8282
MonoThreadsSync monitors [MONO_ZERO_LEN_ARRAY];
8383
};
8484

85-
#define mono_monitor_allocator_lock() mono_os_mutex_lock (&monitor_mutex)
86-
#define mono_monitor_allocator_unlock() mono_os_mutex_unlock (&monitor_mutex)
87-
static mono_mutex_t monitor_mutex;
85+
#define mono_monitor_allocator_lock() mono_coop_mutex_lock (&monitor_mutex)
86+
#define mono_monitor_allocator_unlock() mono_coop_mutex_unlock (&monitor_mutex)
87+
static MonoCoopMutex monitor_mutex;
8888
static MonoThreadsSync *monitor_freelist;
8989
static MonitorArray *monitor_allocated;
9090
static int array_size = 16;
@@ -255,7 +255,7 @@ lock_word_new_flat (gint32 owner)
255255
void
256256
mono_monitor_init (void)
257257
{
258-
mono_os_mutex_init_recursive (&monitor_mutex);
258+
mono_coop_mutex_init_recursive (&monitor_mutex);
259259
}
260260

261261
static int

0 commit comments

Comments
 (0)