Bug 44110 - deadlock in the thread pool shutdown
Summary: deadlock in the thread pool shutdown
Status: RESOLVED ANSWERED
Alias: None
Product: Runtime
Classification: Mono
Component: io-layer ()
Version: 4.6.0 (C8)
Hardware: PC Mac OS
: --- normal
Target Milestone: ---
Assignee: Ludovic Henry
URL:
Depends on:
Blocks:
 
Reported: 2016-09-08 13:33 UTC by Alan McGovern
Modified: 2017-02-08 20:42 UTC (History)
4 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED ANSWERED

Description Alan McGovern 2016-09-08 13:33:56 UTC
Wrench is triggering this shutdown deadlock reasonably frequently[0] using Mono JIT compiler version 4.6.0 (mono-4.6.0-branch/700a14a:

(lldb) thread backtrace all
* thread #1: tid = 0x1079728, 0x00007fff9395610a libsystem_kernel.dylib`__semwait_signal + 10, name = 'tid_a0b', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff9395610a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff8524ad0f libsystem_c.dylib`nanosleep + 199
    frame #2: 0x00000001029300b2 mono64`mono_thread_info_sleep + 322
    frame #3: 0x000000010286fd3c mono64`mono_threadpool_ms_cleanup + 188
    frame #4: 0x000000010286601f mono64`mono_runtime_try_shutdown + 63
    frame #5: 0x000000010286caa1 mono64`mono_thread_manage + 609
    frame #6: 0x00000001027463e5 mono64`mono_main + 7525
    frame #7: 0x00000001026bfe84 mono64`main + 1156
    frame #8: 0x00000001026bf9f4 mono64`start + 52

If it can be reproed locally your best bet is to compile monodevelop and md-addins, then run this in a loop in a term:

$ XCODE_DEVELOPER_ROOT=/Applications/Xcode8-GM.app/Contents/Developer/ SDK_VERSION=10.0 make test-ios-sim

The tests should fail equally with any xcode version. If you use Xcode73 then drop SDK_VERSION to 9.3, etc.

[0]
https://wrench.internalx.com/Wrench/ViewLane.aspx?lane_id=607&host_id=387&revision_id=802324
https://wrench.internalx.com/Wrench/ViewLane.aspx?lane_id=607&host_id=387&revision_id=802257
Comment 1 Ludovic Henry 2016-09-08 13:42:33 UTC
https://github.com/mono/mono/pull/3538 might give us more informations on what could go wrong.
Comment 2 Alan McGovern 2016-09-08 13:52:19 UTC
This should make the loop iterate faster:

https://gist.github.com/alanmcgovern/55cf5237c24cf9fd6992feeda476a2ab

however it doesn't hang on my machine :(
Comment 3 Alan McGovern 2016-09-08 14:01:15 UTC
Let me know when there's a mono build we can throw on wrench! thanks
Comment 4 Alan McGovern 2016-09-08 21:45:37 UTC
i triggered it locally with Mono JIT compiler version 4.6.0 (mono-4.6.0-branch/700a14a:

(lldb) thread backtrace all
* thread #1: tid = 0x669ca, 0x00007fff8542110a libsystem_kernel.dylib`__semwait_signal + 10, name = 'tid_a0b', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8542110a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff8752fd0f libsystem_c.dylib`nanosleep + 199
    frame #2: 0x000000010b5650b2 mono64`mono_thread_info_sleep(ms=<unavailable>, alerted=<unavailable>) + 322 at mono-threads.c:1269 [opt]
    frame #3: 0x000000010b4a4d3c mono64`mono_threadpool_ms_cleanup [inlined] cleanup + 65 at threadpool-ms.c:334 [opt]
    frame #4: 0x000000010b4a4cfb mono64`mono_threadpool_ms_cleanup [inlined] mono_lazy_cleanup + 107 at mono-lazy-init.h:113 [opt]
    frame #5: 0x000000010b4a4c90 mono64`mono_threadpool_ms_cleanup + 16 at threadpool-ms.c:1326 [opt]
    frame #6: 0x000000010b49b01f mono64`mono_runtime_try_shutdown + 63 at runtime.c:113 [opt]
    frame #7: 0x000000010b4a1aa1 mono64`mono_thread_manage + 609 at threads.c:3264 [opt]
    frame #8: 0x000000010b37b3e5 mono64`mono_main(argc=<unavailable>, argv=<unavailable>) + 7525 at driver.g.c:2187 [opt]
    frame #9: 0x000000010b2f4e84 mono64`main [inlined] mono_main_with_options(argc=9, argc=9, argc=9, argv=0x00007fff5490d370, argv=0x00007fff5490d370, argv=0x00007fff5490d370) + 17 at main.c:28 [opt]
    frame #10: 0x000000010b2f4e73 mono64`main(argc=9, argv=0x00007fff5490d370) + 1139 at main.c:177 [opt]
    frame #11: 0x000000010b2f49f4 mono64`start + 52

  thread #2: tid = 0x669cb, 0x00007fff85420db6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'SGen worker'
    frame #0: 0x00007fff85420db6 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff880c1728 libsystem_pthread.dylib`_pthread_cond_wait + 767
    frame #2: 0x000000010b53a9ba mono64`thread_func [inlined] mono_os_cond_wait(mutex=0x000000010b6a46e0) + 15 at mono-os-mutex.h:107 [opt]
    frame #3: 0x000000010b53a9ab mono64`thread_func(thread_data=0x0000000000000000) + 187 at sgen-thread-pool.c:110 [opt]
    frame #4: 0x00007fff880c099d libsystem_pthread.dylib`_pthread_body + 131
    frame #5: 0x00007fff880c091a libsystem_pthread.dylib`_pthread_start + 168
    frame #6: 0x00007fff880be351 libsystem_pthread.dylib`thread_start + 13

  thread #3: tid = 0x669cc, 0x00007fff8541afae libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'Finalizer'
    frame #0: 0x00007fff8541afae libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010b4c9e61 mono64`finalizer_thread [inlined] mono_os_sem_wait(flags=MONO_SEM_FLAGS_ALERTABLE) + 11 at mono-os-semaphore.h:72 [opt]
    frame #2: 0x000000010b4c9e56 mono64`finalizer_thread [inlined] mono_coop_sem_wait(flags=MONO_SEM_FLAGS_ALERTABLE) + 11 at mono-coop-semaphore.h:40 [opt]
    frame #3: 0x000000010b4c9e4b mono64`finalizer_thread(unused=<unavailable>) + 171 at gc.c:757 [opt]
    frame #4: 0x000000010b4a4968 mono64`start_wrapper [inlined] start_wrapper_internal + 518 at threads.c:740 [opt]
    frame #5: 0x000000010b4a4762 mono64`start_wrapper(data=<unavailable>) + 50 at threads.c:788 [opt]
    frame #6: 0x000000010b567187 mono64`inner_start_thread(arg=<unavailable>) + 391 at mono-threads-posix.c:92 [opt]
    frame #7: 0x00007fff880c099d libsystem_pthread.dylib`_pthread_body + 131
    frame #8: 0x00007fff880c091a libsystem_pthread.dylib`_pthread_start + 168
    frame #9: 0x00007fff880be351 libsystem_pthread.dylib`thread_start + 13

  thread #4: tid = 0x669ce, 0x00007fff85421efa libsystem_kernel.dylib`kevent_qos + 10, queue = 'com.apple.libdispatch-manager'
    frame #0: 0x00007fff85421efa libsystem_kernel.dylib`kevent_qos + 10
    frame #1: 0x00007fff8b246165 libdispatch.dylib`_dispatch_mgr_invoke + 216
    frame #2: 0x00007fff8b245dcd libdispatch.dylib`_dispatch_mgr_thread + 52

  thread #5: tid = 0x67033, 0x00007fff8541afae libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'tid_2e17'
    frame #0: 0x00007fff8541afae libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010b56711d mono64`inner_start_thread [inlined] mono_os_sem_wait(flags=MONO_SEM_FLAGS_NONE) + 14 at mono-os-semaphore.h:72 [opt]
    frame #2: 0x000000010b56710f mono64`inner_start_thread [inlined] mono_coop_sem_wait(flags=MONO_SEM_FLAGS_NONE) + 8 at mono-coop-semaphore.h:40 [opt]
    frame #3: 0x000000010b567107 mono64`inner_start_thread(arg=<unavailable>) + 263 at mono-threads-posix.c:85 [opt]
    frame #4: 0x00007fff880c099d libsystem_pthread.dylib`_pthread_body + 131
    frame #5: 0x00007fff880c091a libsystem_pthread.dylib`_pthread_start + 168
    frame #6: 0x00007fff880be351 libsystem_pthread.dylib`thread_start + 13
Comment 5 Alan McGovern 2016-09-09 08:25:53 UTC
I figured out why this is happening. This setup is this:

We start touchunit.server.exe. It is the iOS test harness. It does some setup and then starts mtouch. mtouch then does some more setup and launches the ios app containing all the tests. That app automatically exits when the tests have completed.

Touchunit.Server.exe monitors that apps PID to see when it completes and waits for an additional 10 seconds for the mtouch process to gracefully exit.If mtouch does not exit, TouchUnit.Server.exe prints a warning and returns from its main method anyway.

This is he case where it deadlocks. If I increase the 10 second timeout to be 30 seconds, the mtouch process always has enough time to clean up after the ios app exits and will gracefully shut down before TouchUnit.Server's wait timeout expires. I ran 10 builds on qrench with a 30 second timeout and nne triggered the deadlock.

[0] https://github.com/spouliot/Touch.Unit/commit/fdeb19b3aca24ba57fab3bd9c0cc43fe59af1857