Bug 1638 - sgen deadlocks when ThreadPool is used
Summary: sgen deadlocks when ThreadPool is used
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Gonzalo Paniagua Javier
URL:
Depends on:
Blocks:
 
Reported: 2011-10-21 12:39 UTC by Marek Safar
Modified: 2012-01-10 16:41 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Description Marek Safar 2011-10-21 12:39:43 UTC
using System;
using System.Threading;

class MyContext
{
	public static void Main ()
	{
		int counter = 0;

		for (int i = 0; i < 10000000; ++i) {
			ThreadPool.QueueUserWorkItem (delegate { Interlocked.Increment (ref counter); });
		}

		SpinWait.SpinUntil (() => counter == 10000000);
	}
}


The test never completes with sgen.
Comment 1 Marek Safar 2011-11-15 17:57:39 UTC
(gdb) t a a bt

Thread 4 (Thread 0x7ffff490b700 (LWP 19399)):
#0  0x00007ffff753af91 in sem_timedwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000005e46d5 in mono_sem_timedwait (sem=0x931a68, timeout_ms=<value optimised out>, alertable=1) at mono-semaphore.c:76
#2  0x00000000005502ef in async_invoke_thread (data=0x0) at threadpool.c:1486
#3  0x00000000005b0056 in start_wrapper_internal (data=0xae3fa0) at threads.c:571
#4  start_wrapper (data=0xae3fa0) at threads.c:619
#5  0x00000000005d8223 in thread_start_routine (args=0xa41ef0) at wthreads.c:290
#6  0x00000000005e9529 in inner_start_thread (arg=0xae8a80) at mono-threads-posix.c:49
#7  0x00007ffff7533d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007ffff727f04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ffff7e6f700 (LWP 19398)):
#0  0x00007ffff71cd084 in sigsuspend () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000524d5c in suspend_thread (info=0xac4dd0, context=0x7ffff7e6e740) at sgen-os-posix.c:114
#2  0x0000000000524e75 in suspend_handler (sig=<value optimised out>, siginfo=<value optimised out>, context=0x7ffff7e6e740)
    at sgen-os-posix.c:132
#3  <signal handler called>
#4  0x00007ffff753c4bd in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00000000005d7597 in SleepEx (ms=<value optimised out>, alertable=1) at wthreads.c:865
#6  0x000000000054dd98 in monitor_thread (unused=<value optimised out>) at threadpool.c:778
#7  0x00000000005b0056 in start_wrapper_internal (data=0xa7eeb0) at threads.c:571
#8  start_wrapper (data=0xa7eeb0) at threads.c:619
#9  0x00000000005d8223 in thread_start_routine (args=0xa41e28) at wthreads.c:290
#10 0x00000000005e9529 in inner_start_thread (arg=0xae8bb0) at mono-threads-posix.c:49
#11 0x00007ffff7533d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ffff727f04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffff4b0c700 (LWP 19397)):
#0  0x00007ffff71cd084 in sigsuspend () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000524d5c in suspend_thread (info=0xa60cf0, context=0x7ffff4b0b780) at sgen-os-posix.c:114
#2  0x0000000000524e75 in suspend_handler (sig=<value optimised out>, siginfo=<value optimised out>, context=0x7ffff4b0b780)
    at sgen-os-posix.c:132
#3  <signal handler called>
#4  0x00007ffff753ae9e in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00000000005e45d8 in mono_sem_wait (sem=0x930ea0, alertable=1) at mono-semaphore.c:113
#6  0x00000000005005a5 in finalizer_thread (unused=<value optimised out>) at gc.c:1073
#7  0x00000000005b0056 in start_wrapper_internal (data=0xa62980) at threads.c:571
#8  start_wrapper (data=0xa62980) at threads.c:619
#9  0x00000000005d8223 in thread_start_routine (args=0xa41d60) at wthreads.c:290
#10 0x00000000005e9529 in inner_start_thread (arg=0xa5f0c0) at mono-threads-posix.c:49
#11 0x00007ffff7533d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ffff727f04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fd47c0 (LWP 19394)):
#0  0x00007ffff753aea0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000005e45d8 in mono_sem_wait (sem=0x9313c0, alertable=0) at mono-semaphore.c:113
#2  0x0000000000524efc in mono_sgen_wait_for_suspend_ack (count=3) at sgen-os-posix.c:179
#3  0x0000000000525090 in mono_sgen_thread_handshake (suspend=1) at sgen-os-posix.c:228
#4  0x000000000058a4d3 in stop_world (generation=0) at sgen-gc.c:4730
#5  0x00000000005944e5 in minor_collect_or_expand_inner (size=4096) at sgen-gc.c:3632
#6  0x0000000000594aaf in mono_gc_alloc_obj_nolock (vtable=vtable("System.Threading.WaitCallback"), size=<value optimised out>)
    at sgen-gc.c:3883
#7  0x0000000000594c6e in mono_gc_alloc_obj (vtable=vtable("System.Threading.WaitCallback"), size=104) at sgen-gc.c:4034
#8  0x0000000040018500 in ?? ()
#9  0x000000000096bf20 in ?? ()
#10 0x0000000000000000 in ?? ()
Comment 2 Rodrigo Kumpera 2011-11-23 13:28:04 UTC
Gonzalo,

The TP does everything wrong with the above test. It spins up the maximum number of threads even though all tasks are CPU-bound.

It does not limit how much work can be queued without blocking the producer, which results in huge heaps that slow us down even further.

Sgen crawls on this test since queueing is much faster than on Boehm.
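
For illustration only, here is a minimal sketch of what bounding the queue from the producer side could look like in user code. It is not the runtime's fix and not something proposed in this report; the class name BoundedProducer, the Run helper, and the MaxPending value are hypothetical. A SemaphoreSlim makes the producer block once too many items are outstanding, so the backlog (and the heap) stays roughly constant.

```csharp
using System;
using System.Threading;

class BoundedProducer
{
	// Hypothetical cap on outstanding work items; not a value taken from this report.
	const int MaxPending = 1024;

	public static int Run (int total)
	{
		int counter = 0;
		var slots = new SemaphoreSlim (MaxPending, MaxPending);
		var done = new CountdownEvent (total);

		for (int i = 0; i < total; ++i) {
			// The producer blocks here once MaxPending items are queued or running.
			slots.Wait ();
			ThreadPool.QueueUserWorkItem (delegate {
				Interlocked.Increment (ref counter);
				slots.Release ();	// free a slot for the producer
				done.Signal ();		// mark this item as finished
			});
		}

		// Returns once every queued item has signalled completion.
		done.Wait ();
		return counter;
	}

	public static void Main ()
	{
		Console.WriteLine (Run (1000000));
	}
}
```

Compared with the original test, the producer can never get more than MaxPending items ahead of the workers, which trades some queueing throughput for a bounded backlog.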
Comment 3 Marek Safar 2011-11-24 07:08:13 UTC
Just to clarify, this test deadlocks (0% CPU and no memory allocation) on my 4 cores + 4 HT machine.
Comment 4 Miguel de Icaza [MSFT] 2011-12-08 09:47:16 UTC
This seems to be ignoring the maximum number of threads. I believe we did this because in Mono we use too many ThreadPool threads for internal purposes, so we decided to remove the limit.

Perhaps we need to bring back a limit that is User-Defined-Maximum multiplied by some number, say 4, to account for our other internal uses, and perhaps limit the rate of thread creation once we have too many threads per CPU running.

The test completes for me (after a long time) on both SGen and Boehm on OSX with Mono 2.10.6.
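
As a rough sketch of the user-defined maximum mentioned above: the public ThreadPool.GetMaxThreads and ThreadPool.SetMaxThreads calls read and set it. The factor of 4 and the InternalCeiling helper below are hypothetical and only mirror the suggestion in this comment; they do not describe how the runtime actually computes its limit.

```csharp
using System;
using System.Threading;

class PoolLimits
{
	// Hypothetical: ceiling the runtime might enforce for a given user-defined maximum.
	public static int InternalCeiling (int userMax)
	{
		return userMax * 4;
	}

	public static void Main ()
	{
		int worker, io;
		ThreadPool.GetMaxThreads (out worker, out io);
		Console.WriteLine ("max worker threads: {0}, max IO threads: {1}", worker, io);

		// Assumed user choice: one worker thread per core.
		int userMax = Environment.ProcessorCount;
		// SetMaxThreads returns false if the request is rejected
		// (for example, a value below the current minimum).
		bool accepted = ThreadPool.SetMaxThreads (userMax, io);
		Console.WriteLine ("user maximum {0} accepted: {1}", userMax, accepted);
		Console.WriteLine ("hypothetical internal ceiling: {0}", InternalCeiling (userMax));
	}
}
```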
Comment 5 Gonzalo Paniagua Javier 2011-12-08 13:45:38 UTC
I can't reproduce this using either master or mono-2-10. BUT, I am running the test on 32-bit.

The maximum number of threadpool threads (for 30000000 items) was 4.

Perhaps it is a 64-bit problem?
Comment 6 Rodrigo Kumpera 2011-12-08 14:39:45 UTC
I can repro this on 32-bit Linux and on OSX with trunk. It's especially bad with sgen.

Memory usage skyrockets when, in fact, it should be constant once we're done firing up threads.
Comment 7 Gonzalo Paniagua Javier 2011-12-08 15:02:13 UTC
Memory use is a different issue from the originally reported "it never completes using sgen".

On OSX the maximum number of threads I saw was 67 before it finished, i.e., the original problem does not happen any more.
Comment 8 Marek Safar 2011-12-08 18:17:33 UTC
I could not reproduce it recently, but I could reproduce it at Xummit when lupus saw it, and the dump is from his gdb command :-)
Comment 9 Gonzalo Paniagua Javier 2012-01-10 16:41:54 UTC
Looks like this is gone!

Boehm takes 20s vs 17s for sgen. Max threads in the threadpool: 2 (2 cores). CPU use is ~87% with Boehm and ~96% with sgen.