Bug 11931 - Crash in threadpool_append_jobs
Summary: Crash in threadpool_append_jobs
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: io-layer ()
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2013-04-24 07:13 UTC by Roope Kangas
Modified: 2017-07-07 20:59 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Description Roope Kangas 2013-04-24 07:13:21 UTC
While load testing our game server on 64-bit Linux with the latest stable Mono (Mono JIT compiler version 2.10.11 (mono-2-10/a0bd2f0 Thu Feb 21 07:50:42 UTC 2013)), built from the GitHub 2-10 branch, I came across this crash:


Program received signal SIGSEGV, Segmentation fault.
Stacktrace:


Native stacktrace:

        /opt/mono-2.10/bin/mono-sgen() [0x49390e]
        /opt/mono-2.10/bin/mono-sgen() [0x4e7b0f]
        /opt/mono-2.10/bin/mono-sgen() [0x41be07]
        /lib64/libpthread.so.0(+0xf500) [0x7ffff753f500]
        /opt/mono-2.10/bin/mono-sgen(mono_domain_is_unloading+0) [0x4f0d90]
        /opt/mono-2.10/bin/mono-sgen() [0x5aebb9]
        /opt/mono-2.10/bin/mono-sgen() [0x5af77d]
        /opt/mono-2.10/bin/mono-sgen() [0x5b7bd1]
        /opt/mono-2.10/bin/mono-sgen() [0x5e2029]
        /opt/mono-2.10/bin/mono-sgen() [0x58cdbd]
        /lib64/libpthread.so.0(+0x7851) [0x7ffff7537851]
        /lib64/libc.so.6(clone+0x6d) [0x7ffff728511d]
Detaching after fork from child process 2821.

Debug info from gdb:

ptrace: Operation not permitted.

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================


Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc69f4700 (LWP 13584)]
0x00007ffff71cf8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)
(gdb)
(gdb) bt
#0  0x00007ffff71cf8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff71d1085 in abort () at abort.c:92
#2  0x00000000004939d0 in mono_handle_native_sigsegv (signal=<value optimized out>, ctx=<value optimized out>) at mini-exceptions.c:2290
#3  0x00000000004e7b0f in mono_arch_handle_altstack_exception (sigctx=0x7fffc75ffc40, fault_addr=<value optimized out>, stack_ovf=0) at exceptions-amd64.c:953
#4  0x000000000041be07 in mono_sigsegv_signal_handler (_dummy=11, info=0x7fffc75ffd70, context=0x7fffc75ffc40) at mini.c:5931
#5  <signal handler called>
#6  mono_domain_is_unloading (domain=0x6e00007ffc78e3ca) at appdomain.c:2160
#7  0x00000000005aebb9 in threadpool_append_jobs (tp=0x90c0c0, jobs=0x7fff88004310, njobs=219) at threadpool.c:1073
#8  threadpool_append_jobs (tp=0x90c0c0, jobs=0x7fff88004310, njobs=219) at threadpool.c:1051
#9  0x00000000005af77d in tp_poll_wait (p=0x90c040) at ../../mono/metadata/tpool-poll.c:284
#10 0x00000000005b7bd1 in start_wrapper_internal (data=0x2235460) at threads.c:784
#11 start_wrapper (data=0x2235460) at threads.c:832
#12 0x00000000005e2029 in thread_start_routine (args=0xa051b8) at wthreads.c:289
#13 0x000000000058cdbd in gc_start_thread (arg=0x22cdea0) at sgen-gc.c:6300
#14 0x00007ffff7537851 in start_thread (arg=0x7fffc69f4700) at pthread_create.c:301
#15 0x00007ffff728511d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) 

This happened after several hours of running the server with 900+ concurrent users. I have MONO_DISABLE_AIO=1 set to avoid using the epoll backend.
Comment 1 Rodrigo Kumpera 2013-08-23 14:26:00 UTC
Please provide a test case.
Comment 2 Roope Kangas 2013-08-26 08:32:37 UTC
Hi!

I think this bug might also have gone away since bug https://bugzilla.xamarin.com/show_bug.cgi?id=10127 was fixed.

At least I have not seen this in a long time. Marking this fixed too, for now ;)
Comment 3 Roope Kangas 2013-10-30 07:22:05 UTC
Hi!

We do not have a test case, since we do not know what actually causes this, but we keep getting crashes. The most useful trace so far has been:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed659700 (LWP 11205)]
mono_domain_is_unloading (domain=0xc900007ffe9ae162) at appdomain.c:2160
	in appdomain.c
#0  mono_domain_is_unloading (domain=0xc900007ffe9ae162) at appdomain.c:2160
#1  0x00000000005aebb9 in threadpool_append_jobs (tp=0x90c0c0, jobs=0x7fffc0004310, njobs=1) at threadpool.c:1073
#2  threadpool_append_jobs (tp=0x90c0c0, jobs=0x7fffc0004310, njobs=1) at threadpool.c:1051
#3  0x00000000005af77d in tp_poll_wait (p=0x90c040) at ../../mono/metadata/tpool-poll.c:284
#4  0x00000000005b7bd1 in start_wrapper_internal (data=0x1d23d90) at threads.c:784
#5  start_wrapper (data=0x1d23d90) at threads.c:832
#6  0x00000000005e2029 in thread_start_routine (args=0xa081a8) at wthreads.c:289
#7  0x000000000058cdbd in gc_start_thread (arg=0x1e7a090) at sgen-gc.c:6300
#8  0x00007ffff7537851 in start_thread (arg=0x7fffed659700) at pthread_create.c:301
#9  0x00007ffff728511d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

So the domain can be null here? https://github.com/mono/mono/blob/mono-2-10/mono/metadata/appdomain.c#L2160
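
A note on the question above, based only on the values in the two traces: the domain argument is not null in either crash. The values 0x6e00007ffc78e3ca and 0xc900007ffe9ae162 both have a non-zero top byte over what otherwise looks like a user-space address. Assuming this machine uses the usual x86-64 layout with 48-bit virtual addresses, such a value is a non-canonical address, and dereferencing it faults no matter what the callee checks. A minimal sketch of that test in C follows; is_canonical_x86_64 is a hypothetical helper written for illustration and is not part of Mono.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Mono API.
 * Assumption: x86-64 with 48-bit virtual addresses, where bits 63..47
 * of a valid (canonical) address must be all zeros or all ones.
 */
static bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;            /* keep the upper 17 bits */
    return top == 0 || top == 0x1FFFF;    /* all zeros or all ones */
}

/*
 * Classify a pointer value taken from a crash trace:
 * 0 = null, 1 = non-canonical (garbage), 2 = canonical (plausible).
 */
static int classify_pointer(uint64_t addr)
{
    if (addr == 0)
        return 0;
    return is_canonical_x86_64(addr) ? 2 : 1;
}
```

Applied to the traces, classify_pointer reports both domain values as non-canonical, while tp=0x90c0c0 from the same frames is canonical. That is consistent with a corrupted or stale job entry rather than a missing null check, though the traces alone do not prove where the bad value came from.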
Comment 4 Rodrigo Kumpera 2013-11-13 16:07:25 UTC
Do you guys use appdomains?

It does look like a race condition in the unload code.
Comment 5 Rodrigo Kumpera 2013-11-13 16:11:58 UTC
Guys, you're using 2.10. Please upgrade to the latest 3.2, as it has multiple years' worth of fixes.
Comment 6 Ludovic Henry 2017-07-07 20:59:01 UTC
Can you still reproduce this bug with the latest version of Mono? Please feel free to reopen if that is still the case. Thank you.