Bug 9938 - Mono 2.10.9 crashes in 'GC_set_fl_marks' method
Summary: Mono 2.10.9 crashes in 'GC_set_fl_marks' method
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Duplicates: 1695
Depends on:
Blocks:
 
Reported: 2013-01-31 07:03 UTC by ilya.cherkasov
Modified: 2014-07-22 10:39 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
core dump for mono process (2.66 MB, application/x-gzip)
2013-01-31 07:03 UTC, ilya.cherkasov



Description ilya.cherkasov 2013-01-31 07:03:26 UTC
Created attachment 3315
core dump for mono process

Good day! We run a production system, and everything works fine until mono crashes with what appears to be a fault in its GC.

We built mono from the sources located here: http://download.mono-project.com/sources/mono/mono-2.10.9.tar.bz2

The command used to build mono was:
$ ../mono-2.10.9/configure --prefix=/opt/mono-2.10.9/ --with-sgen=yes --with-moonlight=no --with-ikvm-native=no --with-libgdiplus=no --disable-llvm --disable-nacl

See the backtrace from gdb below:

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x0074fdf0 in raise () from core5868-libs/lib/libc.so.6
#2  0x00751701 in abort () from core5868-libs/lib/libc.so.6
#3  0x080e209b in mono_handle_native_sigsegv (signal=11, ctx=0xf7d2fd0c) at ../../../mono-2.10.9/mono/mini/mini-exceptions.c:2223
#4  0x08125680 in mono_arch_handle_altstack_exception (sigctx=0xf7d2fd0c, fault_addr=0x44, stack_ovf=0) at ../../../mono-2.10.9/mono/mini/exceptions-x86.c:1223
#5  0x08059dd4 in mono_sigsegv_signal_handler (_dummy=11, info=0xf7d2fc8c, context=0xf7d2fd0c) at ../../../mono-2.10.9/mono/mini/mini.c:5909
#6  <signal handler called>
#7  0x08221586 in GC_set_fl_marks (q=0x4e8db0 "\240\215N") at ../../mono-2.10.9/libgc/alloc.c:615
#8  0x0822c8d4 in GC_mark_thread_local_free_lists () at ../../mono-2.10.9/libgc/pthread_support.c:669
#9  0x08229423 in GC_push_roots (all=1, cold_gc_frame=0xffe175fc "\204e\"\b\266c\"\b") at ../../mono-2.10.9/libgc/mark_rts.c:622
#10 0x082267c1 in GC_mark_some (cold_gc_frame=0xffe175fc "\204e\"\b\266c\"\b") at ../../mono-2.10.9/libgc/mark.c:326
#11 0x08221367 in GC_stopped_mark (stop_func=0x8220984 <GC_never_stop_func>) at ../../mono-2.10.9/libgc/alloc.c:543
#12 0x08220f78 in GC_try_to_collect_inner (stop_func=0x8220984 <GC_never_stop_func>) at ../../mono-2.10.9/libgc/alloc.c:382
#13 0x08221ead in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at ../../mono-2.10.9/libgc/alloc.c:1045
#14 0x08222113 in GC_allocobj (sz=4, kind=4) at ../../mono-2.10.9/libgc/alloc.c:1125
#15 0x08224ea3 in GC_generic_malloc_inner (lb=16, k=4) at ../../mono-2.10.9/libgc/malloc.c:136
#16 0x08225e87 in GC_generic_malloc_many (lb=16, k=4, result=0x82f9270) at ../../mono-2.10.9/libgc/mallocx.c:513
#17 0x0822c566 in GC_local_gcj_malloc (bytes=16, ptr_to_struct_containing_descr=0x8b5aeb4) at ../../mono-2.10.9/libgc/pthread_support.c:450
#18 0x081a5daa in mono_object_new_fast (vtable=0x8b5aeb4) at ../../../mono-2.10.9/mono/metadata/object.c:4345
#19 0xee450330 in ?? ()
#20 0x08b5aeb4 in ?? ()
#21 0x003635cc in ?? ()
#22 0x00000007 in ?? ()
#23 0x00000165 in ?? ()
#24 0x004f1270 in ?? ()
#25 0x08a099e8 in ?? ()
#26 0xffe177d8 in ?? ()
#27 0xee4552a4 in ?? ()
#28 0x004f1258 in ?? ()
#29 0x004f1408 in ?? ()
#30 0x004f1258 in ?? ()
#31 0x00000167 in ?? ()
#32 0xffe177c8 in ?? ()
#33 0xee450304 in ?? ()

Unfortunately, this is very hard to reproduce in a lab environment, but the bug threatens the stability of our heavily loaded production system, so we would appreciate any help or a fix.

Thank you in advance.
Comment 1 ilya.cherkasov 2013-01-31 07:05:39 UTC
We can provide the files for solib-absolute-prefix, or any further details about what is going on in the mono code, if you need them.
Comment 2 Sergey Pachkov 2013-01-31 08:28:44 UTC
*** Bug 1695 has been marked as a duplicate of this bug. ***
Comment 3 Sergey Pachkov 2013-02-01 03:43:40 UTC
SEGFAULT

Frame 7 in the backtrace for thread #1 (LWP 5868) points to the instruction `mov (%eax),%eax`, where $eax contains `68`.

This produces an address access violation.

In the C code:
libgc/alloc.c:615 
 set_mark_bit_from_hdr(hhdr, word_no);


hhdr = NULL
word_no = 354
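
To double-check that a NULL hhdr explains the fault address, here is a minimal sketch of the arithmetic. It assumes a 32-bit build (4-byte words, 32 mark bits per word) and that set_mark_bit_from_hdr indexes a mark-word array inside the header by word_no / 32. Both function names are made up for illustration and are not libgc APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit x86 build, so one mark word is 4 bytes and holds 32 mark bits. */
enum { WORD_BYTES = 4, WORD_BITS = 32 };

/* Hypothetical helper: byte offset, inside the mark-bit array,
   of the word that holds the bit for word_no. */
static size_t mark_word_byte_offset(size_t word_no) {
    return (word_no / WORD_BITS) * WORD_BYTES;
}

/* Hypothetical helper: with a NULL header pointer, the faulting address equals
   (offset of the mark array inside the header) + mark_word_byte_offset(word_no),
   so the array's offset can be recovered from the fault address. */
static size_t implied_mark_array_offset(uintptr_t fault_addr, size_t word_no) {
    return (size_t)fault_addr - mark_word_byte_offset(word_no);
}
```

For the values above, mark_word_byte_offset(354) is 44 and implied_mark_array_offset(0x44, 354) is 24. So the read at address 68 (0x44, the fault_addr shown in frame 4) is consistent with hhdr being NULL, provided the mark array starts 24 bytes into the header in this build. That offset is inferred from the numbers, not checked against the headers.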

----------------------------

The problem is very hard to reproduce.

The application running under mono is a server that handles requests.

Sometimes the issue can be reproduced very quickly, but sometimes 1000000+ requests are not enough.
Comment 4 Sergey Pachkov 2013-02-01 03:48:54 UTC
The libgc bundled with mono is version 6.6.

Should we try libgc 7.0?
Comment 5 Sergey Pachkov 2013-02-01 06:04:46 UTC
libgc has a corrupted GC_threads array:
p GC_threads
 {0x0 <repeats 16 times>, 0x6fa990, 0x0 <repeats 79 times>, 0x82f9040, 0x0 <repeats 31 times>}

The struct GC_thread at GC_threads[16] (0x6fa990) has corrupted values in its fields:

 p GC_threads[16][0]
 {next = 0x6facc0, id = 3865697168, stop_info = {signal = 0, last_stop_count = 268134, stack_ptr =
    0xe669c840 ""}, flags = 2, thread_blocked = 0, stack_end =
    0xe669e000 <Address 0xe669e000 out of bounds>, status = 0x0, ptrfree_freelists = {0x82f6b34 "4k/\b\030",
    0x1 <Address 0x1 out of bounds>, 0xc1 <Address 0xc1 out of bounds>, 0xe4db0948 "0\t\333\344:",


p GC_threads[96]->id contains the id of the crashed thread
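
One sanity check that can be run on the dump above is whether the entry in slot 16 at least sits in the bucket its id maps to. This is a minimal sketch, assuming the table has 128 buckets (gdb printed 16 + 1 + 79 + 1 + 31 = 128 entries) and that the bucket is simply the thread id modulo the table size. The modulo rule and the function name are assumptions and were not checked against the libgc 6.6 sources.

```c
#include <stdint.h>

/* Assumption: 128 buckets, matching the 128 entries gdb printed for GC_threads. */
#define THREAD_TABLE_SZ 128u

/* Hypothetical helper: bucket index for a thread id,
   assuming the table is indexed by id modulo its size. */
static unsigned thread_bucket(uint32_t thread_id) {
    return thread_id % THREAD_TABLE_SZ;
}
```

With id = 3865697168 from GC_threads[16][0], thread_bucket returns 16, so under these assumptions the id field matches the slot the entry was found in. It does not say anything about the other fields.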
Comment 6 Sergey Pachkov 2013-02-01 08:11:27 UTC
Parallel mark threads were enabled by default.
Comment 7 Rodrigo Kumpera 2013-05-23 12:15:10 UTC
Please attach a test case. Don't use libgc 7.0, as we have custom extensions to 6.6 that don't exist in 7.0.
Comment 8 Sergey Pachkov 2013-05-24 06:33:53 UTC
General description of the software in use:
1) mono runs a server with multiple domains
2) each domain contains a thread pool where requests are processed.

We have no specific test case; the issue most likely depends on high load.

We are working to reproduce the issue again.
Comment 9 ilya.cherkasov 2014-07-22 10:39:16 UTC
Presumably fixed in the latest version (3.6.1).