Bug 9153 - glibc detected mono: corrupted double-linked list: 0x00007f58a9119fe0
Summary: glibc detected mono: corrupted double-linked list: 0x00007f58a9119fe0
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2012-12-28 07:42 UTC by Keiichi Iguchi
Modified: 2017-07-14 23:43 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Apache error log (52.38 KB, text/plain)
2012-12-28 07:42 UTC, Keiichi Iguchi
Potentially having a related problem (365.74 KB, application/octet-stream)
2013-10-23 14:29 UTC, Alfred Hall



Description Keiichi Iguchi 2012-12-28 07:42:41 UTC
Created attachment 3139
Apache error log

Description of Problem:
Mono (mod_mono on Apache) crashed with the following error dump.

Steps to reproduce the problem:
1. Run xsp with mod_mono for a long time.


Actual Results:
The process aborts with the dump in the attached Apache error log.

Expected Results:
xsp keeps running without crashing.

How often does this happen? 
3 times per month. (Xsp processed 1 million requests in the meantime.)

Additional Information:

I attached the dumped log.
Comment 1 Zoltan Varga 2013-01-04 20:41:53 UTC
-> mod_mono for now.
Comment 2 Alfred Hall 2013-10-23 14:29:16 UTC
Created attachment 5217
Potentially having a related problem

No idea whether this is related to what I'm seeing, but the stack trace bears many similarities.

I'm running mono 3.2.3 on Debian wheezy amd64. Startup flags:

MONO_GC_PARAMS="major=marksweep-fixed,major-heap-size=1g" MONO_THREADS_PER_CPU=100 /opt/ahall-mono/bin/mono --server Nitogram.Api.exe runserver
AppHost Created at 20/10/2013 13:35:01, listening on http://localhost:8184/.

This is basically running the NancyFX web framework in self-hosted mode using HttpListener. Increasing the threads per CPU proved necessary; otherwise it would lock up very quickly under heavy testing. Performance is good with this setup, but it seems to slowly start leaking a bit of memory. NOTE that this occurs when running a heavy test using the DataStax Cassandra LINQ driver to create users at around 500 requests a second. It took around an hour of hammering from multiple boxes at a fast rate to get it into this state.

This may not be related, but thought it would be worth looking at this stack trace anyway. Happy to carry out further tests if it helps.
Comment 3 Alfred Hall 2013-10-25 06:19:02 UTC
I seem to be unable to reproduce it with Boehm. Any pointers on next steps to help get further info on what exactly is going on?
Comment 4 Zoltan Varga 2013-10-25 09:47:10 UTC
The assertion signals memory corruption, which can be caused by anything. Try running with a mono runtime compiled with debug info to get more meaningful stack traces.
Comment 5 Alfred Hall 2013-10-25 12:53:39 UTC
Increased the nursery to 32 megabytes, hammered it for literally 2 hours, and got:

(gdb) bt
#0  0x00007faafe513475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007faafe5166f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000004ad74c in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0x7faab774e880) at mini-exceptions.c:2380
#3  0x000000000042283f in mono_sigsegv_signal_handler (_dummy=11, info=0x7faab774e9b0, context=0x7faab774e880) at mini.c:6557
#4  <signal handler called>
#5  major_copy_or_mark_object (ptr=ptr@entry=0x7faabb4f4698, obj=0x4000, queue=queue@entry=0x7faaff32f130) at sgen-marksweep.c:1251
#6  0x00000000005de921 in major_scan_object (start=<optimized out>, queue=0x7faaff32f130) at sgen-scan-object.h:78
#7  0x00000000005cbc34 in sgen_drain_gray_stack (max_objs=max_objs@entry=32, ctx=...) at sgen-gc.c:1203
#8  0x00000000005e4aef in workers_thread_func (data_untyped=0x7faaff32f120) at sgen-workers.c:398
#9  0x00007faafe871b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007faafe5bba7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()


Switched over to the parallel fixed mark-and-sweep collector.

The smaller the nursery is, the quicker it crashes; perhaps it starts freeing something that is still in use?
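
The exact flags used for this run are not given in the thread. As a rough sketch only, a 32-megabyte nursery could be requested by appending SGen's `nursery-size` option to the GC parameters quoted in comment 2. The `NURSERY_SIZE` variable is a hypothetical helper, and the collector name here is the one from comment 2, not necessarily the one used for this backtrace:

```shell
# Assumption: GC flags from comment 2 plus SGen's nursery-size option.
# NURSERY_SIZE is an illustrative variable name, not something from the report.
NURSERY_SIZE=32m   # must be a power of two; k, m and g suffixes are accepted
MONO_GC_PARAMS="major=marksweep-fixed,major-heap-size=1g,nursery-size=${NURSERY_SIZE}"
export MONO_GC_PARAMS
echo "MONO_GC_PARAMS=${MONO_GC_PARAMS}"
# Then launch as in comment 2, for example:
# MONO_THREADS_PER_CPU=100 /opt/ahall-mono/bin/mono --server Nitogram.Api.exe runserver
```

Varying `NURSERY_SIZE` between runs is one way to check whether time-to-crash really tracks nursery size.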
Comment 6 Zoltan Varga 2013-10-25 13:21:11 UTC
Try running without any MONO_GC_PARAMS flags, to use the default gc configuration.
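
A minimal sketch of one way to do this from a POSIX shell, assuming the variable may have been exported earlier in the same session. The launch line is left as a comment and its path and program name are placeholders:

```shell
# Remove any inherited GC tuning so the runtime falls back to its default GC configuration.
unset MONO_GC_PARAMS
# ${VAR+x} expands to an empty string only when VAR is truly unset.
if [ -z "${MONO_GC_PARAMS+x}" ]; then
    echo "MONO_GC_PARAMS is unset"
fi
# Then start the application as before, for example (placeholder path and name):
# MONO_THREADS_PER_CPU=100 /path/to/mono-sgen YourApp.exe
```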
Comment 7 Alfred Hall 2013-10-25 13:29:10 UTC
Took only 10 seconds under very heavy load this time.

Running with: MONO_THREADS_PER_CPU=100 /opt/ahall-mono/bin/mono-sgen TestNancy.exe.
If I don't increase the threads it will just lock up completely.

(gdb) bt
#0  0x00007f8aaaa26475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f8aaaa296f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000004ad74c in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0x7f8aa122cc40) at mini-exceptions.c:2380
#3  0x0000000000503d8f in mono_arch_handle_altstack_exception (sigctx=sigctx@entry=0x7f8aa122cc40, fault_addr=<optimized out>, stack_ovf=stack_ovf@entry=0) at exceptions-amd64.c:921
#4  0x00000000004227b7 in mono_sigsegv_signal_handler (_dummy=11, info=0x7f8aa122cd70, context=0x7f8aa122cc40) at mini.c:6591
#5  <signal handler called>
#6  sgen_par_object_get_size (o=0x745928, vtable=<error reading variable: Cannot access memory at address 0x2185c0000001c>) at ../../mono/metadata/sgen-gc.h:769
#7  sgen_safe_object_get_size (obj=0x745928) at ../../mono/metadata/sgen-gc.h:801
#8  major_copy_or_mark_object (ptr=ptr@entry=0x7f8aa9d449a0, obj=0x745928, queue=queue@entry=0x96a200) at sgen-marksweep.c:1391
#9  0x00000000005d5c1c in major_scan_object (start=<optimized out>, queue=0x96a200) at sgen-scan-object.h:78
#10 0x00000000005cbc77 in sgen_drain_gray_stack (max_objs=max_objs@entry=-1, ctx=...) at sgen-gc.c:1192
#11 0x00000000005cbe7f in precisely_scan_objects_from (ctx=..., desc=1, start_root=0x7f8aa43e7f30, end_root=<optimized out>, n_start=<optimized out>, n_end=<optimized out>) at sgen-gc.c:1599
#12 scan_from_registered_roots (ctx=..., root_type=<optimized out>, addr_start=<optimized out>, addr_end=<optimized out>) at sgen-gc.c:2036
#13 job_scan_from_registered_roots (worker_data=<optimized out>, job_data_untyped=0x7f8aa085c628) at sgen-gc.c:2325
#14 0x00000000005cd7da in major_copy_or_mark_from_roots (old_next_pin_slot=0x7f8a717fd9ec, finish_up_concurrent_mark=0, scan_mod_union=0) at sgen-gc.c:2980
#15 0x00000000005ce3e4 in major_do_collection (reason=<optimized out>) at sgen-gc.c:3277
#16 major_do_collection (reason=0x6fd121 "LOS overflow") at sgen-gc.c:3260
#17 0x00000000005d1de7 in sgen_perform_collection (requested_size=16416, generation_to_collect=1, reason=0x6fd121 "LOS overflow", wait_to_finish=<optimized out>) at sgen-gc.c:3461
#18 0x00000000005e04e3 in sgen_los_alloc_large_inner (vtable=vtable@entry=vtable(0x2753388), size=size@entry=16416) at sgen-los.c:346
#19 0x00000000005e7a53 in mono_gc_alloc_obj_nolock (vtable=vtable@entry=vtable(0x2753388), size=size@entry=16416) at sgen-alloc.c:204
#20 0x00000000005e7ecb in mono_gc_alloc_vector (vtable=vtable(0x2753388), size=16416, max_length=16384) at sgen-alloc.c:491
#21 0x0000000040a4b2f0 in ?? ()
#22 0x00000000031416d0 in ?? ()
#23 0x00000000416c3be0 in ?? ()
#24 0x00007f8aa9c9a8c0 in ?? ()
#25 0x00007f8aa9c9a970 in ?? ()
#26 0x00007f8aa9c9a898 in ?? ()
#27 0x00007f8a717fdc40 in ?? ()
#28 0x00007f8a717fdb90 in ?? ()
#29 0x0000000000000000 in ?? ()
Comment 8 Zoltan Varga 2013-10-25 13:46:02 UTC
This is a completely different crash than in the original report.
Comment 9 Alfred Hall 2013-10-25 13:46:56 UTC
Should I remove my comments here and open another issue?
Comment 10 Zoltan Varga 2013-10-25 14:50:34 UTC
That would be better. Also, if you can compile mono yourself, you might try compiling from git master, since we have fixed a few sgen problems since 3.2.3.
Comment 11 Alfred Hall 2013-10-25 16:20:02 UTC
Thanks Zoltan, I've reproduced this against master and raised #15716.
Comment 12 Rodrigo Kumpera 2017-07-14 23:43:57 UTC
SGen has received a lot of fixes in the past 4 years. Please try a recent version such as 5.2 and reopen this bug if it is still happening.