Bug 25069 - SGEN causes SIGSEGV after not accounting for offset in pointer to nursery
Status: RESOLVED NOT_REPRODUCIBLE
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
Reported: 2014-12-04 08:48 UTC by evolvedmicrobe
Modified: 2017-07-12 23:06 UTC

Description evolvedmicrobe 2014-12-04 08:48:27 UTC
My program is dying frequently. It seems to run into all kinds of trouble after dereferencing a pointer from the gray queue into the nursery and treating it as something it is not.  Below is the stack trace and GDB info.


#0  0x00007fe5994059bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fe599405854 in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x0000000000505b2a in mono_handle_native_sigsegv (signal=6, ctx=<optimized out>) at mini-exceptions.c:2284
#3  <signal handler called>
#4  0x00007fe59937abb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007fe59937dfc8 in __GI_abort () at abort.c:89
#6  0x0000000000687ef9 in monoeg_log_default_handler (log_domain=<optimized out>, log_level=G_LOG_LEVEL_ERROR, message=<optimized out>, 
    unused_data=<optimized out>) at goutput.c:232
#7  0x00000000006880ff in monoeg_g_logv (log_domain=0x0, log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7fe57cffe038) at goutput.c:113
#8  0x00000000006881a2 in monoeg_g_log (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_ERROR, 
    format=format@entry=0x1190131 "no object of size %d\n") at goutput.c:123
#9  0x00000000006302e6 in ms_find_block_obj_size_index (size=<optimized out>) at sgen-marksweep.c:218
#10 0x0000000000631ae8 in alloc_obj (vtable=0x7fe58af85310, size=<optimized out>, pinned=0, has_references=1) at sgen-marksweep.c:501
#11 0x000000000064cc7f in alloc_for_promotion (has_references=1, objsize=2721382008, obj=0x7fe598ec0998 "\020S\370\212\345\177", vtable=0x7fe58af85310)
    at sgen-simple-nursery.c:35
#12 copy_object_no_checks (obj=obj@entry=0x7fe598ec0998, queue=queue@entry=0x1a5f320 <gray_queue>) at sgen-copy-object.h:83
#13 0x000000000064da71 in simple_nursery_serial_copy_object_from_obj (queue=0x1a5f320 <gray_queue>, obj_slot=0x7fe598ffc710) at sgen-minor-copy-object.h:201
#14 simple_nursery_serial_scan_object (start=<optimized out>, desc=<optimized out>, queue=0x1a5f320 <gray_queue>) at sgen-scan-object.h:74
#15 0x000000000062933a in sgen_drain_gray_stack (max_objs=max_objs@entry=-1, ctx=...) at sgen-gc.c:902
#16 0x000000000062de0c in collect_nursery (unpin_queue=unpin_queue@entry=0x0, finish_up_concurrent_mark=finish_up_concurrent_mark@entry=0) at sgen-gc.c:2325
#17 0x000000000062e7c9 in collect_nursery (finish_up_concurrent_mark=0, unpin_queue=0x0) at sgen-gc.c:3213
#18 sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0x118fb17 "Nursery full", wait_to_finish=0) at sgen-gc.c:3239
#19 0x000000000064582b in mono_gc_alloc_obj_nolock (vtable=0x260dd58, size=88) at sgen-alloc.c:328
#20 0x0000000000645913 in mono_gc_alloc_obj (vtable=0x260dd58, size=88) at sgen-alloc.c:504


I ran this through GDB, and it appears to me that the problem is that copy_object_no_checks is not accounting for the fact that obj appears to be a pointer into a List whose vtable sits at an offset of +3 giant words (3 x 8 = 24 bytes, in GDB's "g" unit) from obj rather than at obj itself.

To elaborate on why I think this, here is the rundown of the call stack.

Stack Frame #13 - Passes in an object
copy = copy_object_no_checks (obj, queue);
(gdb) p obj
> obj = 0x7fe598ec0998 

Stack Frame #12 - Tries to grab the MonoVTable for this object with the code below, but the result appears to be nonsense.
MonoVTable *vt = ((MonoObject*)obj)->vtable;

(gdb) info locals
> vt = 0x7fe58af85310
> objsize = 2721382008

I can tell by examining the variable for the vtable that it is totally bogus.  Additionally, if I call the debugger function sgen-debug.c:describe_ptr(obj), it gives me a totally different value for the vtable:

VTable: 0x7fe570328b98
Class: List`1
Descriptor: a
Descriptor type: 2 (small_bitmap)
Size: 32


Examining the memory, it appears that the debugger function found this value by scanning the nursery and accounting for an offset.

(gdb) x/5g obj-8
> 0x7fe598ec0990: 0x00007fe58d140240      0x00007fe58af85310 <- what copy function thinks is vtable, corresponds to nonsense
> 0x7fe598ec09a0: 0xc00000003fffffff      0xffffffff00000001
> 0x7fe598ec09b0: 0x00007fe570328b98 <- what the debugger function thinks is vtable, corresponds to list class vtable

It seems to me the current behavior can't be right, and the offset of 3 longs (24 bytes) looks plausible for a 64-bit process, but I am not sure which way it should grow or how offsets are determined when copying.
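
The offset arithmetic in this comment can be double-checked with a small sketch. It takes the two addresses from the x/5g dump (obj itself, and the slot holding the vtable that describe_ptr reported) and expresses the distance in bytes and in 8-byte words. The function and constant names are hypothetical and are not part of SGen.

```c
#include <stdint.h>

/* Addresses copied from the x/5g dump in this comment. */
static const uint64_t OBJ_ADDR = UINT64_C(0x7fe598ec0998);         /* obj */
static const uint64_t LIST_VTABLE_SLOT = UINT64_C(0x7fe598ec09b0); /* slot holding the List`1 vtable */

/* Hypothetical helper (not an SGen function): signed distance in bytes
 * from one address to another. */
int64_t byte_offset(uint64_t from, uint64_t to)
{
    return to >= from ? (int64_t)(to - from) : -(int64_t)(from - to);
}

/* The same distance in 8-byte words (GDB's "giant" unit on amd64). */
int64_t word_offset(uint64_t from, uint64_t to)
{
    return byte_offset(from, to) / 8;
}
```

Calling byte_offset(OBJ_ADDR, LIST_VTABLE_SLOT) gives 24 and word_offset gives 3, which matches the "3 long" distance between what the copy function reads as the vtable and what describe_ptr reports.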

If it's relevant, the object on the other side of this in the nursery is an Iterator described below.

VTable: 0x7fe5703dd420
Class: <CreateSelectIterator>c__Iterator10`2
Descriptor: 7a
Descriptor type: 2 (small_bitmap)
Size: 64

I am trying to make a test case of this, but the code is called from a large library that does a lot of pinning to pass arrays to native code.  This SIGSEGV happens when it is running in parallel and very frequently doing GCs.
Comment 1 evolvedmicrobe 2014-12-04 08:49:51 UTC
Forgot to mention: this was a git build from today's master branch.
Comment 2 Mark Probst 2014-12-04 14:18:45 UTC
Could you check whether the bug is also present in the 3.12 branch?  We just pushed a lot of changes to the GC after 3.12 was branched.
Comment 3 evolvedmicrobe 2014-12-04 17:28:51 UTC
I just ran it with the 3.12 branch and got a SIGSEGV with the same type of error one time, but a repeat trial gave a SIGSEGV with a slightly different error, suggesting the corruption might be elsewhere (but I don't really know).

The version I used is described as follows 


> Mono JIT compiler version 3.12.0 (mono-3.12.0-branch/3016bd7 Thu Dec  4 12:26:05 PST 2014)
> Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
>        TLS:           __thread
>        SIGSEGV:       altstack
>        Notifications: epoll
>        Architecture:  amd64
>        Disabled:      none
>        Misc:          softdebug 
>        LLVM:          supported, not enabled.
>        GC:            sgen


Running it the first time, I got a similar stack trace as in the initial report, but slightly different vtable values from the debugger functions versus the code execution path.

From debugger function:

	VTable: 0x7fe91c06dea0
	Class: Func`2
	Descriptor: 7c4006a
	Descriptor type: 2 (small_bitmap)
	Size: 104

From stack:

	vt = 0x1

The stack trace was similar, throwing an error due to the bad vt variable.

However, the second time the stack trace was quite different; it is shown below.  The crash seems to happen in some trampoline code, and I had a very difficult time working through it in the debugger.  It seems that the SIGSEGV fires either with a stack trace like the one above, or in one of these mini methods.  As before, lots of signals and GCs seem to be flying around the runtime when this happens.

I still haven't reduced this to a small test case, but I can easily reproduce it with the larger program if there are other things that would be useful to know.

Stack trace for the different error:

#0  0x00007f51b27ff9bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f51b27ff854 in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00000000004b381a in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0x7f51aa841c40) at mini-exceptions.c:2269
#3  0x000000000050aa73 in mono_arch_handle_altstack_exception (sigctx=sigctx@entry=0x7f51aa841c40, fault_addr=<optimized out>, stack_ovf=stack_ovf@entry=0) at exceptions-amd64.c:861
#4  0x000000000042ac12 in mono_sigsegv_signal_handler (_dummy=11, _info=0x7f51aa841d70, context=0x7f51aa841c40) at mini.c:6858
#5  <signal handler called>
#6  __GI___pthread_mutex_lock (mutex=0x92) at ../nptl/pthread_mutex_lock.c:66
#7  0x00000000004bd77d in mono_class_fill_runtime_generic_context (class_vtable=0x973480 <vtable>, caller=0x40e35380 "L\213\320H\213\205x\377\377\377H\213\370H\213", slot=0)
    at mini-generic-sharing.c:1922
#8  0x00000000414eeae6 in ?? ()
#9  0x00007f51b20e10b8 in ?? ()
#10 0x00007f51b2108030 in ?? ()
#11 0x00007f51b2107ff0 in ?? ()
#12 0x00007f51b2108030 in ?? ()
#13 0x00007f51b2107fa8 in ?? ()
#14 0x00007f51b20e10b8 in ?? ()
#15 0x00007f51b2107ff0 in ?? ()
#16 0x0000000040e34c3c in ?? ()
#17 0x00007f51b20e10b8 in ?? ()
#18 0x00007f518806c1f8 in ?? ()
#19 0x00007f51b2108030 in ?? ()
#20 0x0000000040e50c54 in ?? ()
#21 0x00007f51b20b84c8 in ?? ()
#22 0x00007f51a78bc780 in ?? ()
#23 0x00007f51a78bce00 in ?? ()
#24 0x00007f51b20b5a30 in ?? ()
#25 0x00007f51b2344820 in ?? ()
#26 0x00007f51a50b5960 in ?? ()
#27 0x00007f51b22d21d8 in ?? ()
#28 0x00007f51b20cf020 in ?? ()
#29 0x00007f51b22e81c8 in ?? ()
#30 0x00007f51b21028f8 in ?? ()
#31 0x00007f51b20b8401 in ?? ()
#32 0x0000000040e4fc01 in ?? ()
#33 0x00007f51b2107c50 in ?? ()
#34 0x00007f51b2107c50 in ?? ()
#35 0x000000000000002d in ?? ()
#36 0x00007f51b2107c70 in ?? ()
#37 0x00007f51b2107dd0 in ?? ()
#38 0x00007f51a78bc708 in ?? ()
#39 0x00007f51a78bc718 in ?? ()
#40 0x0000000000000008 in ?? ()
#41 0x00007f51a42a82e0 in ?? ()
#42 0x00007f51b2108030 in ?? ()
#43 0x00007f51b22e81c8 in ?? ()
#44 0x00007f51b21592e0 in ?? ()
#45 0x00007f51b2016020 in ?? ()
#46 0x00007f51b22e7d10 in ?? ()
#47 0x00007f51b20b84c8 in ?? ()
#48 0x0000000040e3832c in ?? ()
#49 0x00007f5190002650 in ?? ()
#50 0x0000000000000000 in ?? ()
Comment 4 evolvedmicrobe 2014-12-04 18:08:21 UTC
I think this is unrelated but just some more information.  As this code works on the Mac but not on the Ubuntu server, I tried running it with the GC_PARAMS set to both conservative and precise.

The error is thrown with conservative, but with precise it dies quickly with the following message:


	* Assertion at mini-gc.c:2036, condition `cfg->frame_reg == cfg->cfa_reg' not met

	Received SIGSEGV, suspending...

And attaching GDB shows the following:

	(gdb) info threads
	  Id   Target Id         Frame 
	* 1    Thread 0x7f02f0f3c7c0 (LWP 39800) "mono" 0x00007f02f00fa9a0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
	(gdb) bt
	#0  0x00007f02f00fa9a0 in __nanosleep_nocancel ()
	    at ../sysdeps/unix/syscall-template.S:81
	#1  0x00007f02f00fa854 in __sleep (seconds=0, seconds@entry=1)
	    at ../sysdeps/unix/sysv/linux/sleep.c:137
	#2  0x00000000004b381a in mono_handle_native_sigsegv (signal=6, 
	    ctx=<optimized out>) at mini-exceptions.c:2269
	#3  <signal handler called>
	#4  0x00007f02f006fbb9 in __GI_raise (sig=sig@entry=6)
	    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
	#5  0x00007f02f0072fc8 in __GI_abort () at abort.c:89
	#6  0x0000000000627799 in monoeg_log_default_handler (
	    log_domain=<optimized out>, log_level=G_LOG_LEVEL_ERROR, 
	    message=<optimized out>, unused_data=<optimized out>) at goutput.c:232
	#7  0x000000000062799f in monoeg_g_logv (log_domain=log_domain@entry=0x0, 
	    log_level=log_level@entry=G_LOG_LEVEL_ERROR, 
	    format=format@entry=0x630560 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x7fff38e99b98) at goutput.c:113
	#8  0x0000000000627ae6 in monoeg_assertion_message (
	    format=format@entry=0x630560 "* Assertion at %s:%d, condition `%s' not met\n") at goutput.c:133
	#9  0x00000000004cdea8 in compute_frame_size (cfg=0xf4f3f0) at mini-gc.c:2036
	#10 mini_gc_create_gc_map (cfg=cfg@entry=0xf4f3f0) at mini-gc.c:2443
	#11 0x0000000000429117 in mini_method_compile (method=0xf4cc10, 
	    opts=opts@entry=370239999, domain=domain@entry=0xef9310, 
	    flags=flags@entry=JIT_FLAG_RUN_CCTORS, parts=parts@entry=0) at mini.c:5730
	#12 0x000000000042c4fd in mono_jit_compile_method_inner (
	    jit_ex=0x7fff38e99f68, opt=370239999, target_domain=0xef9310, 
	    method=0xf4cc10) at mini.c:6024
	#13 mono_jit_compile_method_with_opt (method=method@entry=0xf4cc10, 
	    opt=370239999, ex=ex@entry=0x7fff38e99f68) at mini.c:6296
	#14 0x000000000042ce8b in mono_jit_compile_method (method=0xf4cc10)
	    at mini.c:6333
	#15 0x000000000042d49c in mono_jit_runtime_invoke (method=0xf4c9d0, 
	    obj=0x7f02ef800818, params=0x7fff38e9a170, exc=0x0) at mini.c:6662
	#16 0x00000000005aa3dd in mono_runtime_invoke (method=method@entry=0xf4c9d0, 
	    obj=obj@entry=0x7f02ef800818, params=params@entry=0x7fff38e9a170, 
	    exc=exc@entry=0x0) at object.c:2842
	#17 0x0000000000531241 in create_exception_two_strings (klass=0xf4c428, 
	    a1=0x7f02ef8007e0, a2=a2@entry=0x0) at exception.c:141
	#18 0x0000000000531485 in mono_exception_from_name_two_strings (
	    image=<optimized out>, name_space=name_space@entry=0x63186e "System", 
	    name=name@entry=0x643f27 "OutOfMemoryException", a1=<optimized out>, 
	    a2=a2@entry=0x0) at exception.c:164
	#19 0x000000000059c311 in create_domain_objects (domain=domain@entry=0xef9310)
	    at appdomain.c:180
	#20 0x000000000059d3d1 in mono_runtime_init (domain=domain@entry=0xef9310, 
	    start_cb=start_cb@entry=0x4225c0 <mono_thread_start_cb>, 
	    attach_cb=attach_cb@entry=0x422560 <mono_thread_attach_cb>)
	    at appdomain.c:264
	#21 0x000000000042bafc in mini_init (
	    filename=0x7fff38e9b6c1 "PacBio.ConsensusTools.exe", 
	    runtime_version=runtime_version@entry=0x0) at mini.c:7526
	#22 0x0000000000488422 in mono_main (argc=3, argv=<optimized out>)
	    at driver.c:1914
	#23 0x00007f02f005aec5 in __libc_start_main (main=0x420de0 <main>, argc=3, 
	    argv=0x7fff38e9a4a8, init=<optimized out>, fini=<optimized out>, 
	    rtld_fini=<optimized out>, stack_end=0x7fff38e9a498) at libc-start.c:287
	#24 0x0000000000421084 in _start ()
Comment 5 Mark Probst 2014-12-05 16:39:47 UTC
Could you please try the latest master (at least 6c60288739dd5a6606569dc2866226ac91ff594a) with `MONO_GC_DEBUG=nursery-canaries`?  That option adds an 8 byte canary after each object in the nursery, to detect overwrites (which usually happen with unsafe array code).

Precise stack scanning isn't currently working.
Comment 6 evolvedmicrobe 2014-12-05 19:21:55 UTC
With that flag and the latest master (master/b5ed5a3) it fails repeatedly on startup with: 

* Assertion at aot-runtime.c:854, condition `ref->method' not met

Received SIGSEGV, suspending...


> (gdb) info threads
>  Id   Target Id         Frame 
>  2    Thread 0x7fca6b854700 (LWP 32425) "Finalizer" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
> * 1    Thread 0x7fca6f2fb7c0 (LWP 32424) "mono" 0x00007fca6db619bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81


(gdb) bt
#0  0x00007fca6db619bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fca6db61854 in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x0000000000505b2a in mono_handle_native_sigsegv (signal=6, ctx=<optimized out>) at mini-exceptions.c:2284
#3  <signal handler called>
#4  0x00007fca6dad6bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007fca6dad9fc8 in __GI_abort () at abort.c:89
#6  0x0000000000687f19 in monoeg_log_default_handler (log_domain=<optimized out>, log_level=G_LOG_LEVEL_ERROR, message=<optimized out>, unused_data=<optimized out>) at goutput.c:232
#7  0x000000000068811f in monoeg_g_logv (log_domain=log_domain@entry=0x0, log_level=log_level@entry=G_LOG_LEVEL_ERROR, format=format@entry=0x10bfea0 "* Assertion at %s:%d, condition `%s' not met\n", args=args@entry=0x7fff86d71a98)
    at goutput.c:113
#8  0x0000000000688266 in monoeg_assertion_message (format=format@entry=0x10bfea0 "* Assertion at %s:%d, condition `%s' not met\n") at goutput.c:133
#9  0x00000000004f1cfb in decode_method_ref_with_target (module=module@entry=0x2331830, ref=ref@entry=0x7fff86d71c00, target=target@entry=0x0, buf=<optimized out>, endbuf=endbuf@entry=0x7fff86d71bf8) at aot-runtime.c:854
#10 0x00000000004f5094 in decode_method_ref (endbuf=0x7fff86d71bf8, buf=<optimized out>, ref=0x7fff86d71c00, module=0x2331830) at aot-runtime.c:1241
#11 decode_patch (aot_module=aot_module@entry=0x2331830, mp=mp@entry=0x23a2040, ji=ji@entry=0x7fff86d71c70, buf=<optimized out>, endbuf=endbuf@entry=0x7fff86d71c68) at aot-runtime.c:3127
#12 0x00000000004fa7c0 in mono_aot_plt_resolve (aot_module=aot_module@entry=0x2331830, plt_info_offset=<optimized out>, code=code@entry=0x7fca6b9c0ac0 <System.IO.Path:GetInvalidPathChars+48> "H\203\304", <incomplete sequence \303>)
    at aot-runtime.c:4102
#13 0x0000000000506205 in mono_aot_plt_trampoline (regs=<optimized out>, code=0x7fca6b9c0ac0 <System.IO.Path:GetInvalidPathChars+48> "H\203\304", <incomplete sequence \303>, aot_module=0x2331830 "P\367\062\002", tramp=<optimized out>)
    at mini-trampolines.c:872
#14 0x00000000402a8fa6 in ?? ()
#15 0x0000000000000000 in ?? ()
Comment 7 Mark Probst 2014-12-05 19:29:32 UTC
Sorry.  Please also use `-O=-aot`.
Comment 8 evolvedmicrobe 2014-12-05 19:40:09 UTC
Trying that now.  As some more information, I seem to have a lot of difficulty recreating the bug (or am perhaps unable to) if I use the flag:

 MONO_GC_DEBUG=verify-before-collections
Comment 9 evolvedmicrobe 2014-12-05 19:49:19 UTC
The -O=aot option doesn't seem to change the behavior.  I am running it as:

mono -O=aot myprogram.exe


The stack trace is the same and I am not sure if the option does anything, as running it with or without the -O=aot option appears to produce the same output if I also add the -v flag (last bit before sigsegv shown below).

Method (wrapper managed-to-native) string:GetLOSLimit () emitted at 0x40b8f5f0 to 0x40b8f66e (code length 126) [PacBio.ConsensusTools.exe]
converting method (wrapper managed-to-native) System.IO.MonoIO:get_VolumeSeparatorChar ()
Method (wrapper managed-to-native) System.IO.MonoIO:get_VolumeSeparatorChar () emitted at 0x40b8f680 to 0x40b8f702 (code length 130) [PacBio.ConsensusTools.exe]
converting method (wrapper managed-to-native) System.IO.MonoIO:get_DirectorySeparatorChar ()
Method (wrapper managed-to-native) System.IO.MonoIO:get_DirectorySeparatorChar () emitted at 0x40b8f710 to 0x40b8f792 (code length 130) [PacBio.ConsensusTools.exe]
converting method (wrapper managed-to-native) System.IO.MonoIO:get_AltDirectorySeparatorChar ()
Method (wrapper managed-to-native) System.IO.MonoIO:get_AltDirectorySeparatorChar () emitted at 0x40b8f7a0 to 0x40b8f822 (code length 130) [PacBio.ConsensusTools.exe]
converting method (wrapper managed-to-native) System.IO.MonoIO:get_PathSeparator ()
Method (wrapper managed-to-native) System.IO.MonoIO:get_PathSeparator () emitted at 0x40b8f830 to 0x40b8f8b2 (code length 130) [PacBio.ConsensusTools.exe]
* Assertion at aot-runtime.c:854, condition `ref->method' not met

Received SIGSEGV, suspending...
Comment 10 evolvedmicrobe 2014-12-05 19:56:39 UTC
Ah, ignore last comment about the -O=aot option not working, just saw the "-" there.
Comment 11 evolvedmicrobe 2014-12-05 21:43:20 UTC
Using the latest commit (master/b5ed5a3) with the canary flag, the memory in the nursery appears to be laid out correctly, but that obj pointer is still wonky and leads to the crash.  I obtained a similar stack trace as before, with the vt pointer = 0.  Here's a breakdown of the things I observed:

Here the object is at:
> obj = 0x7f3031fad5a8

Examining the object:
> (gdb) call describe_ptr(obj) (@0x7f3031fad5a8)
> nursery-ptr (interior-ptr offset -48)
> VTable: 0x7f30083ad0c0
> Class: <CreateSelectIterator>c__Iterator10`2
> Descriptor: 7a
> Descriptor type: 2 (small_bitmap)
> Size: 64 

And its first neighbor (@0x7f3031fad568)
> nursery-ptr
> VTable: 0x7f30082f8ca8
> Class: Func`2
> Descriptor: 3e22
> Descriptor type: 2 (small_bitmap)
> Size: 104

And its second neighbor (@0x7f3031fad620)
> (gdb) call describe_ptr(obj+120)
> nursery-ptr
> VTable: 0x7f30081e7528
> Class: Enumerator
> Descriptor: 2a
> Descriptor type: 2 (small_bitmap)
> Size: 40

Looking at memory (I am assuming the canary means I should see the "koupepia" bytes [107,111,117,112,101,112,105,97] in memory after each object), the canaries seem to be in roughly the right place and the objects appear to have the sizes reported above.  However, obj is sitting in what appears to be the middle of another object.

> (gdb) x/192ub obj-72
> 0x7f3031fad560: 107     111     117     112     101     112     105     97
> 0x7f3031fad568: 168     140     47      8       48      127     0       0  <- BEGIN SIZE 104 Func`2
> 0x7f3031fad570: 0       0       0       0       0       0       0       0
> 0x7f3031fad578: 48      148     215     64      0       0       0       0
> 0x7f3031fad580: 224     148     215     64      0       0       0       0
> 0x7f3031fad588: 64      73      245     49      48      127     0       0
> 0x7f3031fad590: 72      131     49      8       48      127     0       0
> 0x7f3031fad598: 0       0       0       0       0       0       0       0
> 0x7f3031fad5a0: 96      141     47      8       48      127     0       0
> 0x7f3031fad5a8*: 0       0       0       0       0       0       0       0 * this line is the memory location pointed to by obj
> 0x7f3031fad5b0: 0       0       0       0       0       0       0       0
> 0x7f3031fad5b8: 0       0       0       0       0       0       0       0
> 0x7f3031fad5c0: 0       0       0       0       0       0       0       0
> 0x7f3031fad5c8: 0       0       0       0       0       0       0       0 <- END SIZE 104 Func`2
> 0x7f3031fad5d0: 107     111     117     112     101     112     105     97
> *0x7f3031fad5d8*: 192     208     58      8       48      127     0       0 <- BEGIN SIZE 64 <CreateSelectIterator>c__Iterator10`2
> 0x7f3031fad5e0: 0       0       0       0       0       0       0       0
> 0x7f3031fad5e8: 160     66      207     38      48      127     0       0
> 0x7f3031fad5f0: 32      214     250     49      48      127     0       0
> 0x7f3031fad5f8: 160     206     40      39      48      127     0       0
> 0x7f3031fad600: 104     213     250     49      48      127     0       0
> 0x7f3031fad608: 10      0       0       0       59      0       0       0
> 0x7f3031fad610: 1       0       0       0       255     255     255     255 <- END OF SIZE 64 <CreateSelectIterator>c__Iterator10`2
> 0x7f3031fad618: 107     111     117     112     101     112     105     97
> 0x7f3031fad620: <- BEGIN size 40 enumerator


Note that obj points inside the Func`2 (64 bytes past its start), but describe_ptr assigns it to the CreateSelectIterator (interior-ptr offset -48).
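
To make the mismatch concrete, here is a minimal sketch that looks up which object's address range contains a given pointer. The start addresses and sizes are the ones from the describe_ptr output and the byte dump in this comment; the ObjExtent type and find_containing_object function are hypothetical and are not how SGen or describe_ptr actually locate objects.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical record of one nursery object: start address and size in bytes. */
typedef struct {
    uint64_t start;
    uint64_t size;
    const char *class_name;
} ObjExtent;

/* Extents taken from the describe_ptr output and the byte dump in this comment. */
static const ObjExtent NURSERY_SAMPLE[] = {
    { UINT64_C(0x7f3031fad568), 104, "Func`2" },
    { UINT64_C(0x7f3031fad5d8), 64,  "<CreateSelectIterator>c__Iterator10`2" },
    { UINT64_C(0x7f3031fad620), 40,  "Enumerator" },
};

/* Returns the index of the object whose [start, start + size) range
 * contains addr, or -1 if addr falls in a gap (for example on a canary). */
int find_containing_object(const ObjExtent *objs, size_t count, uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= objs[i].start && addr - objs[i].start < objs[i].size)
            return (int)i;
    }
    return -1;
}
```

With these extents, find_containing_object(NURSERY_SAMPLE, 3, 0x7f3031fad5a8) returns 0, the Func`2, because 0x7f3031fad5a8 is 64 bytes past 0x7f3031fad568 and the Func`2 is 104 bytes long. The same address is 48 bytes before the start of the iterator at 0x7f3031fad5d8, which is where the "offset -48" in the describe_ptr output comes from.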

Also shown below is the address space as longs, with the vtable pointers surrounded by **


> (gdb) x/20ag 0x7f3031fad568
> 0x7f3031fad568: *0x7f30082f8ca8*  0x0
> 0x7f3031fad578: 0x40d79430      0x40d794e0
> 0x7f3031fad588: 0x7f3031f54940  0x7f3008318348
> 0x7f3031fad598: 0x0     0x7f30082f8d60
> 0x7f3031fad5a8: 0x0     0x0
> 0x7f3031fad5b8: 0x0     0x0
> 0x7f3031fad5c8: 0x0     0x6169706570756f6b
> 0x7f3031fad5d8: *0x7f30083ad0c0*  0x0
> 0x7f3031fad5e8: 0x7f3026cf42a0  0x7f3031fad620
> 0x7f3031fad5f8: 0x7f302728cea0  0x7f3031fad568
> 0x7f3031fad608: 0x3b0000000a    0xffffffff00000001
> 0x7f3031fad618: 0x6169706570756f6b      *0x7f30081e7528*
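
The 0x6169706570756f6b words in this dump are the same eight canary bytes shown earlier, read as one little-endian 64-bit value. A minimal sketch of that relationship, and of a check that the canary after an object is intact, follows. The function names are hypothetical, and the real check inside SGen may be implemented differently.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* The eight canary bytes seen in the byte dump: ASCII "koupepia". */
static const unsigned char CANARY_BYTES[8] = { 107, 111, 117, 112, 101, 112, 105, 97 };

/* Returns 1 if the 8 bytes right after an object of obj_size bytes match
 * the canary, 0 otherwise. obj_start must point at readable memory of at
 * least obj_size + 8 bytes. Hypothetical helper, not SGen's own check. */
int canary_intact(const unsigned char *obj_start, size_t obj_size)
{
    return memcmp(obj_start + obj_size, CANARY_BYTES, sizeof CANARY_BYTES) == 0;
}

/* Packs the canary bytes as a little-endian 64-bit word, which is how
 * GDB's x/ag displays them on amd64. */
uint64_t canary_as_le_word(void)
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; i--)
        w = (w << 8) | CANARY_BYTES[i];
    return w;
}
```

canary_as_le_word() evaluates to 0x6169706570756f6b, so the canaries at 0x7f3031fad5d0 and 0x7f3031fad618 in the dump are unmodified. That is consistent with the observation that the objects themselves were not overwritten and that the problem is the stray obj pointer.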


And the stack trace 

> #0  0x00007f30325f39bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007f30325f3854 in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:137
> #2  0x0000000000505b2a in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0x7f303205fc40) at mini-exceptions.c:2284
> #3  0x00000000005671d3 in mono_arch_handle_altstack_exception (sigctx=sigctx@entry=0x7f303205fc40, fault_addr=<optimized out>, stack_ovf=stack_ovf@entry=0) at exceptions-amd64.c:849
> #4  0x0000000000478172 in mono_sigsegv_signal_handler (_dummy=11, _info=0x7f303205fd70, context=0x7f303205fc40) at mini.c:6718
> #5  <signal handler called>
> #6  copy_object_no_checks (obj=obj@entry=0x7f3031fad5a8, queue=queue@entry=0x1a5f320 <gray_queue>) at sgen-copy-object.h:80
> #7  0x000000000064dd60 in simple_nursery_serial_copy_object_from_obj (queue=0x1a5f320 <gray_queue>, obj_slot=0x7f3031d38130) at sgen-minor-copy-object.h:201
> #8  simple_nursery_serial_scan_object (start=<optimized out>, desc=<optimized out>, queue=0x1a5f320 <gray_queue>) at sgen-scan-object.h:67
> #9  0x000000000062935a in sgen_drain_gray_stack (max_objs=max_objs@entry=-1, ctx=...) at sgen-gc.c:902
> #10 0x000000000062de2c in collect_nursery (unpin_queue=unpin_queue@entry=0x0, finish_up_concurrent_mark=finish_up_concurrent_mark@entry=0) at sgen-gc.c:2330
> #11 0x000000000062e7e9 in collect_nursery (finish_up_concurrent_mark=0, unpin_queue=0x0) at sgen-gc.c:3218
> #12 sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0x118fb37 "Nursery full", wait_to_finish=0) at sgen-gc.c:3244
> #13 0x000000000064584b in mono_gc_alloc_obj_nolock (vtable=0x7f3008205ab8, size=112) at sgen-alloc.c:328
> #14 0x0000000000645933 in mono_gc_alloc_obj (vtable=0x7f3008205ab8, size=104) at sgen-alloc.c:504
> #15 0x0000000040c9cfad in ?? ()
> #16 0x00007f3031fa3c68 in ?? ()
> #17 0x00007f3031f45378 in ?? ()
> #18 0x00007f3031fbbc48 in ?? ()
> #19 0x00000000000000a5 in ?? ()
> #20 0x00007f3031c26158 in ?? ()
> #21 0x0000000041ce3000 in ?? ()
> #22 0x00007f300c000bd0 in ?? ()
> #23 0x00007f3025337020 in ?? ()
> #24 0x00007f30264fe370 in ?? ()
> #25 0x00007f30264fe180 in ?? ()
> #26 0x00007f30264fe370 in ?? ()
> #27 0x0000000040d607f8 in ?? ()
> #28 0x00007f3031f45378 in ?? ()
> #29 0x00007f3031f9c918 in ?? ()
> #30 0x00007f3031f45378 in ?? ()
> #31 0x00007f3031fc64c8 in ?? ()
> #32 0x00007f3031f45378 in ?? ()
> #33 0x00007f3031fc77e0 in ?? ()
> #34 0x00007f3031f45378 in ?? ()
> #35 0x00007f3031fae540 in ?? ()
> #36 0x00007f3031f45378 in ?? ()
> #37 0x00007f3026eb1020 in ?? ()
> #38 0x00007f3031c26940 in ?? ()
> #39 0x00007f3031fa3c68 in ?? ()
> #40 0x0000000000000000 in ?? ()
Comment 12 evolvedmicrobe 2014-12-05 22:50:10 UTC
After additional testing, it seems setting

   MONO_GC_DEBUG=verify-before-collections

means the bug takes longer to appear, but can still happen.
Comment 13 Mark Probst 2014-12-08 14:37:09 UTC
Here are things you can try:

`mono_gc_scan_for_specific_ref()` scans the heap and roots for a given pointer.  That should give you a hint as to where that stray pointer comes from.

You can enable `SGEN_HEAVY_BINARY_PROTOCOL` in `sgen-conf.h` and use `MONO_GC_DEBUG=binary-protocol=/tmp/binprot` to make SGen write a fine-grained protocol of its work to a file.  It'll slow everything down, and the file will grow large fast, but if you can still replicate the bug then this is the best tool for finding it.  Use `sgen-grep-binprot` in `tools/sgen/` to search the protocol for one or more pointers.  Before you search the file you should do

    p binary_protocol_flush_buffers(1)

in GDB.
Comment 14 evolvedmicrobe 2014-12-08 20:27:38 UTC
Thanks Mark,

I may be having some success with mono_gc_scan_for_specific_ref.  It tells me it is at:

> found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (Tuple`2) at offset 16
> Pointer is the start of object 0x7fc5b8cda560 in oldspace.
> major-ptr (block 0x7fc5b8cd8000 sz 32 pin 0 ref 1)
>         (object marked 0)

> VTable: 0x7fc59c022d70
> Class: Tuple`2
> Descriptor: a
> Descriptor type: 2 (small_bitmap)
> Size: 32


But since Tuple`2 is a generic type, I am a bit stuck here.  Do you know how I can print type information for this generic (e.g. Tuple`2 -> Tuple<int,float>) or some such thing?
Comment 15 Mark Probst 2014-12-08 20:31:51 UTC
p mono_type_full_name (&((MonoObject*)ptr)->vtable->class->byval_arg)
Comment 16 evolvedmicrobe 2014-12-08 21:29:17 UTC
Thanks!  

Moving backwards through the pointers, one thing struck me as perhaps unusual: for the original ptr leading to the SIGSEGV, a call to mono_gc_scan_for_specific_ref prints the same possible reference twice, whereas every other time I called it with a different pointer, it printed each possible reference only once.  More exactly, the first call leads to:

> (gdb) p mono_gc_scan_for_specific_ref(0x7fc5c6f018d8,0)
> found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (Tuple`2) at offset 16
> found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (Tuple`2) at offset 16

I am not sure if there is any significance to the double printing, but thought it might mean something, so am mentioning it.
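
For anyone reading along, a "possible ref ... at offset N" line can be pictured with the sketch below, which scans the 8-byte slots of one object for a target pointer value and reports the byte offsets that match. This is a hypothetical illustration only; it is not the implementation of mono_gc_scan_for_specific_ref, and everything in the sample Tuple`2 other than the reference at offset 16 is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Scans the 8-byte slots of one object for target and writes the byte
 * offsets of matching slots into offsets_out (at most max_out entries).
 * Returns the total number of matching slots. Hypothetical helper, not
 * mono_gc_scan_for_specific_ref itself. */
size_t find_ref_offsets(const uint64_t *slots, size_t slot_count,
                        uint64_t target, size_t *offsets_out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < slot_count; i++) {
        if (slots[i] == target) {
            if (found < max_out)
                offsets_out[found] = i * sizeof(uint64_t);
            found++;
        }
    }
    return found;
}

/* A 32-byte stand-in for the Tuple`2 at 0x7fc5b8cda560. Only the vtable
 * value and the reference at offset 16 come from the output in this
 * thread; the other two slots are assumed for illustration. */
static const uint64_t SAMPLE_TUPLE[4] = {
    UINT64_C(0x7fc59c022d70), /* offset 0: vtable */
    0,                        /* offset 8: assumed empty */
    UINT64_C(0x7fc5c6f018d8), /* offset 16: the stray pointer */
    1,                        /* offset 24: assumed bool field */
};
```

Scanning SAMPLE_TUPLE for 0x7fc5c6f018d8 finds exactly one slot, at offset 16. In this simplified model a single object cannot produce the same line twice, so the double printing would have to come from the object being visited twice during the scan; that is a guess, not something the output confirms.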

Otherwise, tracing back further, it looks like it is originally referenced from a tuple that contains an interface type (renamed IT here for simplicity), and this in turn goes up to several PLINQ task/action arrays.

found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (Tuple`2) at offset 16  - Referenced twice here
found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (Tuple`2) at offset 16 "System.Tuple`2<PacBio.IO.IZmwBases, bool>"

Moving Backwards 
> found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (System.Tuple`2<IT, bool>) at offset 16 <- only showing once, line appeared twice
> found possible ref to 0x7fc5b8cda560 in object 0x7fc5bc07d540 (System.Tuple`2<IT, bool>[]) at offset 120
> found possible ref to 0x7fc5bc07d540 in object 0x7fc5bfbaee40 (System.Linq.Parallel.AsynchronousChannel`1<System.Tuple`2<IT, bool>>) at offset 24 
> found possible ref to 0x7fc5bfbaee40 in object 0x7fc5bfb94410 (System.Linq.Parallel.PipelineSpoolingTask`2<System.Tuple`2<IT, bool>, int>) at offset 40
	> found possible ref to 0x7fc5bfb94410 in object 0x7fc5bcddbe80 (Task) at offset 72
	> found possible ref to 0x7fc5bcddbe80 in object 0x7fc5bc044488 (AsyncResult) at offset 16
	> *Nothing printed by function call for  0x7fc5bc044488 *
> found possible ref to 0x7fc5bfbaee40 in object 0x7fc5ba2c7e40 (AsynchronousChannel`1[]) at offset 40
	 > found possible ref to 0x7fc5ba2c7e40 in object 0x7fc5bfb94290 (System.Linq.Parallel.SpoolingTask/<SpoolPipeline>c__AnonStorey1`2<System.Tuple`2<PacBio.IO.IZmwBases, bool>, int>)
	 > found possible ref to 0x7fc5bfb94290 in object 0x7fc5bfbaed50 (Action) at offset 32
	 > found possible ref to 0x7fc5bfbaed50 in object 0x7fc5bfb88200 (ActionInvoke) at offset 16
	 > found possible ref to 0x7fc5bfb88200 in object 0x7fc5bcddbd40 (Task) at offset 64
	 > found possible ref to 0x7fc5bcddbd40 in object 0x7fc5bc060550 (QueryTaskGroupState) at offset 16
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5c6f485d8 (AsynchronousChannelMergeEnumerator`1) at offset 16
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5c6f51860 (PipelineSpoolingTask`2) at offset 16
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94290 (<SpoolPipeline>c__AnonStorey1`2) at offset 24
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb943e0 (PipelineSpoolingTask`2) at offset 16
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94410 (PipelineSpoolingTask`2) at offset 16
	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94440 (PipelineSpoolingTask`2) at offset 16

I didn't follow the chain any further after this point. I will try again to see whether this looks like a pattern, but this is the result so far.
Comment 17 evolvedmicrobe 2014-12-08 21:30:56 UTC
Reprinting that backwards trace in the hopes of better formatting on bugzilla...

> > found possible ref to 0x7fc5c6f018d8 in object 0x7fc5b8cda560 (System.Tuple`2<IT, bool>) at offset 16 <- only showing once, line appeared twice
> > found possible ref to 0x7fc5b8cda560 in object 0x7fc5bc07d540 (System.Tuple`2<IT, bool>[]) at offset 120
> > found possible ref to 0x7fc5bc07d540 in object 0x7fc5bfbaee40 (System.Linq.Parallel.AsynchronousChannel`1<System.Tuple`2<IT, bool>>) at offset 24 
> > found possible ref to 0x7fc5bfbaee40 in object 0x7fc5bfb94410 (System.Linq.Parallel.PipelineSpoolingTask`2<System.Tuple`2<IT, bool>, int>) at offset 40
> 	> found possible ref to 0x7fc5bfb94410 in object 0x7fc5bcddbe80 (Task) at offset 72
> 	> found possible ref to 0x7fc5bcddbe80 in object 0x7fc5bc044488 (AsyncResult) at offset 16
> 	> *Nothing printed by function call for  0x7fc5bc044488 *
> > found possible ref to 0x7fc5bfbaee40 in object 0x7fc5ba2c7e40 (AsynchronousChannel`1[]) at offset 40
> 	 > found possible ref to 0x7fc5ba2c7e40 in object 0x7fc5bfb94290 (System.Linq.Parallel.SpoolingTask/<SpoolPipeline>c__AnonStorey1`2<System.Tuple`2<PacBio.IO.IZmwBases, bool>, int>)
> 	 > found possible ref to 0x7fc5bfb94290 in object 0x7fc5bfbaed50 (Action) at offset 32
> 	 > found possible ref to 0x7fc5bfbaed50 in object 0x7fc5bfb88200 (ActionInvoke) at offset 16
> 	 > found possible ref to 0x7fc5bfb88200 in object 0x7fc5bcddbd40 (Task) at offset 64
> 	 > found possible ref to 0x7fc5bcddbd40 in object 0x7fc5bc060550 (QueryTaskGroupState) at offset 16
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5c6f485d8 (AsynchronousChannelMergeEnumerator`1) at offset 16
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5c6f51860 (PipelineSpoolingTask`2) at offset 16
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94290 (<SpoolPipeline>c__AnonStorey1`2) at offset 24
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb943e0 (PipelineSpoolingTask`2) at offset 16
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94410 (PipelineSpoolingTask`2) at offset 16
> 	 	> found possible ref to 0x7fc5bc060550 in object 0x7fc5bfb94440 (PipelineSpoolingTask`2) at offset 16
Comment 18 Mark Probst 2014-12-09 13:10:36 UTC
I'm not sure going backward through the chain will help you much.  I suggest using the binary protocol to see what it can tell you about the failing pointer and the object containing it.
Comment 19 Alexander Kyte 2015-03-17 12:46:13 UTC
I'm pretty sure that I just started running into this myself:

akyte@Alexanders-MacBook-Air ~/xunit (master*) $ MONO_GC_DEBUG=verify-before-collections mono --debug ./src/xunit.console/bin/Debug/xunit.console.exe test/test.xunit.assert/bin/Debug/test.xunit.assert.dll 
xUnit.net console test runner (64-bit .NET 4.0.30319.17020)
Copyright (C) 2015 Outercurve Foundation.

Discovering: test.xunit.assert (method display = ClassAndMethod, parallel test collections = True, max threads = 4)
Before getdiscoverer
   at System.Environment.get_StackTrace() in /Users/akyte/mono-perf-remoting/mcs/class/corlib/System/Environment.cs:line 321
   at Xunit.Xunit2Discoverer..ctor(ISourceInformationProvider sourceInformationProvider, IAssemblyInfo assemblyInfo, System.String assemblyFileName, System.String xunitExecutionAssemblyPath, System.String configFileName, Boolean shadowCopy, System.String shadowCopyFolder, IMessageSink diagnosticMessageSink) in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/v2/Xunit2Discoverer.cs:line 70
   at Xunit.Xunit2Discoverer..ctor(ISourceInformationProvider sourceInformationProvider, System.String assemblyFileName, System.String configFileName, Boolean shadowCopy, System.String shadowCopyFolder, IMessageSink diagnosticMessageSink) in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/v2/Xunit2Discoverer.cs:line 44
   at Xunit.Xunit2..ctor(ISourceInformationProvider sourceInformationProvider, System.String assemblyFileName, System.String configFileName, Boolean shadowCopy, System.String shadowCopyFolder, IMessageSink diagnosticMessageSink) in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/v2/Xunit2.cs:line 34
   at Xunit.XunitFrontController.CreateInnerController() in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/XunitFrontController.cs:line 108
   at Xunit.XunitFrontController.get_InnerController() in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/XunitFrontController.cs:line 73
   at Xunit.XunitFrontController.Find(Boolean includeSourceInformation, IMessageSink messageSink, ITestFrameworkDiscoveryOptions discoveryOptions) in /Users/akyte/xunit/src/xunit.runner.utility/Frameworks/XunitFrontController.cs:line 137
   at Xunit.ConsoleClient.Program.ExecuteAssembly(System.Object consoleLock, System.String defaultDirectory, Xunit.XunitProjectAssembly assembly, Boolean quiet, Boolean needsXml, Boolean teamCity, Boolean appVeyor, Nullable`1 parallelizeTestCollections, Nullable`1 maxThreadCount, Xunit.XunitFilters filters) in /Users/akyte/xunit/src/xunit.console/Program.cs:line 305
   at Xunit.ConsoleClient.Program.RunProject(System.String defaultDirectory, Xunit.XunitProject project, Boolean quiet, Boolean teamcity, Boolean appVeyor, Nullable`1 parallelizeAssemblies, Nullable`1 parallelizeTestCollections, Nullable`1 maxThreadCount) in /Users/akyte/xunit/src/xunit.console/Program.cs:line 180
   at Xunit.ConsoleClient.Program.Main(System.String[] args) in /Users/akyte/xunit/src/xunit.console/Program.cs:line 57
After getdiscoverer
Returning from Xunit2 constructor
Returning inner controller
Canceling... (Press Ctrl+C again to terminate)
akyte@Alexanders-MacBook-Air ~/xunit (master*) $ mono --debug ./src/xunit.console/bin/Debug/xunit.console.exe test/test.xunit.assert/bin/Debug/test.xunit.assert.dll 
* Assertion at aot-runtime.c:862, condition `ref->method' not met

Stacktrace:

  at <unknown> <0xffffffff>
  at System.__Filters..cctor () [0x00000] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/__filters.cs:29
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <IL 0x0005c, 0xffffffff>
  at <unknown> <0xffffffff>
  at System.RuntimeType.GetBaseType () [0x00000] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/rttype.cs:3821
  at System.RuntimeType.IsSubclassOf (System.Type) [0x00026] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/rttype.cs:3727
  at System.RuntimeType.IsValueTypeImpl () [0x0002c] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/rttype.cs:4057
  at System.Type.get_IsValueType () [0x00000] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/type.cs:1131
  at System.Type.get_IsClass () [0x00000] in /Users/akyte/mono-perf-remoting/external/referencesource/mscorlib/system/type.cs:1113
  at System.Collections.Concurrent.ConcurrentDictionary`2.IsValueWriteAtomic () <0x0002e>
  at System.Collections.Concurrent.ConcurrentDictionary`2..cctor () <0x00018>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <IL 0x0005c, 0xffffffff>
  at <unknown> <0xffffffff>
  at Xunit.ConsoleClient.Program..cctor () [0x00000] in /Users/akyte/xunit/src/xunit.console/Program.cs:15
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <IL 0x0005c, 0xffffffff>

Native stacktrace:

        0   mono                                0x000000010a44a9aa mono_handle_native_sigsegv + 282
        1   libsystem_platform.dylib            0x00007fff8d788f1a _sigtramp + 26
        2   ???                                 0x0000000000000003 0x0 + 3
        3   libsystem_c.dylib                   0x00007fff892a0b73 abort + 129
        4   mono                                0x000000010a5e5fdd monoeg_log_default_handler + 125
        5   mono                                0x000000010a5e61a0 monoeg_assertion_message + 192
        6   mono                                0x000000010a43effb decode_method_ref_with_target + 7147
        7   mono                                0x000000010a438f7a decode_patch + 90
        8   mono                                0x000000010a438df3 mono_aot_plt_resolve + 211
        9   mono                                0x000000010a44c7e5 mono_aot_plt_trampoline + 37
        10  ???                                 0x000000010a818fb2 0x0 + 4471230386
        11  mscorlib.dll.dylib                  0x000000010c003855 System___Filters__cctor + 21
        12  mono                                0x000000010a3ad539 mono_jit_runtime_invoke + 1641
        13  mono                                0x000000010a55ba2e mono_runtime_invoke + 110
        14  mono                                0x000000010a55bf2e mono_runtime_class_init_full + 798
        15  mono                                0x000000010a3a9f63 mono_resolve_patch_target + 1923
        16  mono                                0x000000010a4387f0 load_method + 1728
        17  mono                                0x000000010a437fc1 mono_aot_get_method + 2113
        18  mono                                0x000000010a3aa995 mono_jit_compile_method_with_opt + 629
        19  mono                                0x000000010a3ad04a mono_jit_runtime_invoke + 378
        20  mono                                0x000000010a55ba2e mono_runtime_invoke + 110
        21  mono                                0x000000010a55bf2e mono_runtime_class_init_full + 798
        22  mono                                0x000000010a438b66 load_method + 2614
        23  mono                                0x000000010a44c6f4 mono_aot_trampoline + 52
        24  ???                                 0x000000010a818d52 0x0 + 4471229778
        25  mscorlib.dll.dylib                  0x000000010c0b9cc9 System_RuntimeType_GetBaseType + 41
        26  mono                                0x000000010a3ad539 mono_jit_runtime_invoke + 1641
        27  mono                                0x000000010a55ba2e mono_runtime_invoke + 110
        28  mono                                0x000000010a55bf2e mono_runtime_class_init_full + 798
        29  mono                                0x000000010a3aa9bd mono_jit_compile_method_with_opt + 669
        30  mono                                0x000000010a3aa6ca mono_jit_compile_method + 42
        31  mono                                0x000000010a44c01f common_call_trampoline + 1263
        32  ???                                 0x000000010a818172 0x0 + 4471226738
        33  ???                                 0x000000010cfe1d40 0x0 + 4512947520
        34  mono                                0x000000010a3ad539 mono_jit_runtime_invoke + 1641
        35  mono                                0x000000010a55ba2e mono_runtime_invoke + 110
        36  mono                                0x000000010a55bf2e mono_runtime_class_init_full + 798
        37  mono                                0x000000010a3d7a70 mono_method_to_ir + 145760
        38  mono                                0x000000010a3a5bf6 mini_method_compile + 2998
        39  mono                                0x000000010a3a7cb3 mono_jit_compile_method_inner + 675
        40  mono                                0x000000010a3aa9d1 mono_jit_compile_method_with_opt + 689
        41  mono                                0x000000010a3ad04a mono_jit_runtime_invoke + 378
        42  mono                                0x000000010a55ba2e mono_runtime_invoke + 110
        43  mono                                0x000000010a5612db mono_runtime_exec_main + 379
        44  mono                                0x000000010a41aed0 mono_main + 7808
        45  libdyld.dylib                       0x00007fff8da3e5c9 start + 1

Debug info from gdb:

[New Thread 0x1103 of process 50972]
[New Thread 0x1203 of process 50972]
Mono support loaded.
^C^C^Cwarning: Could not open OSO archive file "/BinaryCache/corecrypto/corecrypto-233.1.2~26/Symbols/BuiltProducts/libcorecrypto_static.a"
warning: `/BinaryCache/coreTLS/coreTLS-35.1.2~1/Objects/coretls.build/coretls.build/Objects-normal/x86_64/system_coretls_vers.o': can't open to read symbols: No such file or directory.
warning: Could not open OSO archive file "/BinaryCache/coreTLS/coreTLS-35.1.2~1/Symbols/BuiltProducts/libcoretls_ciphersuites.a"
warning: Could not open OSO archive file "/BinaryCache/coreTLS/coreTLS-35.1.2~1/Symbols/BuiltProducts/libcoretls_handshake.a"
warning: Could not open OSO archive file "/BinaryCache/coreTLS/coreTLS-35.1.2~1/Symbols/BuiltProducts/libcoretls_record.a"
warning: Could not open OSO archive file "/BinaryCache/coreTLS/coreTLS-35.1.2~1/Symbols/BuiltProducts/libcoretls_stream_parser.a"
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/bdz.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/bdz_ph.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/bmz.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/bmz8.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/brz.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/buffer_entry.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/buffer_manager.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/chd.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/chd_ph.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/chm.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/cmph.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/cmph_structs.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/compressed_rank.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/compressed_seq.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/fch.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/fch_buckets.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/graph.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/hash.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/jenkins_hash.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/linear_string_map.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/miller_rabin.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/select.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/vqueue.o': can't open to read symbols: No such file or directory.
warning: `/BinaryCache/cmph/cmph-1~1091/Objects/cmph.build/cmph.build/Objects-normal/x86_64/vstack.o': can't open to read symbols: No such file or directory.
warning: `/var/folders/9n/mdh_qr_11ql3r03sxtd91yz80000gn/T/mono_aot_CkUp4M.o': can't open to read symbols: No such file or directory.
0x00007fff8fb328fe in __wait4 () from /usr/lib/system/libsystem_kernel.dylib
  Id   Target Id         Frame 
  3    Thread 0x1203 of process 50972 0x00007fff8fb3322e in kevent64 () from /usr/lib/system/libsystem_kernel.dylib
  2    Thread 0x1103 of process 50972 0x00007fff8fb2d56a in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
* 1    Thread 0x1003 of process 50972 0x00007fff8fb328fe in __wait4 () from /usr/lib/system/libsystem_kernel.dylib

Thread 3 (Thread 0x1203 of process 50972):
#0  0x00007fff8fb3322e in kevent64 () from /usr/lib/system/libsystem_kernel.dylib
#1  0x00007fff8dffdd91 in quote_attrs () from /usr/lib/system/libdispatch.dylib
#2  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x1103 of process 50972):
#0  0x00007fff8fb2d56a in semaphore_wait_trap () from /usr/lib/system/libsystem_kernel.dylib
#1  0x000000010a5d92e7 in mono_sem_wait (sem=<optimized out>, alertable=<optimized out>, sem=<optimized out>, alertable=<optimized out>) at mono-semaphore.c:103
#2  0x000000010a5599b2 in finalizer_thread (unused=<optimized out>) at gc.c:1093
#3  0x000000010a53854b in start_wrapper_internal (data=<optimized out>) at threads.c:664
#4  start_wrapper (data=<optimized out>) at threads.c:711
#5  0x000000010a5dfe0e in inner_start_thread (arg=<optimized out>) at mono-threads-posix.c:93
#6  0x00007fff8e9d62fc in ?? () from /usr/lib/system/libsystem_pthread.dylib
#7  0x0000000000001803 in ?? ()
#8  0x000000010d2df000 in ?? ()
#9  0x000000010d2def50 in ?? ()
#10 0x00007fff8e9d6279 in ?? () from /usr/lib/system/libsystem_pthread.dylib
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x1003 of process 50972):
#0  0x00007fff8fb328fe in __wait4 () from /usr/lib/system/libsystem_kernel.dylib
#1  0x000000010a44aa41 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>, info=<optimized out>) at mini-exceptions.c:2346
Backtrace stopped: Cannot access memory at address 0xc74d
Comment 20 Alexander Kyte 2015-03-17 13:36:11 UTC
I can track it down to the presence of the nursery-canaries variable in the environment. I'm not sure if I have the same bug as the filer.
Comment 21 evolvedmicrobe 2015-03-17 15:03:17 UTC
For what it's worth, I was unable to get this bug to show up when tracking the GC using the binary protocol, so I didn't have much more success tracking it down.  I am not sure if this bug is the same.
Comment 22 danb 2015-04-19 21:51:08 UTC
I believe we are seeing a very similar bug from copy_object_no_checks:
#0  0x0000003cebc0f2ad in waitpid () from /lib64/libpthread.so.0
#1  0x00000000004a45c8 in mono_handle_native_sigsegv (signal=<value optimized out>, ctx=<value optimized out>) at mini-exceptions.c:2323
#2  0x00000000004ff9bf in mono_arch_handle_altstack_exception (sigctx=0x7fda10764c40, fault_addr=<value optimized out>, stack_ovf=0) at exceptions-amd64.c:861
#3  0x0000000000416099 in mono_sigsegv_signal_handler (_dummy=11, _info=0x7fda10764d70, context=0x7fda10764c40) at mini.c:6858
#4  <signal handler called>
#5  copy_object_no_checks (obj=0x7fda15d14458, queue=0x963a60) at sgen-copy-object.h:78
#6  0x00000000005e76b4 in simple_nursery_serial_copy_object_from_obj (start=<value optimized out>, desc=<value optimized out>, queue=0x963a60) at sgen-minor-copy-object.h:199
#7  simple_nursery_serial_scan_object (start=<value optimized out>, desc=<value optimized out>, queue=0x963a60) at sgen-scan-object.h:60
#8  0x00000000005d0878 in major_scan_card_table (mod_union=0, queue=0x963a60) at sgen-marksweep.c:1929
#9  0x00000000005da82f in sgen_card_table_finish_scan_remsets (start_nursery=<value optimized out>, end_nursery=<value optimized out>, queue=0x963a60) at sgen-cardtable.c:445
#10 0x00000000005c5e31 in job_finish_remembered_set_scan (worker_data=<value optimized out>, job_data_untyped=0x7fd965192c80) at sgen-gc.c:2039
#11 0x00000000005cbd96 in collect_nursery (unpin_queue=0x0, finish_up_concurrent_mark=0) at sgen-gc.c:2319
#12 0x00000000005cc670 in sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=<value optimized out>, wait_to_finish=<value optimized out>) at sgen-gc.c:3195
#13 0x00000000005e0834 in mono_gc_alloc_obj_nolock (vtable=0x1ec82d0, size=<value optimized out>) at sgen-alloc.c:319
#14 0x00000000005e0c44 in mono_gc_alloc_obj (vtable=0x1ec82d0, size=16) at sgen-alloc.c:500

We have increased SGEN_MAX_SMALL_OBJ_SIZE to 64000 and MS_BLOCK_SHIFT to 17 (and also changed MS_BLOCKSIZE to be 1 << MS_BLOCK_SHIFT, which wasn't the case in the 3.12.0 branch).  I don't think this is causing our issue though, merely making it show up more quickly.  Unfortunately, reducing SGEN_MAX_SMALL_OBJ_SIZE isn't an option in our application.
Comment 23 danb 2015-04-20 18:29:18 UTC
Also, FWIW, we see the original stack trace you reported as well, but that happens when the heap runs out of space (we have a max heap size set).
Comment 24 Mark Probst 2015-04-21 13:43:08 UTC
Another thing to try - we've had trouble with this recently:

  MONO_GC_DEBUG=xdomain-checks
Comment 25 Ludovic Henry 2017-07-12 23:06:33 UTC
Can you still reproduce this with the latest version of Mono? Thank you.