Bug 57272 - SIGSEGV during concurrent sweep of major
Summary: SIGSEGV during concurrent sweep of major
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: 5.0 (2017-02)
Hardware: PC Linux
Importance: --- normal
Target Milestone: Future Cycle (TBD)
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2017-06-08 13:23 UTC by Kevin Boyle
Modified: 2017-08-17 17:29 UTC

Tags:
Is this bug a regression?: ---
Last known good build:

Description Kevin Boyle 2017-06-08 13:23:15 UTC
I don't have enough experience of mono internals to know if the bug title is accurate; I'm merely going on what I think the stack is telling me.

We have an OWIN web server running on top of mono which crashes frequently, but not in a reliably reproducible way. Thanks to the genius of `MONO_DEBUG=suspend-on-sigsegv` I was able to get gdb attached to our staging container the last time this occurred and grab the stack traces and memory dump at the time.
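
For reference, a rough sketch of that capture workflow. Only `MONO_DEBUG=suspend-on-sigsegv` and the use of gdb come from the description above; the app name `MyOwinHost.exe`, the output path `/tmp/mono-crash.core`, and the helper `gdb_capture_cmd` are placeholders made up for illustration, and the sketch assumes a single mono process in the container.

```shell
#!/bin/sh
# Sketch only: names and paths here are hypothetical, not taken from this report.

# Build a gdb command line that attaches to an already-suspended process,
# prints a backtrace for every thread, and writes a core file.
#   $1 = PID of the suspended mono process
#   $2 = path of the core file to write
gdb_capture_cmd() {
    printf 'gdb -p %s -batch -ex "thread apply all bt" -ex "gcore %s"\n' "$1" "$2"
}

# 1. Start the app so that a SIGSEGV suspends the process instead of exiting:
#      MONO_DEBUG=suspend-on-sigsegv mono MyOwinHost.exe
# 2. After the crash, attach and collect the traces and the dump:
#      eval "$(gdb_capture_cmd "$(pidof mono)" /tmp/mono-crash.core)"
```

Printing the command instead of running it directly makes it easy to check what will be executed before attaching to a live process.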

Although it is staging and has nothing too sensitive in it, I'm not comfortable sharing the stuff I've collected in an open forum. If somebody at Xamarin wants the dump to aid their debugging then happy to share it with them privately. 

The thread that caused the exception was:

Thread 34 (Thread 0x7f0070fff700 (LWP 6)):
#0  0x00007f0071e94f2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f0071e94dc4 in __sleep (seconds=0, seconds@entry=1)
    at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00000000004adafa in mono_handle_native_crash (signal=<optimized out>,
    signal@entry=0x69abd3 "SIGSEGV", ctx=ctx@entry=0x7f0070ffd580,
    info=info@entry=0x7f0070ffd6b0) at mini-exceptions.c:2492
#3  0x000000000042671c in mono_sigsegv_signal_handler (_dummy=11,
    _info=0x7f0070ffd6b0, context=0x7f0070ffd580) at mini-runtime.c:2821
#4  <signal handler called>
#5  __GI_abort () at abort.c:125
#6  0x00000000004adac9 in mono_handle_native_crash (signal=<optimized out>,
    ctx=<optimized out>, info=<optimized out>) at mini-exceptions.c:2615
#7  <signal handler called>
#8  0x00007f0071e10067 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#9  0x00007f0071e11448 in __GI_abort () at abort.c:89
#10 0x000000000067ae49 in mono_log_write_syslog (domain=<optimized out>,
    level=<optimized out>, hdr=<optimized out>, message=<optimized out>)
    at mono-log-posix.c:88
#11 0x000000000068ff3d in monoeg_g_logv (log_domain=log_domain@entry=0x0,
    log_level=log_level@entry=G_LOG_LEVEL_ERROR,
    format=format@entry=0x6997b0 "* Assertion: should not be reached at %s:%d\n", args=args@entry=0x7f0070ffece0) at goutput.c:115
#12 0x00000000006900d3 in monoeg_assertion_message (
    format=format@entry=0x6997b0 "* Assertion: should not be reached at %s:%d\n") at goutput.c:135
#13 0x000000000065e2b8 in major_scan_object_concurrent_with_evacuation (
    full_object=0x7f0000000000, desc=<optimized out>,
    queue=queue@entry=0x7f0072e78010) at sgen-scan-object.h:90
#14 0x000000000065f38c in drain_gray_stack_concurrent_with_evacuation (
    queue=<optimized out>) at sgen-marksweep-drain-gray-stack.h:339
#15 drain_gray_stack_concurrent (queue=0x7f0072e78010) at sgen-marksweep.c:1321
#16 0x0000000000672151 in marker_idle_func (data_untyped=0x7f0072e78008)
    at sgen-workers.c:328
#17 0x000000000067134f in thread_func (thread_data=0x7f0072e78008)
    at sgen-thread-pool.c:151
#18 0x00007f00723a4064 in start_thread (arg=0x7f0070fff700)
    at pthread_create.c:309
#19 0x00007f0071ec362d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


It is running in EC2, on top of Ubuntu 16.04, within a Docker container based on Debian Jessie.
Comment 1 Vlad Brezae 2017-07-03 21:35:12 UTC
Hello,

     I'm afraid there is not much that can be done with this type of bug without a more or less reliable repro.

     Is it still reproducible with the latest version of 5.0?
Comment 2 Ludovic Henry 2017-07-03 21:38:41 UTC
Hello,

Please provide a repro case so we can debug it on our end. This kind of bug is impossible to debug if we can't reproduce it ourselves.

Thank you,
Ludovic
Comment 3 Kevin Boyle 2017-07-04 10:41:47 UTC
Unfortunately I don't have a minimal repro as this is part of the startup of a large web app. Are the core dumps enough, or were they taken too late, after lots of corruption had already occurred?
Comment 4 Vlad Brezae 2017-07-05 12:09:01 UTC
You can provide the dump and I'll take a look at it, but it's unlikely to be of help, since it would just show an invalid heap state but no information about how it got there.
Comment 5 Kevin Boyle 2017-07-05 12:14:11 UTC
Understood. Frustrating bug! Sorry I couldn't be of more help, but unfortunately it is part of a large system and doesn't happen at a deterministic time so not sure I could even make a minimal repro of this. 

Here is the core dump I grabbed:

https://drive.google.com/a/gearset.com/file/d/0B7RRkFm5J7pbdjFkRHMteFFBdGs/view?usp=sharing
Comment 6 Vlad Brezae 2017-07-21 20:32:53 UTC
What mono version was used for the above core dump? I tried with the following mono but it doesn't seem to be compatible:

$ mono --version
Mono JIT compiler version 5.0.1.1 (2017-02/5077205 Thu May 25 09:16:53 UTC 2017)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen (concurrent by default)
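
Since a core dump generally needs to be opened against the same binary that produced it, one way to answer the version question is to record the build identifier on the machine where the crash happened and compare it with the one used for analysis. A minimal sketch, assuming the first line of `mono --version` has the format shown above; the helper name `mono_build_id` is made up for illustration:

```shell
#!/bin/sh
# Sketch only: mono_build_id is a hypothetical helper, not part of mono.

# Read `mono --version` output on stdin and print "<version> <branch/commit>"
# taken from its first line; prints nothing if the line has another format.
mono_build_id() {
    sed -n '1s/^Mono JIT compiler version \([^ ]*\) (\([^ ]*\) .*$/\1 \2/p'
}

# Usage, run on both the crashing host and the analysis host, then compare:
#   mono --version | mono_build_id
```

If the two outputs differ, the symbols and addresses in the dump are unlikely to line up with the local binary.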
Comment 7 Ludovic Henry 2017-08-17 17:29:52 UTC
Please reopen whenever you provide a reproduction case so we can have a look. Thank you.