Bug 55382 - Mono app crash randomly at /mono/utils/mono-os-semaphore.h:228
Summary: Mono app crash randomly at /mono/utils/mono-os-semaphore.h:228
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: General ()
Version: 4.6.0 (C8)
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2017-04-19 14:53 UTC by Pranas
Modified: 2017-07-07 19:36 UTC
5 users

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
crash report (17.56 KB, text/plain)
2017-05-04 11:53 UTC, Pranas
heapshot report (21.86 KB, text/plain)
2017-05-04 11:56 UTC, Pranas
crash report 2 (16.39 KB, text/plain)
2017-05-04 19:15 UTC, Pranas


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.


Description Pranas 2017-04-19 14:53:36 UTC
A Mono application crashes randomly in production. The crashes don't correlate with memory usage, CPU usage, or thread count.
This happens on most of our 8 production servers. On some servers it happens 2-3 times per day, on others once every 3 days.
Other Mono applications don't crash on the same servers with the same Mono runtime.

We don't know how to reproduce it. 

The biggest change in our application's last release that might be related is that it now uses a lot of short-lived threads. A thread is created for each message handler, and each thread may take only a few milliseconds to complete.

1) Can you advise how to troubleshoot this case?
2) Does the native stack trace tell you what the runtime is doing at the moment of the crash? I'd like to know the context; it could help me work around this Mono code path in my own code.

**The line where Mono is crashing:**
https://github.com/mono/mono/blob/mono-4.6.2.7/mono/utils/mono-os-semaphore.h#L228

**OS:**
CentOS 6 (kernel 2.6.32-642.6.1.el6.x86_64)

**Mono version:**
Mono JIT compiler version 4.6.2 (Stable 4.6.2.7/08fd525 Tue Nov 29 13:50:06 UTC 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen

**The output on crash is:**
 * Assertion at ../../mono/utils/mono-os-semaphore.h:228, condition `errno != EINVAL' not met
 
 
 Native stacktrace:
 
     /opt/mono/bin/mono-sgen() [0x4a2695]
     /lib64/libpthread.so.0(+0xf7e0) [0x7fe21a3ea7e0]
     /lib64/libc.so.6(gsignal+0x35) [0x7fe219e635e5]
     /lib64/libc.so.6(abort+0x175) [0x7fe219e64dc5]
     /opt/mono/bin/mono-sgen() [0x65dd5a]
     /opt/mono/bin/mono-sgen() [0x65dae4]
     /opt/mono/bin/mono-sgen() [0x65dca4]
     /opt/mono/bin/mono-sgen() [0x65528d]
     /lib64/libpthread.so.0(+0x7aa1) [0x7fe21a3e2aa1]
     /lib64/libc.so.6(clone+0x6d) [0x7fe219f19aad]
 
 Debug info from gdb:
 
 
 =================================================================
 Got a SIGABRT while executing native code. This usually indicates
 a fatal error in the mono runtime or one of the native libraries
 used by your application.
 =================================================================
 
 Aborted
Comment 1 Zoltan Varga 2017-04-20 17:31:51 UTC
Please use the latest stable Mono version, which is 4.8.
Comment 2 Pranas 2017-04-20 19:57:39 UTC
Thanks for the answer. But that's the last thing we would want to try: updating Mono would require a full regression test, and since we can't reproduce the problem in QA, we wouldn't know whether the update helped until we deployed it to production.

We've been in a similar situation before: we upgraded Mono to solve one problem, but the newer Mono version had a socket defect that caused a deadlock. It was a big pain to track down that issue.

So I'd like to find the root cause of this problem. If it's a known issue, I'd rather apply a targeted Mono patch that fixes it, or change our code to avoid this Mono code path.

Thanks.
Comment 3 Zoltan Varga 2017-04-20 23:55:00 UTC
Unfortunately, there is no meaningful stack trace, and the assertion is in low-level code that is called from many places, so it's not possible to determine the exact issue. It might be fixed in a later Mono version, or it might not be.
Comment 4 Rodrigo Kumpera 2017-04-21 22:05:30 UTC
Could you install debug symbols and get a backtrace with full symbols on it?
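A likely reason the "Native stacktrace:" section shows only raw addresses such as `/opt/mono/bin/mono-sgen() [0x4a2695]`: that is the line format produced by glibc's `backtrace_symbols()`, which resolves an address only against the binary's exported dynamic symbols and does not read DWARF debug info. Assuming Mono's crash handler relies on that glibc facility (not confirmed in this thread), installing debug symbols would mainly improve the "Debug info from gdb:" section, or let the raw addresses be resolved offline with GDB against the same binary. The minimal C sketch below uses the real glibc calls `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`; `dump_native_stack` is a hypothetical helper name, not part of Mono.

```c
/* Sketch only: glibc-specific (<execinfo.h>); this is not Mono's crash handler. */
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical helper: capture up to `max` return addresses of the calling
 * thread and write one "binary(symbol+offset) [address]" line per frame to
 * `fd`. Frames not covered by an exported dynamic symbol are printed as
 * "binary() [address]"; debug info is never consulted. */
static int dump_native_stack(int fd, void **frames, int max)
{
    assert(frames != NULL && max > 0);
    int n = backtrace(frames, max);       /* number of frames actually captured */
    backtrace_symbols_fd(frames, n, fd);  /* writes directly to fd, no malloc() */
    return n;
}
```

Calling `dump_native_stack(STDERR_FILENO, frames, 64)` with `void *frames[64]` prints the current thread's frames; linking with `-rdynamic` exports more symbols so that more frames get names. `backtrace()` walks only the calling thread, so per-thread backtraces of the whole process have to come from GDB (for example `thread apply all bt`).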
Comment 5 Pranas 2017-05-04 11:53:43 UTC
Created attachment 21943 [details]
crash report

Hi, Rodrigo,

Mono crashed in our performance environment yesterday. GDB is installed there, so the output has a bit more info. See the attachment [crash_report.txt].

1) Is it normal that only 5 thread stacks were printed? There are 45 threads in total.
2) Does this help identify the problem? Is there anything else I can do to help identify it?
Comment 6 Pranas 2017-05-04 11:56:28 UTC
Created attachment 21944 [details]
heapshot report

Not sure if it's related, but we're seeing memory leaks in most Mono processes: a lot of [Thread, InternalThread, GenericPrincipal] objects are not being collected. See the attached [US3_mprof_heapshot.txt].
Comment 7 Zoltan Varga 2017-05-04 12:39:56 UTC
The backtrace doesn't seem to contain the crashing thread, so it doesn't help much. It should print all the threads; I'm not sure why that's not happening.
Comment 8 Pranas 2017-05-04 15:25:04 UTC
Yes, that's another problem we're facing. If I debug with GDB and run the "mono_backtrace" macro, it always causes a SIGSEGV in the process. mono_pmip works on 10% of threads; on the others it causes a SIGSEGV.
I've never seen it print a full stack trace in any of our environments.
This also happened on Mono 4.4.
Comment 9 Pranas 2017-05-04 19:15:32 UTC
Created attachment 21960 [details]
crash report 2

Here's another crash report from a different service.
Comment 10 Pranas 2017-05-05 09:47:11 UTC
Do you think this glibc bug could be related?
https://sourceware.org/bugzilla/show_bug.cgi?id=12674

In our environment:
#  rpm -q -a | grep glibc
glibc-2.12-1.192.el6.x86_64
glibc-common-2.12-1.192.el6.x86_64
Comment 11 Zoltan Varga 2017-05-05 16:06:36 UTC
It could be related or not; it's hard to say, since EINVAL is a general error code. I'd strongly suggest trying to update to Mono 4.8. It _might_ fix the problem. Packages are available for most distros here:

http://www.mono-project.com/download/#download-lin
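To illustrate the point in Comment 11 that EINVAL is a general error code: POSIX lists EINVAL for `sem_timedwait()` in two distinct situations, when the semaphore argument does not refer to a valid semaphore, and when the call would block and the deadline's `tv_nsec` is outside [0, 1000000000). The C sketch below is not Mono's code and does not identify which call failed in this crash; it only contrasts a normal timeout (ETIMEDOUT) with an EINVAL caused by a malformed deadline, the kind of mistake a missing carry from `tv_nsec` into `tv_sec` produces. The helper names `deadline_after_ms` and `timedwait_errno` are hypothetical.

```c
/* Sketch only: illustrates POSIX sem_timedwait() error codes; not Mono's code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical helper: absolute CLOCK_REALTIME deadline `ms` milliseconds
 * from now (assumes ms >= 0), with tv_nsec normalized into [0, 1e9). */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {   /* carry the overflow into seconds */
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Hypothetical helper: wait on a fresh semaphore whose value is 0, so the
 * call would block and the deadline gets validated. Returns the errno of
 * the failed wait, or 0 if the wait succeeded. */
static int timedwait_errno(const struct timespec *deadline)
{
    sem_t sem;
    int err = 0;

    assert(deadline != NULL);
    if (sem_init(&sem, 0, 0) != 0)
        return errno;
    while (sem_timedwait(&sem, deadline) == -1) {
        if (errno != EINTR) {          /* retry only when interrupted by a signal */
            err = errno;
            break;
        }
    }
    sem_destroy(&sem);
    return err;
}
```

With these helpers, `timedwait_errno(&d)` for `d = deadline_after_ms(20)` returns ETIMEDOUT after roughly 20 ms, while the same deadline with `tv_nsec` set to 1000000000 returns EINVAL. On glibc older than 2.34, link with `-pthread` (and `-lrt` for `clock_gettime` on very old releases).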
Comment 12 Ludovic Henry 2017-07-07 19:36:58 UTC
If you can still reproduce with latest mono version, please feel free to reopen the bug. Thank you.