Bug 43894 - Building Mono 4.6.0.165 on aarch64 fails on mscorlib.dll with segmentation fault
Summary: Building Mono 4.6.0.165 on aarch64 fails on mscorlib.dll with segmentation fault
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: 4.6.0 (C8)
Hardware: PC Linux
Importance: Normal normal
Target Milestone: 15.3
Assignee: Vlad Brezae
URL:
Depends on:
Blocks:
 
Reported: 2016-08-31 08:26 UTC by Timotheus Pokorra
Modified: 2017-06-23 20:15 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
output of the gdb backtrace (24.95 KB, text/x-log)
2016-09-09 12:49 UTC, Timotheus Pokorra



Description Timotheus Pokorra 2016-08-31 08:26:27 UTC
I get this output:

Stacktrace:
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Environment.Exit (int) <0x00007>
  at Mono.CSharp.Driver.Main (string[]) <0x0021b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_int_object (object,intptr,intptr,intptr) <0x000db>
Native stacktrace:
/bin/sh: line 1: 27605 Segmentation fault      (core dumped) MONO_PATH="./../../class/lib/monolite:$MONO_PATH" /builddir/build/BUILD/mono-4.6.0/runtime/mono-wrapper ./../../class/lib/monolite/basic.exe /codepage:65001 -unsafe -nostdlib -nowarn:612,618,1635 -d:INSIDE_CORLIB,MONO_CULTURE_DATA -d:LIBC -d:FEATURE_PAL,GENERICS_WORK,FEATURE_LIST_PREDICATES,FEATURE_SERIALIZATION,FEATURE_ASCII,FEATURE_LATIN1,FEATURE_UTF7,FEATURE_UTF32,MONO_HYBRID_ENCODING_SUPPORT,FEATURE_ASYNC_IO,NEW_EXPERIMENTAL_ASYNC_IO,FEATURE_UTF32,FEATURE_EXCEPTIONDISPATCHINFO,FEATURE_CORRUPTING_EXCEPTIONS,FEATURE_EXCEPTION_NOTIFICATIONS,FEATURE_STRONGNAME_MIGRATION,FEATURE_USE_LCID,FEATURE_FUSION,FEATURE_CRYPTO,FEATURE_X509_SECURESTRINGS,FEATURE_SYNCHRONIZATIONCONTEXT,FEATURE_SYNCHRONIZATIONCONTEXT_WAIT -d:FEATURE_MACL -d:FEATURE_REMOTING,MONO_COM,FEATURE_COMINTEROP,FEATURE_ROLE_BASED_SECURITY -d:MONO_FEATURE_THREAD_ABORT -d:MONO_FEATURE_THREAD_SUSPEND_RESUME -d:MONO_FEATURE_MULTIPLE_APPDOMAINS -d:NET_4_0 -d:NET_4_5 -d:MONO -d:BOOTSTRAP_BASIC -nowarn:1699 -lib:./../../class/lib/basic -optimize /noconfig -d:FEATURE_PAL,GENERICS_WORK,FEATURE_LIST_PREDICATES,FEATURE_SERIALIZATION,FEATURE_ASCII,FEATURE_LATIN1,FEATURE_UTF7,FEATURE_UTF32,MONO_HYBRID_ENCODING_SUPPORT,FEATURE_ASYNC_IO,NEW_EXPERIMENTAL_ASYNC_IO,FEATURE_UTF32,FEATURE_EXCEPTIONDISPATCHINFO,FEATURE_CORRUPTING_EXCEPTIONS,FEATURE_EXCEPTION_NOTIFICATIONS,FEATURE_STRONGNAME_MIGRATION,FEATURE_USE_LCID,FEATURE_FUSION,FEATURE_CRYPTO,FEATURE_X509_SECURESTRINGS,FEATURE_SYNCHRONIZATIONCONTEXT,FEATURE_SYNCHRONIZATIONCONTEXT_WAIT -d:FEATURE_MACL -d:FEATURE_REMOTING,MONO_COM,FEATURE_COMINTEROP,FEATURE_ROLE_BASED_SECURITY -d:MONO_FEATURE_THREAD_ABORT -d:MONO_FEATURE_THREAD_SUSPEND_RESUME -d:MONO_FEATURE_MULTIPLE_APPDOMAINS -resource:resources/charinfo.nlp -resource:resources/collation.core.bin -resource:resources/collation.tailoring.bin -resource:resources/collation.cjkCHS.bin -resource:resources/collation.cjkCHT.bin -resource:resources/collation.cjkJA.bin 
-resource:resources/collation.cjkKO.bin -resource:resources/collation.cjkKOlv2.bin --runtime:v4 -target:library -out:../../class/lib/basic/mscorlib.dll @corlib.dll.sources
../../build/library.make:279: recipe for target '../../class/lib/basic/mscorlib.dll' failed
make[8]: *** [../../class/lib/basic/mscorlib.dll] Error 139

This is for the Fedora packages. (see related bug https://bugzilla.redhat.com/show_bug.cgi?id=1371829)

Any ideas or solutions? Thanks!
Comment 1 Andi McClure 2016-08-31 15:58:19 UTC
Tim: Hm, that's concerning. I cannot reproduce this on our aarch64 builder. Does this happen 100% of the time for you, or is it random?

It would be helpful to get a native stacktrace. Could you rerun the line that failed under GDB? You can do this by navigating to the directory where the problem command was run and prepending the following to the command:
MONO_EXECUTABLE="gdb --args ../../../mono/mini/mono-sgen"

So for example I believe this will work if you run it from the build root:

(cd mcs/class/corlib && (MONO_EXECUTABLE="gdb --args ../../../mono/mini/mono-sgen" MONO_PATH="./../../class/lib/monolite:$MONO_PATH" /builddir/build/BUILD/mono-4.6.0/runtime/mono-wrapper ./../../class/lib/monolite/basic.exe /codepage:65001 -unsafe -nostdlib -nowarn:612,618,1635 -d:INSIDE_CORLIB,MONO_CULTURE_DATA -d:LIBC -d:FEATURE_PAL,GENERICS_WORK,FEATURE_LIST_PREDICATES,FEATURE_SERIALIZATION,FEATURE_ASCII,FEATURE_LATIN1,FEATURE_UTF7,FEATURE_UTF32,MONO_HYBRID_ENCODING_SUPPORT,FEATURE_ASYNC_IO,NEW_EXPERIMENTAL_ASYNC_IO,FEATURE_UTF32,FEATURE_EXCEPTIONDISPATCHINFO,FEATURE_CORRUPTING_EXCEPTIONS,FEATURE_EXCEPTION_NOTIFICATIONS,FEATURE_STRONGNAME_MIGRATION,FEATURE_USE_LCID,FEATURE_FUSION,FEATURE_CRYPTO,FEATURE_X509_SECURESTRINGS,FEATURE_SYNCHRONIZATIONCONTEXT,FEATURE_SYNCHRONIZATIONCONTEXT_WAIT -d:FEATURE_MACL -d:FEATURE_REMOTING,MONO_COM,FEATURE_COMINTEROP,FEATURE_ROLE_BASED_SECURITY -d:MONO_FEATURE_THREAD_ABORT -d:MONO_FEATURE_THREAD_SUSPEND_RESUME -d:MONO_FEATURE_MULTIPLE_APPDOMAINS -d:NET_4_0 -d:NET_4_5 -d:MONO -d:BOOTSTRAP_BASIC -nowarn:1699 -lib:./../../class/lib/basic -optimize /noconfig -d:FEATURE_PAL,GENERICS_WORK,FEATURE_LIST_PREDICATES,FEATURE_SERIALIZATION,FEATURE_ASCII,FEATURE_LATIN1,FEATURE_UTF7,FEATURE_UTF32,MONO_HYBRID_ENCODING_SUPPORT,FEATURE_ASYNC_IO,NEW_EXPERIMENTAL_ASYNC_IO,FEATURE_UTF32,FEATURE_EXCEPTIONDISPATCHINFO,FEATURE_CORRUPTING_EXCEPTIONS,FEATURE_EXCEPTION_NOTIFICATIONS,FEATURE_STRONGNAME_MIGRATION,FEATURE_USE_LCID,FEATURE_FUSION,FEATURE_CRYPTO,FEATURE_X509_SECURESTRINGS,FEATURE_SYNCHRONIZATIONCONTEXT,FEATURE_SYNCHRONIZATIONCONTEXT_WAIT -d:FEATURE_MACL -d:FEATURE_REMOTING,MONO_COM,FEATURE_COMINTEROP,FEATURE_ROLE_BASED_SECURITY -d:MONO_FEATURE_THREAD_ABORT -d:MONO_FEATURE_THREAD_SUSPEND_RESUME -d:MONO_FEATURE_MULTIPLE_APPDOMAINS -resource:resources/charinfo.nlp -resource:resources/collation.core.bin -resource:resources/collation.tailoring.bin -resource:resources/collation.cjkCHS.bin -resource:resources/collation.cjkCHT.bin 
-resource:resources/collation.cjkJA.bin -resource:resources/collation.cjkKO.bin -resource:resources/collation.cjkKOlv2.bin --runtime:v4 -target:library -out:../../class/lib/basic/mscorlib.dll @corlib.dll.sources))

Once you're in gdb, enter `run`, and then when the crash occurs enter:
thread apply all bt
Comment 2 Timotheus Pokorra 2016-09-01 06:39:23 UTC
This happens every time on the Fedora build server for aarch64.

Peter Robinson or I will provide the native stacktrace soon.
Comment 3 Jo Shields 2016-09-01 09:53:56 UTC
It's going to be difficult for me to reproduce this - our aarch64 machines don't have KVM support enabled in the boot loader, so I can't just spin up a Fedora environment (and I would suspect some bad kernel interaction, so a chroot isn't meaningful).

Is there some way for non-Fedora devs to gain access to Fedora hardware, along the lines of https://dsa.debian.org/doc/guest-account/ ?

It'd be much easier if we could attach a debugger rather than trying to do this via back-and-forth.
Comment 4 Peter 2016-09-01 10:04:40 UTC
You can request Fedora 24 here http://www.linaro.org/leg/servercluster/

Feel free to mention it's to debug mono on Fedora on aarch64 with me (Peter Robinson - pbrobinson at redhat dot com)
Comment 5 Timotheus Pokorra 2016-09-09 12:49:22 UTC
Created attachment 17397
output of the gdb backtrace
Comment 6 Timotheus Pokorra 2016-10-10 07:21:58 UTC
May I ask if anyone has had time to look into this already? Andi, Jo?

I just tried again; the issue still happens with Mono 4.6.1.
Comment 7 Jo Shields 2016-10-10 07:32:57 UTC
Sorry, I know I haven't had time to look at it, and I doubt Andi has either.

One other data point is I uploaded 4.6 to Debian, where it builds - see https://buildd.debian.org/status/fetch.php?pkg=mono&arch=arm64&ver=4.6.1.3%2Bdfsg-4&stamp=1475929279

I'd assume Linaro are offering the same hardware to Fedora as to Debian (i.e. X-Gene), so that rules out a quirk in the silicon and points to something in the OS or compiler environment.
Comment 8 Peter 2016-10-10 10:00:34 UTC
Linaro has a collection of hardware, so while it's possibly X-Gene, it's not guaranteed to be.

Fedora also tends to be a lot more bleeding edge in versions of the toolchain stack. It also uses 64K page sizes (vs 4K), so that might affect it too if there are assumptions made about page sizes.
Comment 9 Andi McClure 2016-10-10 16:48:45 UTC
Crash occurs on line 2107 of sgen-marksweep.c:

2106    while (num_empty_blocks > section_reserve) {
2107        void *next = *(void**)empty_blocks;
2108        sgen_free_os_memory (empty_blocks, MS_BLOCK_SIZE, SGEN_ALLOC_HEAP);
2109        empty_blocks = next;

We do believe that the 64k page sizes could cause problems for mono. Could you please try rebuilding with this patch on top of mono-4.6.0.165? (We have not tested this locally at all.)

diff --git a/mono/sgen/sgen-archdep.h b/mono/sgen/sgen-archdep.h
index da6aaf0..9b63d68 100644
--- a/mono/sgen/sgen-archdep.h
+++ b/mono/sgen/sgen-archdep.h
@@ -45,6 +45,11 @@
 
 #elif defined(TARGET_ARM64)
 
+/* MS_BLOCK_SIZE must be a multiple of the system pagesize, which for some
+   architectures is 64k.  */
+#define ARCH_MIN_MS_BLOCK_SIZE (64*1024)
+#define ARCH_MIN_MS_BLOCK_SIZE_SHIFT   16
+
 #ifdef __linux__
 #define REDZONE_SIZE    0
 #elif defined(__APPLE__)
Comment 10 Timotheus Pokorra 2016-10-10 16:50:45 UTC
Thanks for the hint!
Jo has told me the same thing on IRC, and I tested it.
Yes, this makes the build succeed!
Comment 11 Andi McClure 2016-10-10 17:27:50 UTC
The "real" fix for this is to query the system for the page size at runtime, at VM startup. Because the quick-fix patch above could have a performance impact on phone platforms, we would prefer not to fix this until we can do it right. Our suggestion (apparently Jo already relayed this on IRC) is that Debian maintain this as a local patch until we can get the true fix in.

Assigning to AlexRP because he has an idea of how to do the "real fix". If we can't get this in for the 4.8 release, we should consider making this a configure flag for 4.8.
Comment 12 Luis Aguilera 2017-02-18 00:21:30 UTC
C9 is now closed. We'll move this bug to the next scheduled milestone, "15.1". We'll continue working on these issues, and will attempt to resolve them ASAP.
Comment 13 Andi McClure 2017-03-31 21:02:57 UTC
This is too late for 15.2 milestone.
Comment 14 Rodrigo Kumpera 2017-06-19 15:45:22 UTC
Moving back to Ludovic to find someone to work on it.
Comment 15 Alex Rønne Petersen 2017-06-20 04:54:17 UTC
Assigning this to Vlad, he's taking over my PR ( https://github.com/mono/mono/pull/3751 ) since my plate is full with profiler stuff.
Comment 16 Vlad Brezae 2017-06-23 20:15:21 UTC
Fixed in Mono 5.4.