Bug 44397 - SIGSEGV in tarjan running sgen-bridge-major-fragmentation.exe test on armel
Summary: SIGSEGV in tarjan running sgen-bridge-major-fragmentation.exe test on armel
Status: VERIFIED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: master
Hardware: PC Mac OS
Importance: --- normal
Target Milestone: 4.8.0 (C9)
Assignee: Andi McClure
URL:
Depends on:
Blocks:
 
Reported: 2016-09-15 23:07 UTC by Andi McClure
Modified: 2017-02-24 11:57 UTC
CC List: 4 users

Tags:
Is this bug a regression?: ---
Last known good build:

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:

Description Andi McClure 2016-09-15 23:07:53 UTC
The automated test sgen-bridge-major-fragmentation.exe has been crashing about 1/3 of the time on armel (but no other platform) when run with the MONO_GC_DEBUG="bridge=Bridge" option which simulates the Android bridge. The crash occurs during a forced collection. Some examples:

https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-armel/793/testReport/MonoTests/sgen-bridge-tests-plain/sgen_bridge_major_fragmentation_exe/
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-armel/789/testReport/MonoTests/sgen-bridge-tests-plain/sgen_bridge_major_fragmentation_exe/

Thread 1 (Thread 0xf74e9000 (LWP 17671)):
#0  0xf73d2d98 in waitpid () from /lib/arm-linux-gnueabi/libpthread.so.0
#1  0x000c3c38 in mono_handle_native_sigsegv (signal=signal@entry=11, ctx=ctx@entry=0xffb626f8, info=info@entry=0xffb62678) at mini-exceptions.c:2426
#2  0x00035334 in mono_sigsegv_signal_handler (_dummy=7, _info=0xffb62678, context=0xffb626f8) at mini-runtime.c:2884
#3  <signal handler called>
#4  0x001fc2fc in dfs () at sgen-tarjan-bridge.c:878
#5  processing_stw_step () at sgen-tarjan-bridge.c:982
#6  0x001f6980 in sgen_bridge_processing_stw_step () at sgen-bridge.c:210
#7  0x00207494 in sgen_client_bridge_processing_stw_step () at ../../mono/metadata/sgen-client-mono.h:259
#8  finish_gray_stack (generation=generation@entry=1, ctx=...) at sgen-gc.c:1100
#9  0x002083c8 in major_finish_collection (gc_thread_gray_queue=0xffb62b68, gc_thread_gray_queue@entry=0xffb62b60, reason=reason@entry=0x2f4c78 "user request", is_overflow=1, is_overflow@entry=0, old_next_pin_slot=20, forced=forced@entry=1) at sgen-gc.c:1940
#10 0x0020879c in major_do_collection (reason=0x2f4c78 "user request", is_overflow=0, forced=1) at sgen-gc.c:2066
#11 0x0020b7d4 in sgen_perform_collection (requested_size=8216, requested_size@entry=0, generation_to_collect=1, reason=reason@entry=0x2f4c78 "user request", wait_to_finish=wait_to_finish@entry=1, stw=stw@entry=1) at sgen-gc.c:2263
Comment 1 Andi McClure 2016-09-15 23:08:15 UTC
(The stack is the same in all examples I've seen)
Comment 2 Andi McClure 2016-09-15 23:08:51 UTC
Assigning to C9 for now; I may try to backport to a C8 SR if it turns out to be present there.
Comment 3 Andi McClure 2016-09-26 15:09:09 UTC
I looked into this with Vlad Brezae on Friday. Vlad eventually found the explanation:

Apparently commit https://github.com/mono/mono/commit/883e0a2d899a6a19db2c1d5222d45b8423ce36a2 introduced a bug where, for certain allocation sizes, sgen would allocate and return memory at an unaligned address. On ARM, the behavior of unaligned memory accesses is undefined. The specific allocation sizes explain why only sgen triggers it, and the fact that the behavior is undefined, rather than an explicit fault, explains why it only shows up on specific hardware/ABI combinations.

Vlad will be putting in a fix soon. Because 883e0a2d899 is present in master but not C8 or C9, it does not need to be backported.
Comment 4 Vlad Brezae 2016-09-27 10:13:10 UTC
Fixed by 837d12bfc6b5a2d7575e3ed16dfd735358ba8813
Comment 5 Akhilesh kumar 2017-02-21 11:09:18 UTC
@Vlad Brezae
I am trying to verify this issue with C9 builds, but I am not sure about the steps to reproduce it.

Could you please provide some test steps so that I can reproduce this issue on my end? Please also provide the build info for the reproduction.

Thanks!
Comment 6 Vlad Brezae 2017-02-21 13:55:44 UTC
Hey. First of all, as far as I remember, this issue was a minor regression on master which was quickly fixed in the early days of 4.8, so testing with older releases is likely not useful.

If you do want to repro the issue, you would need to get mono (likely by building it) on an ARM Linux machine and run make sgen-bridge-tests in mono/tests. You could then run the command for the sgen-bridge-major-fragmentation test in a loop, since as I remember it was fairly intermittent. So reproducing it is not an easy task.
Comment 7 Akhilesh kumar 2017-02-24 11:57:55 UTC
As per comment 6, this was a minor issue that was fixed in the early days of Mono 4.8, and reproducing it is not an easy task.

So I am closing this issue for now.

Please feel free to reopen it if anyone is still facing this issue.

Thanks!