Bug 50529 - crash in thread-native-exit.exe
Summary: crash in thread-native-exit.exe
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: io-layer ()
Version: master
Hardware: PC Mac OS
Importance: --- normal
Target Milestone: ---
Assignee: Zoltan Varga
URL:
Depends on:
Blocks:
 
Reported: 2016-12-20 21:05 UTC by Zoltan Varga
Modified: 2017-09-06 18:08 UTC (History)
CC List: 5 users

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
c testcase (2.69 KB, text/x-csrc)
2017-02-04 20:05 UTC, Zoltan Varga



Description Zoltan Varga 2016-12-20 21:05:12 UTC
The above regression test sometimes crashes. To repro:

while true; do echo -n "."; ./mono --llvm thread-native-exit.exe || break; done

While the loop is running, do something that generates load, such as running make check in tests.

The stack trace is the following:

#16 0x000000010ca4cf31 in mono_handle_native_crash (signal=0x10d5dc241 "SIGSEGV", ctx=0x0, info=0x0) at mini-exceptions.c:2610
#17 0x000000010cb4987b in altstack_handle_and_restore (ctx=0x7fff532d7860, obj=0x0, stack_ovf=0) at exceptions-amd64.c:780
#18 0x00007fff981fbb0c in _pthread_create ()
#19 0x000000010cce475d in mono_gc_pthread_create (new_thread=0x7fff532d7b48, attr=0x7fff532d7b78, start_routine=0x10cd7b630 <inner_start_thread>, arg=0x7fd57f703100) at sgen-mono.c:2508
#20 0x000000010cd7f3bc in mono_threads_platform_create_thread (thread_fn=0x10cd7b630 <inner_start_thread>, thread_data=0x7fd57f703100, stack_size=0x7fff532d7c48, out_tid=0x7fff532d7c58) at mono-threads-posix.c:126
#21 0x000000010cd7b49a in mono_threads_create_thread (start=0x10cc4dc80 <start_wrapper>, arg=0x7fd57f700da0, stack_size=0x7fff532d7c48, out_tid=0x7fff532d7c58) at mono-threads.c:1188
#22 0x000000010cc459cd in create_thread (thread=0x10e000b18, internal=0x10dd04798, start_delegate=0x10e000b90, start_func=0, start_func_arg=0x0, threadpool_thread=0, stack_size=0, error=0x7fff532d7cf0) at threads.c:790
#23 0x000000010cc46b35 in ves_icall_System_Threading_Thread_Thread_internal (this_obj=0x10e000b18, start=0x10e000b90) at threads.c:1209

The crash is always at exactly this location:

(gdb) x/20i $pc
0x7fff981fbb0c <_pthread_create+383>:	mov    0x10(%rbx),%eax
0x7fff981fbb0f <_pthread_create+386>:	mov    %eax,%ecx
0x7fff981fbb11 <_pthread_create+388>:	or     $0x2,%ecx
0x7fff981fbb14 <_pthread_create+391>:	mov    %ecx,0x10(%rbx)

_pthread_create () contains the following:

	pthread_t t2;
	t2 = __bsdthread_create(start_routine, arg, stack, t, flags);
	if (t2 == (pthread_t)-1) {
		if (flags & PTHREAD_START_CUSTOM) {
			// free the thread and stack if we allocated it
			_pthread_deallocate(t);
		}
		return EAGAIN;
	}
	if (t == NULL) {
		t = t2;
	}

	__pthread_add_thread(t, true, from_mach_thread);

Here, the crash happens at the following line of the inlined __pthread_add_thread():
		t->parentcheck = 1;

So 'rbx' is supposed to be 't', the newly created thread. rbx usually has a value like 0x700006e4d000, which looks like a plausible thread address.
Comment 1 Zoltan Varga 2016-12-20 22:29:35 UTC
What seems to happen is that mono_threads_add_joinable_thread () is called twice for the same tid, so we try to join the same tid twice. It's called from
sgen_client_thread_unregister (). It looks like there are two threadinfo structures for the same thread/tid.
Comment 2 Zoltan Varga 2016-12-21 00:40:19 UTC
Correction: this happens when tids are reused, i.e. a thread is created with a given tid, dies, and then a new thread is created with the same tid.
Comment 4 Zoltan Varga 2017-02-04 20:05:04 UTC
Created attachment 19724
c testcase
Comment 5 Zoltan Varga 2017-02-04 20:13:42 UTC
The attached C testcase can be used to reproduce this; it's a reduced version of the testcase in comment #3. The testcase looks correct to me, so this looks like an OS/libpthread bug.

To reproduce:
clang -O2 -g crash.c
while true; do echo -n "."; ./a.out || break; done

This will crash after a while with the stack trace above. It's reproducible on Sierra (10.12.3), Yosemite, and Mavericks (10.9.5).
Comment 6 Rodrigo Kumpera 2017-02-06 19:24:57 UTC
Hi Zoltan,

Please work on this issue.
Comment 7 Zoltan Varga 2017-02-14 03:50:38 UTC
Reported as apple radar #30506046.
Comment 8 Ludovic Henry 2017-09-06 18:08:33 UTC
Fixed with 38ceab15479475d54159de8d2c0d297c56e5f80b