Bug 56322 - Running nunit tests with domain isolation crashes Mono
Summary: Running nunit tests with domain isolation crashes Mono
Status: VERIFIED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: Reflection
Version: 5.0 (2017-02)
Hardware: PC Mac OS
Importance: --- major
Target Milestone: 15.2.2
Assignee: Aleksey Kliger
URL:
Depends on:
Blocks:
 
Reported: 2017-05-12 23:45 UTC by Aleksey Kliger
Modified: 2017-05-16 18:55 UTC

Tags:
Is this bug a regression?: Yes
Last known good build: mono-4.8.0-branch


Attachments
leaked coop handles (4.39 KB, text/plain)
2017-05-12 23:45 UTC, Aleksey Kliger




Description Aleksey Kliger 2017-05-12 23:45:48 UTC
Created attachment 22153
leaked coop handles

Running NUnit tests with domain isolation leads to a crash in the GC.

[19:15:04] 1>* Assertion: should not be reached at ./sgen-scan-object.h:90
[19:15:04] 1>
[19:15:04] 1>Stacktrace:
[19:15:04] 1>
[19:15:04] 1>
[19:15:04] 1>Native stacktrace:
[19:15:04] 1>
[19:15:04] 1>    0 mono 0x000000010bdd6561 mono_handle_native_crash + 277
[19:15:04] 1>    1 libsystem_platform.dylib 0x00007fff8968ff1a _sigtramp + 26
[19:15:04] 1>    2 ??? 0x0000000000000004 0x0 + 4
[19:15:04] 1>    3 libsystem_c.dylib 0x00007fff8b1729b3 abort + 129
[19:15:04] 1>    4 mono 0x000000010bf6a530 mono_log_write_logfile + 360
[19:15:04] 1>    5 mono 0x000000010bf7e5d8 monoeg_g_logv + 83
[19:15:04] 1>    6 mono 0x000000010bf7e77d monoeg_assertion_message + 143
[19:15:04] 1>    7 mono 0x000000010bf46ad0 drain_gray_stack + 8056
[19:15:04] 1>    8 mono 0x000000010bf3bdc3 finish_gray_stack + 117
[19:15:04] 1>    9 mono 0x000000010bf3c519 major_finish_collection + 125
[19:15:04] 1>    10 mono 0x000000010bf390c6 major_do_collection + 154
[19:15:04] 1>    11 mono 0x000000010bf38656 sgen_perform_collection + 687
[19:15:04] 1>    12 mono 0x000000010bf39a4d sgen_gc_collect + 50
[19:15:04] 1>    13 mono 0x000000010bef5162 unload_thread_main + 813
[19:15:04] 1>    14 mono 0x000000010bf756d6 inner_start_thread + 128
[19:15:04] 1>    15 libsystem_pthread.dylib 0x00007fff87a9b05a _pthread_body + 131
[19:15:04] 1>    16 libsystem_pthread.dylib 0x00007fff87a9afd7 _pthread_body + 0
[19:15:04] 1>    17 libsystem_pthread.dylib 0x00007fff87a983ed thread_start + 13

See the attachment for the state of the main thread's coop handle stack.  There is a coop handle leak: the main thread is blocked on a wait at the time of the crash, and on investigation the objects in its coop handle stack turned out to belong to an unloaded domain.
Comment 1 Aleksey Kliger 2017-05-12 23:46:56 UTC
We have a PR against master that fixes the issue: https://github.com/mono/mono/pull/4852

This needs to be cherry-picked to 2017-04 and 2017-02.
Comment 2 Aleksey Kliger 2017-05-13 01:25:37 UTC
One way to reproduce the problem in a mono checkout:

1. Compile mono.  Ensure that the "corlib" and "System" tests are compiled, too:
   make -C mcs/class/corlib check
   make -C mcs/class/System check
2. Run the following:
  MONO_PATH="./mcs/class/lib/net_4_x:$MONO_PATH" runtime/mono-wrapper --debug ./mcs/class/lib/net_4_x/nunit-console.exe  -domain=Multiple mcs/class/System/net_4_x_System_test.dll mcs/class/corlib/net_4_x_corlib_test.dll -exclude=NotOnMac,MacNotWorking,NotWorking,ValueAdd,CAS,InetAccess -nothread

(More than one test assembly must be run via nunit-console.exe, and both should be fairly large so that a fair number of handles are leaked.)

Expected output: some number of passed or failed tests.
Actual output: Mono hits an assertion in sgen-scan-object.h and crashes.
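
For convenience, the steps above could be wrapped in a small script along the following lines. This is only a sketch: the names DRY_RUN, LIB, EXCLUDE, run and repro are made up for illustration and are not part of mono or nunit-console; the commands themselves are the ones listed in steps 1 and 2. It assumes it is started from the top of a built mono checkout, and by default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the repro steps above; run from the top of a built mono checkout.
# DRY_RUN, LIB, EXCLUDE, run and repro are illustrative names, not part of mono.

LIB=./mcs/class/lib/net_4_x
EXCLUDE=NotOnMac,MacNotWorking,NotWorking,ValueAdd,CAS,InetAccess

# Print the command when DRY_RUN is 1 (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repro() {
    # Step 1: build (and run) the corlib and System test suites.
    run make -C mcs/class/corlib check
    run make -C mcs/class/System check
    # Step 2: run both test assemblies in a single nunit-console invocation.
    run env MONO_PATH="$LIB:${MONO_PATH:-}" runtime/mono-wrapper --debug \
        "$LIB/nunit-console.exe" -domain=Multiple \
        mcs/class/System/net_4_x_System_test.dll \
        mcs/class/corlib/net_4_x_corlib_test.dll \
        "-exclude=$EXCLUDE" -nothread
}

repro
```

Saved under a name such as repro.sh (hypothetical), "sh repro.sh" prints the three commands and "DRY_RUN=0 sh repro.sh" actually runs them. The steps are deliberately not chained with &&, since a failing test in "make check" should not prevent the nunit-console run.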
Comment 3 Aleksey Kliger 2017-05-15 19:05:02 UTC
Fixed on mono master with commits https://github.com/mono/mono/commit/d321424cabda97947e66b02242f66881e27ab744 (and 7eafb61cf17393f67f847d7ad79182df0f6f6e61)

Fixed on mono 2017-04 with https://github.com/mono/mono/commit/82f5fb6d0bbc806724e51052766cbed0f75e7ad3 (and af2b7b62d0bfac790ae76efa13ce79b24e6a6052)

Fixed on mono 2017-02 with https://github.com/mono/mono/commit/25ac18a9b7176b6c5995113dbcc8afd880bfb633 (and d95d7d30fe89bd373af74cf08d7d2fff197b36c4)