Bug 13951 - Threads stuck in suspend state with sgen on stack
Summary: Threads stuck in suspend state with sgen on stack
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Duplicates: 13933 17605
Depends on:
Blocks:
 
Reported: 2013-08-13 09:27 UTC by Simon Lindgren
Modified: 2014-02-12 16:43 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Output when running "thread apply all bt" in gdb. (110.32 KB, text/plain)
2013-08-13 09:27 UTC, Simon Lindgren
Output when running "thread apply all bt" in gdb. Fewer threads this time. (78.85 KB, text/plain)
2013-08-13 12:11 UTC, Simon Lindgren




Description Simon Lindgren 2013-08-13 09:27:16 UTC
Created attachment 4617
Output when running "thread apply all bt" in gdb.

I have bisected this issue: it was introduced in the range ef805b5^^..ef805b5.

Steps to reproduce:
1. Use relatively recent mono master (after the switch to sgen as default)
2. Use a relatively recent MonoDevelop master (the CodeIssue batch runner is needed to reproduce this easily; at least 573246a1d3)
3. Open the Code Issue pad (View > Pads > Code Issues)
4. Click Run.
5. MD hangs, usually within a couple of seconds if not sooner.

At this point, kill -quit $MD_PID does not print a stack trace, only a listing of threads (most of which are interrupted). That is probably also why the soft debugger cannot debug the hung process.
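
The two diagnostics used in this report (the gdb "thread apply all bt" dump from the attachments and kill -quit) could be scripted roughly as follows. This is a minimal sketch and not part of the original report: the helper names are hypothetical, and it assumes gdb is installed and that the caller is allowed to attach to and signal the process.

```python
import os
import signal
import subprocess


def gdb_backtrace_command(pid):
    """Build the gdb argv that attaches to `pid`, dumps all thread backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_native_backtraces(pid):
    """Run gdb against `pid` and return its output (hypothetical helper; needs gdb and ptrace rights)."""
    result = subprocess.run(gdb_backtrace_command(pid), capture_output=True, text=True)
    return result.stdout


def send_sigquit(pid):
    """Equivalent of `kill -quit PID`: send SIGQUIT to the process."""
    os.kill(pid, signal.SIGQUIT)
```

Calling dump_native_backtraces with the MonoDevelop process id would produce output comparable to the attached text files, assuming the process can be attached to.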

More notes:
I'm running this on 64-bit Fedora 18.

Current MD master behaves poorly with respect to thread usage when batch running CodeIssues: it creates lots and lots of threads, Environment.ProcessorCount ^ 2 at a time, for a total of FileCount * Environment.ProcessorCount + Environment.ProcessorCount threads per run. I have a local fix that creates only as many threads as there are CPU cores, both at a time and in total, and with it the problem is much harder to reproduce. Might be a clue to what is going on, what do I know ;)
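
To give a feel for the numbers, the thread counts described above can be worked out as in the sketch below. It assumes that "^ 2" means "squared" (not the C# XOR operator), and the core count and file count are made-up values chosen only for illustration.

```python
def batch_thread_counts(processor_count, file_count):
    """Return (peak_concurrent, total_per_run) using the formulas from the report.

    Assumption: "ProcessorCount ^ 2" is read as ProcessorCount squared.
    """
    peak = processor_count ** 2
    total = file_count * processor_count + processor_count
    return peak, total


# Hypothetical machine (8 cores) and solution size (500 files).
peak, total = batch_thread_counts(processor_count=8, file_count=500)
print(peak, total)  # 64 4008
```

Under these assumptions a single run starts several thousand threads, with dozens alive at once, which gives the runtime many opportunities to suspend a thread at an unlucky moment.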
Comment 1 Zoltan Varga 2013-08-13 11:15:22 UTC
Try setting the MONO_ENV_OPTIONS env variable to '-O=-aot' as a workaround.
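
One way to apply this workaround to a single process, rather than exporting the variable for the whole shell session, is sketched below. The helper names and the "monodevelop" launcher name are hypothetical. Note that this sketch overwrites any existing MONO_ENV_OPTIONS value instead of appending to it.

```python
import os
import subprocess


def env_with_aot_disabled(base_env=None):
    """Return a copy of the environment with MONO_ENV_OPTIONS set to the suggested value."""
    env = dict(os.environ if base_env is None else base_env)
    env["MONO_ENV_OPTIONS"] = "-O=-aot"
    return env


def launch(command):
    """Start `command`, e.g. ["monodevelop"] (hypothetical launcher name), with the workaround applied."""
    return subprocess.Popen(command, env=env_with_aot_disabled())
```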
Comment 2 Simon Lindgren 2013-08-13 12:00:56 UTC
-O=-aot seems to work as a workaround.
Comment 3 Simon Lindgren 2013-08-13 12:10:42 UTC
Correction:
My local fix to create a smaller number of threads does not seem to have a significant effect on these hangs after all. The earlier impression was perhaps because I ran MD in the (soft) debugger. The soft debugger changes the behavior: mono might crash before it hangs, or it can still hang, but the process usually survives longer before something happens.
Comment 4 Simon Lindgren 2013-08-13 12:11:50 UTC
Created attachment 4622
Output when running "thread apply all bt" in gdb. Fewer threads this time.
Comment 5 Zoltan Varga 2013-08-13 18:15:47 UTC
Added a fix/workaround in a31b580fdcbaa9a8a16d59ffb12d04f5872f54e8. Could you try it out?
Comment 6 Simon Lindgren 2013-08-13 19:23:55 UTC
I cannot reproduce the hang with that change. Yay!
Comment 7 Zoltan Varga 2013-08-13 20:18:45 UTC
-> FIXED.
Comment 8 Nikita Tsukanov 2013-08-14 10:03:31 UTC
*** Bug 13933 has been marked as a duplicate of this bug. ***
Comment 9 Andres G. Aragoneses 2013-08-14 10:28:16 UTC
Can this be backported to 3-2?
Comment 10 Miguel de Icaza [MSFT] 2013-08-19 09:09:00 UTC
Andres,

The next Mono 3.2 will come out of master.

We are done with maintaining branches for 3 years :-)
Comment 11 Andres G. Aragoneses 2013-08-19 09:50:39 UTC
Oh, that's not what someone (I think it was Rodrigo) said in #monodev recently. Anyway, Zoltan actually backported it shortly after I asked :) -> https://github.com/mono/mono/commit/fe1fe75fbdd79537a2915b5ebf9262df7a0645f6

Thanks
Comment 12 Mark Probst 2014-02-12 16:43:04 UTC
*** Bug 17605 has been marked as a duplicate of this bug. ***