Notice (2018-05-24): bugzilla.xamarin.com is now in
Please join us on
Visual Studio Developer Community and in the
Mono organizations on
GitHub to continue tracking issues. Bugzilla will remain
available for reference in read-only mode. We will continue to work
on open Bugzilla bugs, copy them to the new locations
as needed for follow-up, and add the new items under Related
Our sincere thanks to everyone who has contributed on this bug
tracker over the years. Thanks also for your understanding as we
make these adjustments and improvements for the future.
Please create a new report on
GitHub or Developer Community with
your current version information, steps to reproduce, and relevant error
messages or log files if you are hitting an issue that looks similar to
this resolved bug and you do not yet see a matching new report.
There are issues with the monitor thread and new heuristics code introduced in Mono 3.12 in this commit https://github.com/mono/mono/commit/c93f12021147713216080e815267d6f7c857c732. We are currently experiencing a random crash, EXC_BAD_ACCESS, when evaluating a thread's state in the monitor_heuristic method at https://github.com/mono/mono/blob/master/mono/metadata/threadpool.c#L846. When the crash occurs, threads->len > tp->nthreads, which does not look like it should ever be the case but the difference between these two values increases the longer the program runs.
I have tracked this down to when the monitor thread creates a new threadpool thread (https://github.com/mono/mono/blob/master/mono/metadata/threadpool.c#L958).
1: Monitor thread -> create new thread (https://github.com/mono/mono/blob/master/mono/metadata/threadpool.c#L1198)
2: New thread -> there is no work to be done, so thread removes itself from the threads array (which it hasn't been added to yet) https://github.com/mono/mono/blob/master/mono/metadata/threadpool.c#L1726-L1729
3: Monitor thread -> adds the thread (which has now exited so the pointer is invalid) to the threads array https://github.com/mono/mono/blob/master/mono/metadata/threadpool.c#L1202
The most obvious case is when the application crashes because we no longer "own" the memory pointed to by the garbage pointer; otherwise it is evaluating junk data.
The new heuristics code also appears to create and immediately destroy a lot of threads which occurs when step 3 happens before step 2.
This issue should be fixed by https://github.com/mono/mono/commit/2c524916b74f43bc7b64e64dfbec5e1a2b90af85
While I agree that your commit should prevent the crash caused by adding a thread that is no longer around, I believe it fails to fix the underlying issue noted at the bottom of my initial report. The new code still creates threads just to have them be immediately destroyed which is the underlying cause of the reported crash.
I agree with Martin,
We have found some workloads where this fast create/destroy cycle is quite troubling.
We have just attempted to upgrade our Mono 3.10.0 to 3.12.0 and subsequently observed regressions.
The setup is :
Debian 3.2.63-2+deb7u1 x86_64 GNU/Linux
The regressions seemed to be around our thread-based code. Using git bisect we were able to deduce the probable guilty commit :
[mono] ~/mono @ git bisect good
c93f12021147713216080e815267d6f7c857c732 is the first bad commit
Author: Ludovic Henry <email@example.com>
Date: Tue Jul 22 12:16:45 2014 -0400
[threadpool] Add heuristic to approach the optimal number of threads
:040000 040000 3786dcb292de059604d8638e5ffc093b57cd2ad0 c122e302b25fe6bda3fd889038ca41f68f680b42 M mono
I suspect it is related to this issue. Next week I will try to post some more helpful details about the regression.
Is the above a fix for the following?
It looks like the same crash (SIGSEGV) we were getting due to invalid threads being in the list. The commit mentioned above should prevent that crash, but there is still a high churn rate in the number of threads with the new heuristics.
This crash is fixed by 2c524916b74f43bc7b64e64dfbec5e1a2b90af85. It has not be applied to 3.12, but I am currently working on getting it backported.
It does not fix the creation+suppression of useless threads though. This will be fixed with the importation of the Microsoft ThreadPool from reference source.
*** Bug 28062 has been marked as a duplicate of this bug. ***
Brilliant, I was looking to see if it was backported so I'll keep an eye it it. For now I might just compile a 3.12 hotfix with this change.
Just for reference, I compiled 3.12.1 with the 2c524916b74f43bc7b64e64dfbec5e1a2b90af85 changeset and deployed it to 8 production machines on Monday. None of them have crashed yet. #winning