Bug 3321 - Mono crashes when a large number of AppDomains are created and unloaded
Summary: Mono crashes when a large number of AppDomains are created and unloaded
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: General
Version: unspecified
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2012-02-09 12:20 UTC by Alexandre Faria
Modified: 2013-04-22 16:49 UTC
CC List: 3 users

Tags:
Is this bug a regression?: ---
Last known good build:

Description Alexandre Faria 2012-02-09 12:20:47 UTC
Overview:
Mono crashes when a large number of AppDomains are created.
In the example below, the crash occurs at around 3100 AppDomains with regular mono (Boehm GC) and at around 2600 with mono-sgen.

In real-world use, however, this happens much sooner: in my tests with light use of each AppDomain on regular mono, it happened at roughly half that count. The simpler the usage, the later the crash occurred; I noticed this as I reduced my test code to the final example.

I also tried delaying the consumption of AppDomains as a workaround, but it did not help.

I have reason to believe that this affects long-running applications that use AppDomains, and that the bigger the AppDomains, the sooner it happens. The idea that it only happens at around 3000 AppDomains is therefore misleading and, unfortunately for me, wrong.

Steps to Reproduce:
Compile and run:
using System;

public class Example
{
    public static void Main()
    {
        for (int i = 0; i < 10000; i++)
        {
            Console.WriteLine("\n\nIteration " + i);

            // Create a child domain and immediately unload it.
            AppDomain ad = AppDomain.CreateDomain("ChildDomain");

            AppDomain.Unload(ad);
        }
    }
}

Actual Results:
Iteration 3113
* Assertion at mini.c:3710, condition `code' not met

Stacktrace:

  at <unknown> <0xffffffff>
  at System.Runtime.Serialization.Formatters.Binary.ObjectReader.ReadObjectGraph (System.Runtime.Serialization.Formatters.Binary.BinaryElement,System.IO.BinaryReader,bool,object&,System.Runtime.Remoting.Messaging.Header[]&) <0x000ed>
  at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.NoCheckDeserialize (System.IO.Stream,System.Runtime.Remoting.Messaging.HeaderHandler) <0x0017d>
  at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize (System.IO.Stream) <0x0001b>
  at System.Runtime.Remoting.Channels.CADSerializer.DeserializeObject (System.IO.MemoryStream) <0x0006e>
  at System.Runtime.Remoting.Messaging.CADMethodCallMessage.GetArguments () <0x0006e>
  at System.Runtime.Remoting.Messaging.MethodCall..ctor (System.Runtime.Remoting.Messaging.CADMethodCallMessage) <0x00044>
  at System.AppDomain.ProcessMessageInDomain (byte[],System.Runtime.Remoting.Messaging.CADMethodCallMessage,byte[]&,System.Runtime.Remoting.Messaging.CADMethodReturnMessage&) <0x000a5>
  at (wrapper remoting-invoke-with-check) System.AppDomain.ProcessMessageInDomain (byte[],System.Runtime.Remoting.Messaging.CADMethodCallMessage,byte[]&,System.Runtime.Remoting.Messaging.CADMethodReturnMessage&) <0xffffffff>
  at System.Runtime.Remoting.Channels.CrossAppDomainSink.ProcessMessageInDomain (byte[],System.Runtime.Remoting.Messaging.CADMethodCallMessage) <0x0006a>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_CrossAppDomainSink/ProcessMessageRes_object_object (object,intptr,intptr,intptr) <0xffffffff>
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Reflection.MonoMethod.InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&) <0xffffffff>
  at System.AppDomain.InvokeInDomainByID (int,System.Reflection.MethodInfo,object,object[]) <0x0009c>
  at System.Runtime.Remoting.Channels.CrossAppDomainSink.SyncProcessMessage (System.Runtime.Remoting.Messaging.IMessage) <0x00109>
  at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke (System.Runtime.Remoting.Messaging.IMessage) <0x00350>
  at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke (System.Runtime.Remoting.Proxies.RealProxy,System.Runtime.Remoting.Messaging.IMessage,System.Exception&,object[]&) <0x003fa>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_object_object_object_Exception&_object[]& (object,intptr,intptr,intptr) <0xffffffff>
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_store_remote_field_new (object,intptr,intptr,object) <0xffffffff>
  at (wrapper stfld-remote) object.__mono_store_remote_field_new_wrapper (object,intptr,intptr,object) <0xffffffff>
  at System.AppDomain.CreateDomain (string,System.Security.Policy.Evidence,System.AppDomainSetup) <0x002ff>
  at System.AppDomain.CreateDomain (string) <0x00010>
  at Example.Main () <0x00068>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

	mono() [0x4988f2]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x10060) [0x7f4e0c789060]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f4e0c4103a5]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f4e0c413b0b]
	mono() [0x5e248b]
	mono() [0x5e25c6]
	mono() [0x41de5f]
	mono() [0x41e63c]
	mono() [0x41fd42]
	mono() [0x42071d]
	mono() [0x49a571]
	[0x40da816a]

Debug info from gdb:

Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No threads.

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Aborted

Build Date & Platform:
Mono JIT compiler version 2.11 (master/91d40d7 Thu Feb  9 15:26:58 WET 2012)
Comment 1 Alexandre Faria 2013-02-28 08:05:15 UTC
This problem is getting worse: even on Boehm it now happens much sooner, at around 2600, as it did with SGen.

Is there anything that can be done to help?

Is there a workaround for this?
Comment 2 Alexandre Faria 2013-03-31 18:22:17 UTC
Actually, it is not necessary to unload the AppDomain; just creating it has the same effect.

For some reason the hard limit is sometimes lower: it was around 3000+, then 2500+, and now it's back to 3000+.
Comment 3 Alexandre Faria 2013-04-18 20:53:12 UTC
This happens with both the __thread and pthread TLS implementations on Ubuntu.

On Windows with Mono 3.0.9, however, it seems to work fine; I haven't found a limit yet.

Is there anything else I can do to help solve this one?
Comment 4 Zoltan Varga 2013-04-21 15:37:11 UTC
Hopefully fixed by:
https://github.com/mono/mono/commit/839539518c868022e24f3183b41891157e65c5cf
Comment 5 Alexandre Faria 2013-04-21 19:00:54 UTC
I can confirm that this test case now works properly, but other cases that I assumed were caused by the same problem still fail.

I'll be posting some of them as soon as I'm able to reproduce them with simple test cases.

Thanks!!! Great work!!!
Comment 6 Zoltan Varga 2013-04-21 19:13:30 UTC
You can try increasing the VALLOC_FREELIST_SIZE at
mono/mono/utils/mono-codeman.c:206
to something like 128 to see if that makes any difference.
Comment 7 Alexandre Faria 2013-04-22 16:27:01 UTC
I can confirm a vast improvement in the real use case: it no longer crashes at the same place, so this test case was well targeted.

However, after about 8 hours it crashed. I'm not sure how many AppDomains had been created and unloaded by then (certainly several thousand), nor whether it is related to this bug.

I'm going to try to isolate it, but that won't be easy, as it takes about 8 hours to reproduce. If I manage to isolate it, I'll report it.


There were some other bugs that I stumbled upon while trying Akka under IKVM, which Jeroen Frijters confirmed were Mono bugs and also Linux-specific. I can submit those if you want.


Just out of curiosity, here is the crash I got in the real use case after about 8 hours. At the time, the process was using only about 6 GB of RAM out of a total of 12 GB, plus 20 GB of swap (mostly unused).
I know it doesn't help much, but it shows why it wasn't easy to tell that these crashes have different causes: they seem related even if they aren't.

Mono was compiled with --with-large-heap=yes.


The error that I have is this one:

mmap(...PROT_NONE...) failed
Stacktrace:


Native stacktrace:

	mono() [0x4a9cd1]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f49fa151cb0]
	/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f49f9db9425]
	/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f49f9dbcb8b]
	mono() [0x60259b]
	mono() [0x603a39]
	mono() [0x609cfc]
	mono() [0x5f85af]
	mono() [0x5f7c4f]
	mono() [0x5f8658]
	mono() [0x5f86a0]
	mono(mono_domain_finalize+0x94) [0x59d6b4]
	mono() [0x59477d]
	mono() [0x5e8061]
	mono() [0x5f6fb9]
	mono() [0x6084f0]
	/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f49fa149e9a]
	/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f49f9e76cbd]

Debug info from gdb:

Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No threads.

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================
Comment 8 Zoltan Varga 2013-04-22 16:33:21 UTC
That one is:
https://bugzilla.xamarin.com/show_bug.cgi?id=6216
Comment 9 Alexandre Faria 2013-04-22 16:49:35 UTC
Let's hope so, as isolating a bug that takes 8 hours to surface isn't easy.
I will keep an eye on that one.

Thank you!!! Kudos!!!