Bug 18398 - FromAsync on NetworkStreams leaks memory and leads to eventual OOM crash.
Summary: FromAsync on NetworkStreams leaks memory and leads to eventual OOM crash.
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC
Version: 3.2.x
Hardware: Other Linux
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Duplicates: 18810
Depends on:
Blocks:
 
Reported: 2014-03-14 10:03 UTC by Brandon White
Modified: 2014-04-10 14:45 UTC
CC List: 4 users

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
client (994 bytes, application/octet-stream) - 2014-03-15 20:12 UTC, Zoltan Varga
server.cs (2.45 KB, application/octet-stream) - 2014-03-15 20:12 UTC, Zoltan Varga
Client/server test application (3.77 KB, application/x-zip-compressed) - 2014-03-23 21:02 UTC, Brandon White
TaskFactoryTest: Even simpler demo app using Task.Factory (3.24 KB, application/x-zip-compressed) - 2014-04-01 18:46 UTC, Brandon White


Description Brandon White 2014-03-14 10:03:58 UTC
My Mono application on Debian/ARM is crashing due to an OOM condition. I have tracked the cause to the use of FromAsync on NetworkStreams and HttpListenerContexts.  The use of these constructs in Katana (Microsoft.Owin.Host.HttpListener.dll) and SignalR (Microsoft.AspNet.SignalR.Core.dll), both of which I use in my product, is what led me down this path.

- I have distilled the scenario down to a simple client & server app using TCP sockets.  See snippets below.  I use export MONO_GC_PARAMS=max-heap-size=16M to force the crash sooner.
- Using FromAsync on simple non-network-related IAsyncResults, such as the Func<> reported in #12236, does *not* exhibit the leak.
- Avoiding FromAsync and Task.ContinueWith and instead using the legacy IAsyncResult pattern directly does *not* exhibit the leak.  This is my current short-term workaround (a minimal sketch follows this list).
- I am currently targeting .NET 4.0.
- This occurs using mono 3.0.6 on Debian with either Boehm or sgen.
- I can also reproduce the issue in 3.4.0 (master/e01250c) using sgen.
- I tried this on another platform running Ubuntu 12.04.3 amd64 with mono 3.2.7 and it does *not* crash.  What I observed, though, is that memory usage increases rapidly to the heap limit and then holds at that amount.  This may still be a concern.
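
A minimal sketch of that workaround, using BeginRead/EndRead directly (illustrative only; the method name below is made up rather than taken from the attached project):

------------------------------------------------------------------
        // Requires: using System; using System.Net.Sockets; using System.Text;
        // The callback re-issues the read itself, so no Task or ContinueWith objects are allocated.
        static void BeginLegacyRead(NetworkStream stream, byte[] buffer)
        {
            stream.BeginRead(buffer, 0, buffer.Length, ar =>
            {
                int iBytesRead = stream.EndRead(ar);
                if (iBytesRead == 0) return;   // remote side closed the connection

                Console.WriteLine("Received " + iBytesRead + " bytes: "
                    + Encoding.UTF8.GetString(buffer, 0, iBytesRead));

                BeginLegacyRead(stream, buffer);   // issue the next read
            }, null);
        }
------------------------------------------------------------------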

Here is the platform info on the embedded SOM we are using (http://wiki.embeddedarm.com/wiki/TS-4712):
uname -a
Linux ts4700 2.6.34-ts471x #88 PREEMPT Thu Aug 15 10:29:09 MST 2013 armv5tejl GNU/Linux

This is the current mono version we are targeting (from Debian):
 mono -V
Mono JIT compiler version 3.0.6 (Debian 3.0.6+dfsg2-10)
Copyright (C) 2002-2012 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       normal
        Notifications: epoll
        Architecture:  armel,soft-float
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            Included Boehm (with typed GC and Parallel Mark)


We have also tested with mainline mono and the issue persists:
Mono JIT compiler version 3.4.0 (master/e01250c Fri Mar  7 22:18:40 UTC 2014)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       normal
        Notifications: epoll
        Architecture:  armel,vfp+fallback
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen
Comment 1 Brandon White 2014-03-14 10:11:42 UTC
Server test snippet. This is what leaks.

        static void Main(string[] args)
        {
            var tcpServer = new TcpListener(IPAddress.Loopback, 8085);
            tcpServer.Start();

            BeginAcceptTcpClient(tcpServer);
        }

        static void BeginAcceptTcpClient(TcpListener tcpServer)
        {
            Task.Factory.FromAsync<TcpClient>(tcpServer.BeginAcceptTcpClient, tcpServer.EndAcceptTcpClient, null)
                .ContinueWith(task =>
                {
                    BeginAcceptTcpClient(tcpServer);
                    BeginTcpClientRead(task.Result.GetStream(), new byte[8192]);
                });
        }

        // To observe the OutOfMemory condition, limit the mono heap to something small, such as
        // export MONO_GC_PARAMS=max-heap-size=16M
        // 
        static void BeginTcpClientRead(NetworkStream stream, byte[] buffer)
        {
            // This leaks!
            //
            Task<int>.Factory.FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null)
                .ContinueWith(task =>
                {
                    int iBytesRead = task.Result;
                    if (iBytesRead == 0) return;

                    OnDataAvailable(buffer, iBytesRead);

                    // Add this in... still leaks!
                    // (You can also try doing a GC.Collect(10, GCCollectionMode.Forced) 
                    // or similar periodically.  Still leaks!)
                    //
                    //task.Dispose();
                    //task = null;

                    BeginTcpClientRead(stream, buffer);
                });
            
            // This doesn't leak
            //
            //stream.BeginRead(buffer, 0, buffer.Length, (ar) => OnDataAvailable(ar, stream, buffer), null);
        }

        static void OnDataAvailable(byte[] buffer, int iBytes)
        {
            string sRequest = Encoding.UTF8.GetString(buffer, 0, iBytes);
            Console.WriteLine("Received " + iBytes + " bytes: " + sRequest);
        }

        // Alternate OnDataAvailable() to observe that direct IAsyncResult
        // use does not leak.
        //
        static void OnDataAvailable(IAsyncResult ar, NetworkStream stream, byte[] buffer)
        {
            int iBytes = stream.EndRead(ar);
            OnDataAvailable(buffer, iBytes);
            BeginTcpClientRead(stream, buffer);
        }
Comment 2 Brandon White 2014-03-14 10:12:57 UTC
Client test snippet:

            var tcpClient = new TcpClient("localhost", 8085);
            var tcpStream = tcpClient.GetStream();
            var utfEncoder = Encoding.UTF8;   // encoder used to serialize the payload below
            string sBuffer = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in iaculis augue.";

            while (true)
            {
                sBuffer = sBuffer[sBuffer.Length - 1] + sBuffer.Substring(0, sBuffer.Length - 1);
                byte[] content = utfEncoder.GetBytes(sBuffer);

                Console.Write("Sending " + sBuffer + "...");
                tcpStream.Write(content, 0, content.Length);
                Console.WriteLine("OK");
                
                // Throttle back just to demonstrate that the server will still leak and crash.
                // (Take this out to see it run out of memory faster)
                //
                Thread.Sleep(100);
            }
Comment 3 Brandon White 2014-03-14 13:22:56 UTC
Here are some profiler results of interest.  The 623xx counts appear to correlate directly with the number of test iterations.

     Bytes      Count  Average Type name
  27871096     190745      146 System.String
  26089888     125026      208 System.Char[]
  13476456     187173       72 System.Runtime.Remoting.Messaging.AsyncResult
   6987792      62391      112 System.Net.Sockets.Socket.SocketAsyncResult
   4991680      62396       80 System.Threading.Tasks.Task
   4990880      62386       80 System.Threading.Tasks.Task<System.Int32>
   3493840      62390       56 System.AsyncCallback
   3493616      62386       56 System.Func<System.Byte[],System.Int32,System.Int32,System.AsyncCallback,System.Object,System.IAsyncResult>
   3493616      62386       56 System.Func<System.IAsyncResult,System.Int32>
   3493616      62386       56 System.Action<System.Threading.Tasks.Task<System.Int32>>
   2284088      63696       35 System.Object[]
   2005240      62506       32 System.String[]
   1497384      62391       24 System.Net.Sockets.Socket.Worker
   1497264      62386       24 System.Threading.Tasks.TaskFactory.<FromAsyncBeginEnd>c__AnonStorey37<System.Int32,System.Byte[],System.Int32,System.Int32>
    999248      62453       16 System.Int32
    998288      62393       16 System.Threading.Tasks.TaskContinuation
    998224      62389       16 unresolved class 0x631698
    998176      62386       16 SignalRTestServer.Program.<>c__DisplayClass7
    998176      62386       16 System.Threading.Tasks.TaskCompletionSource<System.Int32>
    998176      62386       16 System.Threading.Tasks.TaskActionInvoker.ActionTaskInvoke<System.Int32>
    109280       1065      102 System.Byte[]
     23616        273       86 System.Int32[]
     18640       1165       16 System.MonoType
     17424         33      528 System.Xml.NameTable.Entry[]
     17120        128      133 System.Collections.Hashtable.Slot[]
     15264         74      206 System.Collections.Generic.Link[]
Comment 4 Zoltan Varga 2014-03-15 20:11:12 UTC
I can reproduce this with 3.2.7 on osx.
Comment 5 Zoltan Varga 2014-03-15 20:12:00 UTC
Created attachment 6329 [details]
client
Comment 6 Zoltan Varga 2014-03-15 20:12:46 UTC
Created attachment 6330 [details]
server.cs
Comment 7 Zoltan Varga 2014-03-15 20:17:22 UTC
Could you attach a complete server app which reproduces the problem?
Comment 8 Zoltan Varga 2014-03-15 20:21:58 UTC
Ignore the server.cs attachment; it runs out of memory for a different reason.
Comment 9 Zoltan Varga 2014-03-21 17:41:39 UTC
I can't reproduce this on an ARM box using master. Can you attach the test-case client/server you used?
Comment 10 Brandon White 2014-03-23 21:02:19 UTC
Created attachment 6394 [details]
Client/server test application
Comment 11 Brandon White 2014-03-23 21:06:32 UTC
Thank you very much for looking into this issue and trying to reproduce it.  I've attached a simple project that exhibits the issue on my target system.  Run the app twice on the test system: the first instance will open port 8085 as the server, and the second instance will connect to the same port as the test client after seeing that the port is already in use by the server.
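
For reference, the server-or-client decision can be made roughly like this; this is a hedged sketch of the idea, not necessarily how the attached project implements it (RunAsServer/RunAsClient are made-up names standing in for the loops in comments 1 and 2):

------------------------------------------------------------------
        static void Main(string[] args)
        {
            var tcpServer = new TcpListener(IPAddress.Loopback, 8085);
            try
            {
                tcpServer.Start();      // first instance: the port is free, so act as the server
                RunAsServer(tcpServer);
            }
            catch (SocketException)
            {
                RunAsClient();          // second instance: the port is in use, so act as the test client
            }
        }

        static void RunAsServer(TcpListener tcpServer) { /* accept/read loop as in comment 1 */ }
        static void RunAsClient() { /* send loop as in comment 2 */ }
------------------------------------------------------------------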

Could you please post the `uname -a` for the ARM system(s) you are testing on?
Comment 12 Brandon White 2014-04-01 18:44:39 UTC
It turns out the problem is more fundamental for me on this system.  I eliminated the sockets stuff from the scenario.  I'm attaching a project named "TaskFactoryTest" that is even simpler and still leaks memory, leading to OOM.

I could really use some help with this.  My client is trying to run a fairly simple C# app on this industrial Linux/ARM system, but every few days the mono process is crashing due to OOM from this problem.

I'm very willing to dig deeper into mono to try and root-cause this, particularly the GC. I'd appreciate some guidance though.

----------------------------------------------------------------------

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;

namespace TaskFactoryTest
{
    /// <summary>
    /// To observe the OutOfMemory condition faster, limit the mono heap to something small, such as
    /// export MONO_GC_PARAMS=max-heap-size=16M.  You should also adjust the number of entries in memoryPressure
    /// to fill up the heap.
    /// </summary>
    class Program
    {
        static void Main(string[] args)
        {
            List<byte[]> memoryPressure = Enumerable.Range(0, 5000).Select(i => new byte[1024]).ToList();
            StartTaskFactoryTest();
            Task.Factory.StartNew(GcLoop);
            Console.ReadLine();
        }

        static void GcLoop()
        {
            while (true)
            {
                GC.Collect(10, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
                System.Threading.Thread.Sleep(3000);
            }
        }

        static int s_iteration;

        static void StartTaskFactoryTest()
        {
            Task.Factory.StartNew<int>(DummyWorker).ContinueWith(task => StartTaskFactoryTest());
        }

        static int DummyWorker()
        {
            Thread.Sleep(20);
            Console.WriteLine(s_iteration);
            return s_iteration++;
        }
    }
}
Comment 13 Brandon White 2014-04-01 18:46:25 UTC
Created attachment 6450 [details]
TaskFactoryTest: Even simpler demo app using Task.Factory
Comment 14 Zoltan Varga 2014-04-01 18:59:16 UTC
When running this with:

MONO_GC_PARAMS=max-heap-size=16M ./mono 18398.exe 

I get a crash at:

Thread 2 (Thread 0x422ff430 (LWP 12232)):
#0  0x40174384 in __libc_do_syscall () from /lib/arm-linux-gnueabi/libpthread.so.0
#1  0x40173336 in waitpid () from /lib/arm-linux-gnueabi/libpthread.so.0
#2  0x000e4a94 in mono_handle_native_sigsegv (signal=11, ctx=0x422fe3e0) at mini-exceptions.c:2305
#3  0x00024df4 in mono_sigsegv_signal_handler (_dummy=11, info=0x422fe360, context=0x422fe3e0) at mini.c:6851
#4  <signal handler called>
#5  0x00225ae4 in sgen_pin_queue_clear_discarded_entries (section=0x40090010, max_pin_slot=162) at sgen-pinning.c:138
#6  0x00205d12 in major_finish_collection (reason=0x33e878 "user request", old_next_pin_slot=162, scan_mod_union=0) at sgen-gc.c:3187
#7  0x0020619c in major_do_collection (reason=0x33e878 "user request") at sgen-gc.c:3312
#8  0x002066a8 in sgen_perform_collection (requested_size=0, generation_to_collect=1, reason=0x33e878 "user request", wait_to_finish=1) at sgen-gc.c:3506
#9  0x00208916 in mono_gc_collect (generation=1) at sgen-gc.c:4609
#10 0x001d49ae in ves_icall_System_GC_InternalCollect (generation=10) at gc.c:411
#11 0x40473a14 in ?? ()
#12 0x40473a14 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 15 Brandon White 2014-04-01 19:06:55 UTC
@Zoltan, are you getting some test-iteration output first (hundreds if not thousands of lines) before it craps out?  Or are you having trouble even running it?

My apologies, but I'm unclear on whether you believe my test app is incorrect or whether what you observed possibly indicates a mono bug.

Whichever the case, thanks for the fast response :)
Comment 16 Zoltan Varga 2014-04-01 19:16:57 UTC
The runtime probably shouldn't crash like that, so this is a mono bug.  Also, the test app seems to stop writing output after a random number of iterations.
Comment 17 Brandon White 2014-04-01 21:59:08 UTC
Clue: If you eliminate the ContinueWith, memory use is very stable, there is no leaking, and the app doesn't crash.  I've confirmed this on armel, armhf, and x64 (all with sgen).

------------------------------------------------------------------

        static void StartTaskFactoryTest()
        {
            while (true)
            {                
                Task.Factory.StartNew(DummyWorker);
                Thread.Sleep(20);
            }
        }

        static void DummyWorker()
        {
            Console.WriteLine(s_iteration);
            s_iteration++;
        }
Comment 18 Brandon White 2014-04-02 10:09:16 UTC
More clues:

This leaks. It eliminates the use of ContinueWith but still starts the next task from within the prior task.
------------------------------------------------------------------
        static void StartTaskFactoryTest()
        {
            Task.Factory.StartNew(DummyWorker);
        }

        static void DummyWorker()
        {
            Console.WriteLine(s_iteration);
            s_iteration++;
            Thread.Sleep(20);

            StartTaskFactoryTest();
        }
------------------------------------------------------------------


This does NOT leak.  It exhibits very stable memory use.  Note the hack to avoid starting the next task from within the prior one.
------------------------------------------------------------------
        static AutoResetEvent taskFinishedEvent = new AutoResetEvent(true);

        static void TaskStarterLoop()
        {
            while (true)
            {
                taskFinishedEvent.WaitOne();
                StartTaskFactoryTest();
            }
        }

        static void StartTaskFactoryTest()
        {
            Task.Factory.StartNew(DummyWorker);
        }

        static void DummyWorker()
        {
            Console.WriteLine(s_iteration);
            s_iteration++;
            Thread.Sleep(20);

            taskFinishedEvent.Set();
        }
------------------------------------------------------------------

I believe there is enough evidence now to warrant the following statement:  If you recursively restart a new task from within the prior task, memory is leaked.
Comment 19 Brandon White 2014-04-02 10:36:54 UTC
Here is some distilled data from HeapShot, between 2 shots while the test is in a steady state.  Only the objects with significant growth between shots are shown:

------------------------------------------------------------------

Heap shot 2 at 13.075 secs: size: 6727424, object count: 7588, class count: 97, roots: 1008

	     53592        609       88 System.Threading.Tasks.Task (bytes: +25960, count: +295)
		8 root references (5 pinning)
		607 references from: System.Threading.Tasks.Task		
		2 references from: System.Runtime.Remoting.Messaging.AsyncResult
		
	     34104        609       56 System.Action (bytes: +16520, count: +295)
		4 root references (4 pinning)
		609 references from: System.Threading.Tasks.TaskActionInvoker.ActionInvoke

	      9744        609       16 System.Threading.Tasks.TaskActionInvoker.ActionInvoke (bytes: +4720, count: +295)
		3 root references (3 pinning)
		609 references from: System.Threading.Tasks.Task

		
Heap shot 3 at 19.160 secs: size: 6774856, object count: 8477, class count: 97, roots: 1008

	     79640        905       88 System.Threading.Tasks.Task (bytes: +26048, count: +296)
		8 root references (5 pinning)
		903 references from: System.Threading.Tasks.Task
		3 references from: System.Runtime.Remoting.Messaging.AsyncResult
		
	     50680        905       56 System.Action (bytes: +16576, count: +296)
		3 root references (3 pinning)
		905 references from: System.Threading.Tasks.TaskActionInvoker.ActionInvoke
		
	     14480        905       16 System.Threading.Tasks.TaskActionInvoker.ActionInvoke (bytes: +4736, count: +296)
		3 root references (3 pinning)
		905 references from: System.Threading.Tasks.Task
------------------------------------------------------------------

So it's clear that the GC is not able to clean up the Task objects.
I speculate that a sort of linked list of Tasks is being created and that something is preventing the GC from determining that the head Task is collectable.  I'll keep digging, but if anyone has any insight please chime in!
Comment 20 Brandon White 2014-04-02 16:58:12 UTC
I finally got the Heap Shot GUI tool compiling and running.  It confirms the "linked list of Tasks" theory.  What I am seeing is that each subsequent task is referenced by the prior task.
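
To make that concrete, here is an illustrative stand-in (FakeTask is made up and is not Mono's Task implementation): if a completed task keeps a reference to the continuation it scheduled, then recursively starting the next task from within the prior one builds a chain that stays reachable for as long as anything still holds the first task.

------------------------------------------------------------------
        // Illustrative only -- not the real Task code.
        class FakeTask
        {
            // Stand-in for whatever field holds the continuation in the real Task.
            public FakeTask Continuation;

            public FakeTask ContinueWith()
            {
                var next = new FakeTask();
                Continuation = next;        // the prior task now references the next one
                return next;
            }
        }

        static void Main()
        {
            var head = new FakeTask();      // something (a root, a scheduler, ...) still holds this
            var current = head;
            for (int i = 0; i < 1000000; i++)
                current = current.ContinueWith();   // each "iteration" adds another live object

            // Every FakeTask is still reachable from head, mirroring the ever-growing
            // Task counts in the heap shots above.
            Console.WriteLine(GC.GetTotalMemory(true));
            GC.KeepAlive(head);
        }
------------------------------------------------------------------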
Comment 21 Brandon White 2014-04-02 21:25:18 UTC
Hurray!  I finally got to the root of this and have a fix submitted in https://github.com/mono/mono/pull/979.

Please review my code, and if there are any potential regression issues I haven't considered please do let me know.  I'm looking forward to getting a commit into Mono :)
Comment 22 Brandon White 2014-04-08 11:01:36 UTC
This is now fixed in master as of https://github.com/mono/mono/commit/5e5f3a19fd97f2f55b5054fa64fef905308cdc80.
Comment 23 Marek Safar 2014-04-10 14:45:52 UTC
*** Bug 18810 has been marked as a duplicate of this bug. ***