Bug 2965 - Socket.SendToAsync loses operations when called too quickly, followed by runtime crash
Summary: Socket.SendToAsync loses operations when called too quickly, followed by runt...
Alias: None
Product: Class Libraries
Classification: Mono
Component: System ()
Version: 2.10.x
Hardware: PC Linux
: High normal
Target Milestone: Untriaged
Assignee: Paolo Molaro
Depends on:
Reported: 2012-01-19 06:24 UTC by Tammo Hinrichs
Modified: 2012-08-02 14:06 UTC (History)
7 users (show)

Is this bug a regression?: ---
Last known good build:

Repro case for SendToAsync bug (2.52 KB, text/plain)
2012-01-19 06:24 UTC, Tammo Hinrichs

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.

Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:

Description Tammo Hinrichs 2012-01-19 06:24:05 UTC
Created attachment 1231 [details]
Repro case for SendToAsync bug

When two Socket.SendToAsync() calls are called in rapid succession, the runtime seems to get into a state where it "loses" all asynchronous operations - the callbacks don't get called, and on cleanup there's a crash inside the socket library.

Steps to Reproduce:
1. mcs AsyncCrash.cs (attached)
2. mono AsyncCrash.exe

Test rig was a dual-Xeon (8 cores) machine running Debian x64, repro rate is a solid 100%.

Output of the non-working test case (send two UDP packets directly after each other, sleep for one second, then send a third packet):
/workspace/tammohinrichs$ mono-2.10.8/bin/mono SendAsyncTest.exe
42866.428: send 0
42866.447: send 1
42866.447: waiting 1 sec...
42866.450: cb (async) for 0: Success
42867.447: send 2
42867.447: waiting 5 secs...
42872.447: done.
press Enter

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object
  at System.Net.Sockets.Socket+SocketAsyncResult.Complete () [0x00000] in <filename unknown>:0
  at System.Net.Sockets.Socket+SocketAsyncResult.Complete (System.Exception e) [0x00000] in <filename unknown>:0
  at System.Net.Sockets.Socket+Worker.Accept () [0x00000] in <filename unknown>:0
  at System.Net.Sockets.Socket+Worker.DispatcherCB (System.Net.Sockets.SocketAsyncResult sar) [0x00000] in <filename unknown>:0

Please note that the crash occurred before I actually pressed Enter, probably when the GC disposed of the Test instance.

Running the test with Mono 2.8 (or with .NET in Windows) yields the expected behaviour:
/workspace/tammohinrichs$ mono-2.8/bin/mono SendAsyncTest.exe
42851.814: send 0
42851.816: send 1
42851.816: waiting 1 sec...
42851.817: cb (async) for 1: Success
42851.817: cb (async) for 0: Success
42852.816: send 2
42852.817: waiting 5 secs...
42852.817: cb (async) for 2: Success
42857.817: done.
press Enter

Also, inserting a Thread.Sleep() before the "send 1" makes the test case work. The problem occurs exactly when you fire off a SendToAsync() before the callback of the last one has been called.
Comment 1 Tammo Hinrichs 2012-01-27 08:04:03 UTC
I think I found the bug.

(Disclaimer: I'm not too experienced with Mono or Linux or the usual open source dev tools, so I apologize for the non-standard form of this report and the chance that maybe I got it all wrong. But one thing I've learned is that when you look at a piece of code and it obviously makes no sense, you should trust your instincts. :)

As far as I understand it: In the implementation of SendToAsync a worker and a result structure is set up and then put into a queue (probably to avoid out of order processing of IO operations). If the queue was empty before, the worker is getting kicked off. Otherwise there's a piece of code at the end of the dispatcher which schedules the next job from the queue. So far, so good.

Then again SendToAsync looks like this right after the worker/result setup:

mcs/class/System/System.Net.Sockets/Socket.cs, line 1964

Worker worker = new Worker (e);
int count;
lock (writeQ) {
	writeQ.Enqueue (worker);
	count = writeQ.Count;
if (count == 1)
        socket_pool_queue (Worker.Dispatcher, res);

Another Worker object is created, left without any relevant data and put into the queue? Apart from the fact that this violates the design principles behind the SocketAsyncEventArgs based APIs, which is _not to allocate anything on the heap for a single operation_, this newly created object isn't given the necessary information to function; the Worker.Dispatcher callback only operates on the SocketAsyncResult structure given with the socket_pool_queue() call.

Speaking of: This call is then made with the original data from the SocketAsyncEventArgs argument, not the newly created worker that obviously won't work. Which perfectly explains the bug I encountered: When sending off a single SendToAsync, the dummy worker is enqueued, the dispatcher then operates upon the correct data and at the end blindly dequeues that dummy object again. BUT if you try two calls right after each other, the second call only enqueues the dummy object which then does nothing, leaving the original SocketAsyncEventArgs et al in an undefined state (Hence the missing callback and crash at the end).

Fix: Change that part to what it should (and in fact the Begin* functions do) look like, as in "don't create the dummy and enqueue the correct worker":

mcs/class/System/System.Net.Sockets/Socket.cs, line 1964 FIXED

int count;
lock (writeQ) {
	writeQ.Enqueue (e.worker);
	count = writeQ.Count;
if (count == 1)
        socket_pool_queue (Worker.Dispatcher, res);

At least with my test case and the kinda complex game server I've tried it with, it works flawlessly now.

Also, ReceiveFromAsync() suffers from the same problem and should be changed accordingly.

Again, sorry for not sending a proper diff, but deleting two lines and inserting two characters in two other lines should be possible manually :)
Comment 2 Tammo Hinrichs 2012-01-27 08:15:08 UTC
Oops, sorry, it should be "writeQ.Enqueue (e.Worker);" with a capital W. That's what you get for not using real diffs :/
Comment 4 Chris Hardy [MSFT] 2012-07-24 21:34:58 UTC
Gonzalo, Is this something you can help out with or have any thoughts on?
Comment 5 Tammo Hinrichs 2012-07-25 05:01:37 UTC
Just apply the fix that I described. It's in production use on a game server that serves some thousand connections over here for months and it works flawlessly.

If you wish I can send you a proper diff.
Comment 6 Matthias 2012-08-02 05:10:00 UTC
As mentioned by Jonathan the exact same problem applies to ReceiveFromAync. The code seems pretty equal just a different queue is used.

It seems the simpler methods SendAsync and ReceiveAsync working fine (just the endpoint is ignored).
Comment 7 Tammo Hinrichs 2012-08-02 05:47:42 UTC
Yes, I read the whole source file and it's only SendToAsync and ReceiveFromAsync that are affected. Everything else looks fine.
Comment 8 Paolo Molaro 2012-08-02 14:06:48 UTC
Fixed in master and mono-2-10, thanks!.