Bug 21135 - System.Net.WebConnectionStream.Close blocks after writing with chunked transfer encoding enabled under heavy load
Summary: System.Net.WebConnectionStream.Close blocks after writing with chunked transf...
Status: RESOLVED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: System ()
Version: 3.4.0
Hardware: PC Linux
: --- normal
Target Milestone: Untriaged
Assignee: Martin Baulig
URL:
Depends on:
Blocks:
 
Reported: 2014-07-05 02:04 UTC by Fritz Elfert
Modified: 2016-11-11 09:55 UTC (History)
9 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Stacktrace of the blocking situation. (14.53 KB, text/plain)
2014-07-05 02:14 UTC, Fritz Elfert
Details
Call stack on blocking close call (3.07 KB, text/plain)
2014-11-21 04:32 UTC, Till Lorentzen
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FIXED

Description Fritz Elfert 2014-07-05 02:04:38 UTC

    
Comment 1 Fritz Elfert 2014-07-05 02:14:41 UTC
Created attachment 7277 [details]
Stacktrace of the blocking situation.
Comment 2 Fritz Elfert 2014-07-05 02:16:24 UTC
Using latest 3.4.0 from the mono-3.4.0-branch at github, our CmisSync app blocks occassionally in System.Net.WebConnectionStream.Close() when writing ~1000 POST requests.

Unfortunately no *reliably failing* Testcase can be provided.

I have created a pull request at:
 https://github.com/mono/mono/pull/1146

This fixes the deadlock.

Cheers
 -Fritz
Comment 3 Fritz Elfert 2014-07-05 02:23:03 UTC
Forgot to mention:
Of course, this happens only, if chunked transfer encoding is enabled. (Which is always enabled in our App).
Comment 4 Craig Minihan 2014-07-14 10:24:50 UTC
Fritz, I can see and replicate a very similar issue to the one you are reporting here - however I'm not sure that is is identical.

I've written a simple test app which allows me to tweak the HTTP params I pass from a client to a proxy to a server - all in-process. I can reliably cause deadlock on PUT/POST with chunked encoding enabled.

The problem is your fix does not fix the issue I see which seems to be in HttpConnection.cs in the method ProcessInput. The deadlock/runaway case happens in the do/while when line = "" and input_state = InputState.RequestLine.

I've replicated the issue against 3.4 and master (currently reports as 3.6.1).

One thing that isn't clear from your original report is whether you see this issue in a client, a server or both?
Comment 5 Craig Minihan 2014-07-29 07:12:49 UTC
I've seen a chunked related deadlock now - this happens when Close() is called on WebConnectionStream:

public override void Close ()
{
	if (sendChunked) {
		if (disposed)
			return;
		disposed = true;
		pending.WaitOne ();
		byte [] chunk = Encoding.ASCII.GetBytes ("0\r\n\r\n");
		string err_msg = null;
		cnc.Write (request, chunk, 0, chunk.Length, ref err_msg);
		return;
	}
	
	...
}

The problem is that pending.WaitOne() never returns. When writing to the stream the event is set in EndWrite():

public override void EndWrite (IAsyncResult r)
{
	...

	if (sendChunked) {
		lock (locker) {
			pendingWrites--;
			if (pendingWrites == 0)
				pending.Set ();
		}
	}
}

The problem I'm seeing is that pendingWrites is negative when you break in the Close(), this means that the pendingWrites == 0 case is never met and the event is never set.

I am calling WebConnectionStream.Write() via CopyTo as in:

request.SendChunked = true;
var sourceStream = myStream;
using (var reqStream = request.GetRequestStream())
{
    sourceStream.CopyTo(reqStream);
}

Where the CopyTo is the default version from Stream. I'll investigate further to see what the root cause is.
Comment 6 Craig Minihan 2014-07-29 12:55:45 UTC
This is the same issue as #20134 and #8886. I've added tracing to BeingWrite and EndWrite to show the pendingWrites value, results here:

BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
... time passes ...
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
BeginWrite() pendingWrites=1
EndWrite() pendingWrites=0
EndWrite() pendingWrites=-1
BeginWrite() pendingWrites=0
EndWrite() pendingWrites=-1
... more time ...
BeginWrite() pendingWrites=0
EndWrite() pendingWrites=-1
EndWrite() pendingWrites=-2
BeginWrite() pendingWrites=-1
EndWrite() pendingWrites=-2
BeginWrite() pendingWrites=-1
EndWrite() pendingWrites=-2
BeginWrite() pendingWrites=-1
EndWrite() pendingWrites=-2
... pendingWrites doesn't recover and we end up with a hang on WaitOne() ...

Clearly we have an issue where EndWrite is called back-to-back, I'm not sure on the scenario which causes this yet, however this looks suspicious:

In EndWrite():

if (result.EndCalled)
    return;

result.EndCalled = true;

This code would not be thread safe if EndWrite() is called more than once per write. I'm not how this scenario could happen.
Comment 7 Craig Minihan 2014-07-29 13:06:49 UTC
Wrapping the result.EndCalled test and assignment in a section does solve the problem. This isn't a very good solution since the reason for the duplicate call to EndWrite still eludes me. So a simplistic and horrible fix would be:

lock (locker) {
    if (result.EndCalled)
        return;

    result.EndCalled = true;
}
Comment 8 Martin Baulig 2014-07-29 13:07:50 UTC
Do you have a test case for this?  I'm unable to reproduce this locally, but I'm only doing one chunked POST at a time from a single thread.

We also need to figure out what's calling that second EndWrite() - could you try to add

        if (pendingWrites-- < 0)
                throw new InvalidOperationException ();

and look at the stacktrace?
Comment 9 Craig Minihan 2014-07-29 13:13:38 UTC
My scenario is single request/single thread - however it fails pretty much every time. The request body is large - so a test case with 10MB -40MB may reveal the problem.

I'll try and create a replication test case shortly.
Comment 10 Craig Minihan 2014-07-29 19:10:50 UTC
Ok, I've put a test tool here: https://github.com/exony-craigminihan/TestSystemNet

This code shows the "input_state = InputState.RequestLine" problem I mentioned above. I believe this problem hides the deadlock with chunking so it will need to be fixed first.

To recreate: clone, build, run. Increase the multiplier/body size (press m a few times) and wait. You'll see a core hit 100%, break the debugger and you'll see it is stuck in HttpConnection.ProcessInput while executing the do/while. The variable 'line' is "" which cannot exit the do/while.

I'm not sure exactly how ProcessInput enters this state - I have seen the issue outside the test tool so it isn't just an academic failure case.
Comment 11 Craig Minihan 2014-07-29 19:17:13 UTC
That should of course be the "input_state == InputState.RequestLine" problem - the missing = is very important!
Comment 12 Craig Minihan 2014-07-30 08:19:44 UTC
This test code shows the WebConnectionStream.Close() deadlock problem: https://github.com/exony-craigminihan/TestSystemNetWebConnectionStreamCloseDeadlock

If you don't see a deadlock try a bigger file. The large_2600.txt fails pretty much every time for me whereas 2600.txt never fails.

I suspect EndWrite() is always called twice, it just depends on the behaviour of the code below as to whether multiple threads can enter the pendingWrites-- block.

---------------------------
if (result.EndCalled)
   return;

result.EndCalled = true;
Comment 13 Martin Baulig 2014-08-06 08:25:15 UTC
I've done some extensive testing and could only reproduce this with Mono 3.4 and the version that we shipped with Xamarin.iOS 7.2.4.

Both your two tests and my own from my new web-tests suite work perfectly fine with the upcoming Mono 3.8, so it looks like this has already been fixed.
Comment 14 Craig Minihan 2014-08-06 08:39:09 UTC
Martin, thanks for the response. Just so I get what you are saying here:

You replicated the issue on Mono 3.4 (all platforms) and in addition in Xamarin.iOS 7.2.4?

I have local patches for both the issues raised here and I can't see any changes in master which look anything like my changes. I'll test against master/3.8 when I get some time to (hopefully) verify your findings.

Cheers!
Comment 15 Craig Minihan 2014-08-06 08:45:42 UTC
Actually I just noticed I mentioned above:

"I've replicated the issue against 3.4 and master (currently reports as 3.6.1)."

Does 3.8 System.Net deviate from master as it was on July 14th?
Comment 16 Martin Baulig 2014-08-06 11:56:09 UTC
Yeah, it was branched long after that.
Comment 17 Nicolas Raoul 2014-11-14 22:33:54 UTC
Please reopen:

The fix got lost again, never making it into the subsequent official
branches.

The original reporter made a pull request again, please apply it, and make sure it does not get lost in subsequent releases, thanks! https://github.com/mono/mono/pull/1156

This bug makes Mono totally unreliable for any app that uploads files, so while reopening please also raise Priority to High, thank you!
Comment 18 Till Lorentzen 2014-11-21 04:24:17 UTC
Hi all!

Nicolas Raoul is right! Please reopen this issue because it is not fixed in mono 3.10.0. Nicolas added a larger example for this issue. This example needs a CMIS Server but reproduces this issue after a few uploads of random files:

https://github.com/nicolas-raoul/dotcmis-upload

And I totally agree with Nicolas that this bug is really important!

Thanks a lot.
Comment 19 Till Lorentzen 2014-11-21 04:32:54 UTC
Created attachment 8828 [details]
Call stack on blocking close call

This is the call stack of the execution when the close call blocks. It is made on Mac OS 10.9 with the actual Xamarin Studio 5.5.4 (build 15).
Comment 20 Till Lorentzen 2014-11-21 10:19:09 UTC
This is an in memory CMIS server which can be used to evaluate the upload problematic:

http://chemistry.apache.org/java/developing/repositories/dev-repositories-inmemory.html
Comment 23 Fritz Elfert 2014-12-11 09:03:35 UTC
@Allie Miller (Comment 21):

Of course this still there as reported already in Comment 18. My *working* fix (which I provided in July already!!!) is *still* waiting for approval on github (assigned to baulig). Perhaps someone at Xamarin can assign this to somebody else who then actually *does* approve it instead of letting it rott. :-(

CU
 -Fritz
Comment 24 Nicolas Raoul 2014-12-11 10:06:51 UTC
We also really hope the fix will be merged... for now we have disabled chunking on Mac/Linux, but that makes our software inefficient and slow.

The title of this issue says "under heavy load", but in my experience it reproduces most of the time, even on very powerful Mac machines with no other programs running and CPU around 0%.

Comment 2 talks about "~1000 POST requests" but I can reproduce the issue with far less requests, actually it often happens at the first POST request.
Comment 25 Brendan Zagaeski (Xamarin Team, assistant) 2015-02-06 12:56:13 UTC
For bookkeeping I'll note that the pull request [1] was merged to Mono master on January 22, 2015. Thanks for the patch!

[1] https://github.com/mono/mono/pull/1156

If the current release schedule proceeds smoothly, the patch should be included in a stable release of Mono by April.
Comment 26 Martin Baulig 2016-11-11 09:55:46 UTC
According to comment #25, this was fixed a long time ago.