Bug 15135 - HttpWebRequest leaks sockets
Summary: HttpWebRequest leaks sockets
Status: RESOLVED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: System ()
Version: unspecified
Hardware: PC Mac OS
Importance: Highest critical
Target Milestone: Untriaged
Assignee: Martin Baulig
Duplicates: 11634 13750
Reported: 2013-10-02 16:04 UTC by Jonathan Pryor
Modified: 2014-11-04 18:54 UTC (History)


Attachments
iOS 6.1 Simulator result with monotouch-7.0.4.213.pkg (119.02 KB, image/png)
2014-01-16 16:50 UTC, T.J. Purtell
iOS 6.1 Simulator result with monotouch-7.0.4.213.pkg (Using AFNetworkHandler) (118.02 KB, image/png)
2014-01-16 16:51 UTC, T.J. Purtell


Description Jonathan Pryor 2013-10-02 16:04:24 UTC
Note: I may be an idiot, or I'm missing something fundamental, or I just have unrealistic expectations.

Context: https://bugzilla.xamarin.com/show_bug.cgi?id=13750

Background (i.e. Why I Care): a Socket uses file descriptors, and Android's Linux (emulator) only allows a limited number of them to be open at once (~982, by my estimation). If an app downloads lots of http(s) URLs and the Sockets aren't closed, the process will run out of file descriptors and start reporting "bizarre" errors, including NameResolutionFailure and Network is unreachable. The problem isn't that the network has suddenly broken; the problem is that the process has run out of file descriptors, and nothing that requires file descriptors will work sanely.

Behold my test code! It uses HttpWebRequest to access www.example.com, checking to see if the underlying Socket has been disposed or not.

I have two questions regarding it:

Q1: Am I using the API correctly? (Note: I may be an idiot.)
Q2: How should I be cleaning up resources?

Code:

  using System;
  using System.Linq;
  using System.Net;
  using System.Net.Sockets;
  using System.Reflection;

  class Test {
    public static void Main ()
    {
      var u = new Uri ("http://www.example.com");
      for (int i = 0; i < 10; ++i) {
        GetUri (u);
      }
    }

    static void GetUri (Uri uri)
    {
      var request     = (HttpWebRequest) WebRequest.Create (uri);
      request.Method  = "GET";
      var response    = request.GetResponse ();
      var wc          = GetWebConnection (request);
      var socket      = GetSocketFromWebConnection (wc);
      WriteSocketState (socket, "# BEG: ");
      using (var s = response.GetResponseStream ()) {
      }
      response.Dispose ();
      WriteSocketState (socket, "# END: ");
    }

    static void WriteSocketState (Socket s, string f, params object[] a)
    {
      bool? d = IsSocketDisposed (s);
      Console.Write (f, a);
      Console.WriteLine ("socket? {0,-5} Disposed={1,-5} Handle={2}",
          s != null,
          d.HasValue ? d.Value.ToString () : "?",
          s == null ? "" : s.Handle.ToString ("x"));
    }

    static object GetWebConnection (HttpWebRequest request)
    {
      var HttpWebRequest_WebConnection = typeof (HttpWebRequest)
        .GetField ("WebConnection",
          BindingFlags.NonPublic | BindingFlags.Instance);
      if (HttpWebRequest_WebConnection == null)
        return null;
      return HttpWebRequest_WebConnection.GetValue (request);
    }

    static Socket GetSocketFromWebConnection (object webConnection)
    {
      if (webConnection == null)
        return null;
      var WebConnection_socket = webConnection.GetType ()
        .GetField ("socket", BindingFlags.NonPublic | BindingFlags.Instance);
      if (WebConnection_socket == null)
        return null;
      return (Socket) WebConnection_socket.GetValue (webConnection);
    }

    static bool? IsSocketDisposed (Socket s)
    {
      var f = typeof (Socket)
        .GetField ("disposed", BindingFlags.NonPublic | BindingFlags.Instance);
      // No such field (non-Mono runtime) or no socket: state is unknown.
      if (f == null || s == null)
        return null;
      return (bool) f.GetValue (s);
    }
  }

Sample execution. Note that _sometimes_ the Socket is disposed, and sometimes (usually) it isn't:

> $ mono app.exe 
> # BEG: socket? True  Disposed=False Handle=4
> # END: socket? True  Disposed=False Handle=4
> # BEG: socket? True  Disposed=False Handle=8
> # END: socket? True  Disposed=False Handle=8
> # BEG: socket? True  Disposed=False Handle=9
> # END: socket? True  Disposed=False Handle=9
> # BEG: socket? True  Disposed=False Handle=a
> # END: socket? True  Disposed=False Handle=a
> # BEG: socket? True  Disposed=False Handle=b
> # END: socket? True  Disposed=True  Handle=ffffffff
> # BEG: socket? True  Disposed=False Handle=b
> # END: socket? True  Disposed=False Handle=b
> # BEG: socket? True  Disposed=False Handle=c
> # END: socket? True  Disposed=False Handle=c
> # BEG: socket? True  Disposed=False Handle=d
> # END: socket? True  Disposed=False Handle=d
> # BEG: socket? True  Disposed=False Handle=e
> # END: socket? True  Disposed=False Handle=e
> # BEG: socket? True  Disposed=False Handle=f
Comment 1 Jonathan Pryor 2013-10-02 17:00:28 UTC
*** Bug 13750 has been marked as a duplicate of this bug. ***
Comment 2 Martin Baulig 2013-10-07 17:57:31 UTC
Just had a quick look at this and I can't see any problems in your code.  Disposing both the response stream and the response should get rid of the socket.

Looks like we're somehow not correctly closing the socket.  I'll have a look at it when I'm back home.
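
The cleanup pattern described here can be written with nested `using` blocks, so that the reader, the stream, and the response are all disposed even if reading throws. This is only a minimal sketch of the calling side, not a fix for the bug: the `Fetcher` class and `Fetch` method are names made up for illustration, and the base `WebRequest`/`WebResponse` types are used so the same code works for any URI scheme.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, for illustration only.
static class Fetcher
{
    // Reads the whole body, then disposes reader, stream and response
    // (in that order) when the using blocks exit.
    public static string Fetch (Uri uri)
    {
        WebRequest request = WebRequest.Create (uri);
        using (WebResponse response = request.GetResponse ())
        using (Stream stream = response.GetResponseStream ())
        using (var reader = new StreamReader (stream)) {
            return reader.ReadToEnd ();
        }
    }
}
```

As the rest of this report shows, on the affected Mono versions this pattern alone did not guarantee that the underlying socket was closed.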
Comment 3 Martin Baulig 2013-10-10 14:27:39 UTC
Which version of Mono are you using?

I'm getting this:

# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=False Handle=4
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=False Handle=4
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=False Handle=4
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=False Handle=4
# BEG: socket? True  Disposed=False Handle=4
# END: socket? True  Disposed=True  Handle=ffffffff

with Mono 3.2.3 on Mavericks.

I also tried the
                        var response = await newRequest.GetResponseAsync();
loop from #13750, it runs all 1000 iterations without problem.  Looking at the process with Activity Monitor also confirms that it's not leaking any fd's.
Comment 4 Jonathan Pryor 2013-10-10 14:38:21 UTC
> Which version of Mono are you using?

> $ mono --version
> Mono JIT compiler version 3.2.3 ((no/8d3b4b7 Mon Sep 16 23:46:28 EDT 2013)

This is on OS X Mountain Lion 10.8.5.

I also think that your output does show "leakage"; why isn't the Socket _always_ Disposed=True? (More than half of your requests have Disposed=False.)

Granted, the Handle value is never larger than 4 for you, but that doesn't make me feel particularly confident. When will the Socket be disposed? How can we _ensure_ that the Socket (and accompanying file descriptor) are closed? I'm not seeing it in your output.
Comment 5 Martin Baulig 2013-10-10 14:41:49 UTC
You're right, it doesn't look like it's reliably closed anywhere.  I'll investigate a bit more.
Comment 6 Martin Baulig 2013-10-10 15:10:11 UTC
A quick single-stepping through this in XS shows that there might be a problem ...
Comment 7 Jonathan Pryor 2013-11-25 11:51:41 UTC
*** Bug 11634 has been marked as a duplicate of this bug. ***
Comment 8 T.J. Purtell 2013-12-16 15:23:49 UTC
I have definitely been seeing these name resolution failures and weird behavior as a result.

I think there are additional problems with the WebRequest in Mono as well.  .NET's WebRequest uses connection pools with parameters defined by the ServicePointManager.  It doesn't look like Android does this (from my experience and from the initial log Jonathan posted for Android).  I see the same behavior (a socket per connection) using the latest Mono for Windows.

This is particularly problematic for SSL connections because there is a significant setup cost for these connections.  Unfortunately, as a result of this I was forced to port all my WebRequest code to use the Microsoft HTTP Client libraries.  Luckily that is now licensed for use in Xamarin.  It isn't drop-in compatible and changes the exception handling requirements in a big way, so it is not an easy workaround.
Comment 9 T.J. Purtell 2013-12-16 23:14:33 UTC
I received a report from one of my team members that they observed a "too many file handles"/name resolution exception even using the new code.  I will have to investigate more to see if there is a code issue on our end; however, this might point to a lower-level error as the root cause, for example, if the Socket class itself occasionally leaked rather than the leak being a direct result of the WebRequest class.  I had been using the iOS simulator as my test bed for validating that there were no handle leaks with the new code, since it is much easier to inspect the process for open handles there.  The error report came from an Android device.
Comment 10 T.J. Purtell 2013-12-17 03:13:22 UTC
OK, so I have a version of my app that can run with the simple .NET framework.  It serves its UI up via a local HTTP Server.  This lets me run the exact same code on Mono as well as on .NET on Windows.  I can't do a proper comparative analysis using this because of https://bugzilla.xamarin.com/show_bug.cgi?id=12875

That said, when I run the iOS version of my app in the simulator (even with Microsoft's HTTP library), I find that connections are **NOT** reused.  I don't see permanent file handle leaks from these web requests.  A GC seems to make them go away.  If I run using Microsoft's .NET, I get perfect connection reuse.  I see a max of 4 connections per host.

I use Charles proxy with SSL proxying disabled to monitor the number of connections created to service requests made by the application.

Perhaps there is a Mono bug related to the connection pooling that is causing this.  For example, it tries to keep connections alive to pool, but then it never decides to actually use them for a new request, so they **look** like they are leaked until a GC decides to nuke these extra connections.
Comment 11 Miguel de Icaza [MSFT] 2013-12-17 20:03:49 UTC
TJ,

Notice that Microsoft's System.Net.HttpClient is merely a thin/async-friendly wrapper on top of the actual HttpWebRequest in Mono.   So there should be really no difference there.
Comment 12 T.J. Purtell 2013-12-17 22:00:35 UTC
Here is a project that includes iOS, Android, and Console projects that makes requests to https://ipv4.google.com

When I enabled a proxy, I see HTTP CONNECT requests for each request under Mono, but under real .NET, I see only one request.

This is the easiest way to show that WebRequest is not reusing connections.  Jonathan's original log post should show that the same WebConnection is used every time, instead it shows that each request uses a new socket.
Comment 13 Miguel de Icaza [MSFT] 2013-12-17 22:11:48 UTC
Project referenced in comment #12 is here:

https://bitbucket.org/tpurtell/web-request-reuse
Comment 14 T.J. Purtell 2013-12-18 01:55:27 UTC
Oops!  Thank you for sharing the link Miguel.  

I ran that test with no proxy, altered to use plain HTTP, on my Mac in Console mode and in the iOS simulator.  By capturing a trace with Wireshark, I confirmed that each request occurs on a new connection even under this much simpler scenario.

Here is a pcap file that shows 29 requests from my iOS simulator test app to http://ipv4.google.com with no proxy enabled.
https://www.dropbox.com/s/zkkp4uvce30vwt7/trace.pcapng.gz

For reference, I used the latest bleeding-edge Xamarin alpha suite (downloaded last night):
Xamarin Studio
Version 4.3.0 (build 52)
Runtime:
	Mono 3.2.5 ((no/964e8f0)
	GTK+ 2.24.20 theme: Raleigh
	GTK# (2.12.0.0)
	Package version: 302050000

Apple Developer Tools
Xcode 5.0.2 (3335.32)
Build 5A3005

Xamarin.iOS
Version: 7.0.5.2 (Business Edition)
Hash: 9c42159
Branch: 
Build date: 2013-02-12 20:04:08-0500

Xamarin.Android
Version: 4.10.2 (Business Edition)
Android SDK: /Users/tj/Library/Developer/Xamarin/android-sdk-mac_x86
	Supported Android versions:
		2.1   (API level 7)
		2.2   (API level 8)
		2.3   (API level 10)
		3.1   (API level 12)
		4.0   (API level 14)
		4.0.3 (API level 15)
		4.1   (API level 16)
		4.2   (API level 17)
		4.3   (API level 18)
Java SDK: /usr
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b12)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Xamarin.Mac
Xamarin.Mac: 1.6.27

Build Information
Release ID: 403000052
Git revision: cd02fcfb350930f468f3d7cbf8e39f940553d378
Build date: 2013-11-24 02:13:57+0000
Xamarin addins: 14d41853742c36662973f9bbc0d14e58befdebfb

Operating System
Mac OS X 10.8.5
Darwin lucite.private 12.5.0 Darwin Kernel Version 12.5.0
    Sun Sep 29 13:33:47 PDT 2013
    root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64
Comment 15 Martin Baulig 2013-12-18 03:26:28 UTC
For testing (much easier than using wireshark or tcpdump):

http://last-hope.baulig.net/waringers-lab/bugfree-octo-nemesis/www/cgi-bin/get-puppy.pl
and
https://last-hope.baulig.net/waringers-lab/bugfree-octo-nemesis/www/cgi-bin/get-puppy.pl

Prints this:

====
METHOD: GET
PATH: 
REMOTE: 91.67.1.107
PORT: 57930
====

You can add paths after the script like this:
http://last-hope.baulig.net/waringers-lab/bugfree-octo-nemesis/www/cgi-bin/get-puppy.pl/hello/world

which would then print

====
METHOD: GET
PATH: /hello/world
REMOTE: 91.67.1.107
PORT: 57981
====

Source is here: https://github.com/baulig/bugfree-octo-nemesis
Comment 16 Martin Baulig 2013-12-18 05:40:32 UTC
I ran some tests using my own console app which talks to a CGI script on my webserver and can confirm that we are definitely reusing connections.

The only issue that I see is that the first three requests always happen on a new connection, then we reuse the connection for about 90-150 requests before it is closed.

.NET always runs exactly 100 requests on each connection, every 101st request is then using a new one.

The server is running on an Amazon EC2 t1.micro instance - temporarily changing it to m1.medium (the highest that I could change the instance to) and back again did not make any difference.

I have restarted the script several times and it's always using a new connection for each of the first three requests - after that, it starts reusing them.
Comment 17 Martin Baulig 2013-12-18 05:44:46 UTC
This is the script that I'm running on the server:
https://github.com/baulig/bugfree-octo-nemesis/blob/master/www/cgi-bin/get-puppy.pl

Client test case:
https://github.com/baulig/bugfree-octo-nemesis/blob/master/Xamarin.WebTests/GetPuppy.cs
https://github.com/baulig/bugfree-octo-nemesis/blob/master/Xamarin.WebTests/MainClass.cs

The perl script prints the remote port number, so that's easier to debug than using wireshark or tcpdump.
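
To illustrate how the port number can be used to detect connection reuse, here is a small sketch that parses the `PORT:` line from the script output shown in comment 15 and counts how many distinct client ports have been seen. The `PortTracker` class is hypothetical and not part of the linked test case. Note that the count is only a lower bound on the number of connections, since the operating system may eventually hand out the same ephemeral port again.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the linked test case.
class PortTracker
{
    // Matches a line such as "PORT: 57930" in the script output.
    static readonly Regex PortLine =
        new Regex (@"^PORT:\s*(\d+)\s*$", RegexOptions.Multiline);

    readonly HashSet<int> seen = new HashSet<int> ();

    public int Requests { get; private set; }
    public int Connections { get { return seen.Count; } }

    // Returns true when the response arrived from a client port that has
    // not been seen before, i.e. the request used a new connection.
    public bool Record (string responseBody)
    {
        Match m = PortLine.Match (responseBody);
        if (!m.Success)
            throw new FormatException ("no PORT line in response");
        Requests++;
        return seen.Add (int.Parse (m.Groups [1].Value));
    }
}
```

With perfect reuse, `Connections` stays at 1 while `Requests` grows; with no reuse, the two numbers grow together.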
Comment 18 Miguel de Icaza [MSFT] 2013-12-18 10:15:31 UTC
Hey guys,

Gonzalo provided the following insight to me:

(a) Connections that do not receive a "Connection: close" are kept on the WebConnectionGroup as a WeakReference, which is why you see the effect that the GC will sometimes close the connections.

(b) One option to force the connection to shutdown is to use HTTP/1.0, but that would not reuse any HTTP connections, and would not be what we want.
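
For reference, both options can be requested through public `HttpWebRequest` properties. The sketch below only shows how a caller would ask for them; the `ConnectionOptions` class and its method names are made up for illustration, and whether the affected Mono versions then close the socket deterministically is not something this report confirms.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
static class ConnectionOptions
{
    // (a) Opt out of keep-alive, which is documented to send
    // "Connection: close" with the request.
    public static HttpWebRequest CreateOneShot (Uri uri)
    {
        var request = (HttpWebRequest) WebRequest.Create (uri);
        request.KeepAlive = false;
        return request;
    }

    // (b) Fall back to HTTP/1.0, giving up connection reuse entirely.
    public static HttpWebRequest CreateHttp10 (Uri uri)
    {
        var request = (HttpWebRequest) WebRequest.Create (uri);
        request.ProtocolVersion = HttpVersion.Version10;
        return request;
    }
}
```

Either option trades away connection reuse, which is why neither is attractive as a general workaround.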
Comment 19 Jonathan Pryor 2013-12-18 10:58:51 UTC
Could we/should we provide an extension method in a Mono.Net.dll assembly (or something) that would allow us to deterministically close the underlying Socket?

I think the problem is that users are running out of file descriptors, resulting in "bizarro" errors, and the GC isn't smart enough to know that the process is low on file descriptors and thus should collect some of these HttpWebRequest/Socket instances.

Alternatively, is there some way to bound the number of HttpWebRequests that we keep alive?

Related: it would be _really handy_ if io-layer could log when file descriptors were acquired/released based on MONO_LOG_LEVEL/etc. At present, the only way to tell that a file descriptor is being used is by recompiling libmono*.so, which isn't fun. This would also help with at least one support request we've had recently complaining about running out of file descriptors, and there's no easy way to track that down w/o a full repro+source.
Comment 20 T.J. Purtell 2013-12-18 12:05:21 UTC
Martin, that is a cool test idea.  Unfortunately, I can't replicate your test scenario because there are a lot of variables in terms of server configuration.  Can you share a test URL that you believe to work well?  

If I run Jonathan's original test case on my Mac I see this
# BEG: socket? True  Disposed=False Handle=5
# END: socket? True  Disposed=False Handle=5
# BEG: socket? True  Disposed=False Handle=9
# END: socket? True  Disposed=False Handle=9
# BEG: socket? True  Disposed=False Handle=a
# END: socket? True  Disposed=False Handle=a
# BEG: socket? True  Disposed=False Handle=b
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=b
# END: socket? True  Disposed=False Handle=b
# BEG: socket? True  Disposed=False Handle=c
# END: socket? True  Disposed=False Handle=c
# BEG: socket? True  Disposed=False Handle=d
# END: socket? True  Disposed=False Handle=d
# BEG: socket? True  Disposed=False Handle=e
# END: socket? True  Disposed=False Handle=e
# BEG: socket? True  Disposed=False Handle=f
# END: socket? True  Disposed=False Handle=f
# BEG: socket? True  Disposed=False Handle=10
# END: socket? True  Disposed=False Handle=10

When I run that test case on Mono for Windows I see this
# BEG: socket? True  Disposed=False Handle=29c
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=2f0
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=2f8
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=300
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=308
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=310
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=318
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=320
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=32c
# END: socket? True  Disposed=True  Handle=ffffffff
# BEG: socket? True  Disposed=False Handle=334
# END: socket? True  Disposed=True  Handle=ffffffff

Neither of these is indicative of connection reuse.  I also see that your log above shows the socket being reused!  Why are your results different? Please share your magical bits :)

Jonathan, there are a lot of configuration options for HTTP connection limits in the ServicePoint/ServicePointManager classes.  My guess is that ServicePointManager.MaxServicePoints should put a hard cap on how many connections can be used.  I think it throws an exception if this is exceeded. http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.maxservicepoints(v=vs.110).aspx  There are a bunch of other options to control behavior as well.
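
As far as the .NET documentation describes these settings, MaxServicePoints caps the number of ServicePoint objects (roughly, hosts) that are kept, while the number of simultaneous connections to one host is governed by ServicePoint.ConnectionLimit, which defaults to ServicePointManager.DefaultConnectionLimit. The following is a minimal sketch of setting the per-host limit. The `ConnectionLimits` class is a made-up name, and how closely Mono honored these values at the time is exactly what this report is about.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
static class ConnectionLimits
{
    // Applies a per-host connection limit to the service point for uri
    // and makes it the default for service points created later.
    public static ServicePoint Configure (Uri uri, int perHostConnections)
    {
        ServicePointManager.DefaultConnectionLimit = perHostConnections;

        ServicePoint sp = ServicePointManager.FindServicePoint (uri);
        sp.ConnectionLimit = perHostConnections;
        return sp;
    }
}
```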
Comment 21 T.J. Purtell 2013-12-18 13:00:06 UTC
I pushed a change to my repository that prints information from the
ServicePointManager, in particular:

ServicePointManager.FindServicePoint(TEST_URI).CurrentConnections

Using mono, this prints
Total requests 1 Current connections 0:
Total requests 7 Current connections 0:
Total requests 19 Current connections 0:

Using .NET, this prints
Total requests 13 Current connections 1:
Total requests 29 Current connections 1:
Total requests 44 Current connections 1:

So it kind of looks like Mono is failing to find the existing service point for
a given URL.  This would theoretically cripple its ability to reuse a
connection or otherwise apply sane limits as defined by the .NET API to the
number of connections.
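
A check along these lines can be sketched as follows. This is not the code from the repository; the `ServicePointLog` class and the exact output format are assumptions, and only `FindServicePoint` and `CurrentConnections` are taken from the comment above.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
static class ServicePointLog
{
    // Formats the request count next to the number of connections the
    // service point for uri currently reports.
    public static string Describe (Uri uri, int totalRequests)
    {
        ServicePoint sp = ServicePointManager.FindServicePoint (uri);
        return string.Format ("Total requests {0} Current connections {1}",
            totalRequests, sp.CurrentConnections);
    }
}
```

Calling `Describe` after each batch of requests gives output comparable to the two logs above: a runtime that pools connections should report a non-zero count while a keep-alive connection is open.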
Comment 22 Martin Baulig 2013-12-18 14:11:05 UTC
Here's my test module:
https://github.com/baulig/bugfree-octo-nemesis

After checking out, add this app.config file to the root directory of your checkout:

====
<configuration>
  <appSettings>
    <add key="web_host" value="last-hope.baulig.net"/>
    <add key="web_prefix" value="/waringers-lab/bugfree-octo-nemesis/"/>
    <add key="squid_address" value="http://stalhof.baulig.net:3128/"/>
    <add key="squid_address_ssl" value="https://stalhof.baulig.net:3129/"/>
  </appSettings>
</configuration>
====

Load the Xamarin.WebTests.sln in Xamarin Studio, right-click References in the Solution Explorer and select "Restore NuGet packages" - it's using NuGet light
Comment 23 Martin Baulig 2013-12-18 14:18:13 UTC
Miguel, Gonzalo, Jonathan: That does not really matter because the ServicePointManager already exposes a limit on the maximum number of connections.

You guys are also mixing up two separate issues here: I cannot, and have never been able to, run out of sockets or file descriptors, nor does it leak any sockets for me.

The only issue that I see is that a connection is not always reused when it's supposed to be.

That is actually a race condition that I'm currently working on.  Note that this _could_ in theory lead towards running out of sockets if it does not reuse the connection more often than 4 out of 100 times.
Comment 24 T.J. Purtell 2013-12-18 14:41:14 UTC
I ran Martin's test from Xamarin Studio on my Mac and I see it is making a new connection per second (Pretty sure that's one per request).  I think this should rule out any special HTTP headers being the issue.

NEW PORT: #0: [GetPuppy: Method=GET, Path=, RemoteAddress=24.6.51.150, RemotePort=62512]
NEW PORT: #1: [GetPuppy: Method=GET, Path=, RemoteAddress=24.6.51.150, RemotePort=62515]
NEW PORT: #2: [GetPuppy: Method=GET, Path=, RemoteAddress=24.6.51.150, RemotePort=62516]
...
NEW PORT: #178: [GetPuppy: Method=GET, Path=, RemoteAddress=24.6.51.150, RemotePort=62701]
NEW PORT: #179: [GetPuppy: Method=GET, Path=, RemoteAddress=24.6.51.150, RemotePort=62702]
Comment 25 Martin Baulig 2013-12-18 15:42:30 UTC
Debugging the first connection close ....

Mono with lots of debugging spew: https://github.com/baulig/mono/tree/work-cnc-reuse, line numbers below are from this commit, which is the current HEAD: https://github.com/baulig/mono/commit/ca13d7016a160799d6bbce419c899498daf8559d

Now look at the output: https://gist.github.com/baulig/039813b4cd5a0e7e9558#file-gistfile1-txt

This is a bit cryptic, so also look at the source code.

Line 21: WebConnectionStream.Close() is called on the main thread after the client read all the data - this happens when we exit the "using" block in GetPuppy.cs.  The parameters are isRead, nextReadCalled, allowBuffering.  This is https://github.com/baulig/mono/blob/ca13d7016a160799d6bbce419c899498daf8559d/mcs/class/System/System.Net/WebConnectionStream.cs#L767.

Line 22: isRead was true, nextReadCalled is false, so we're calling CheckComplete().

Line 23: CheckComplete() at WebConnectionStream.cs:224
			Debug ("CHECK COMPLETE: {0} {1} {2} {3} {4}", nrc, readBufferOffset, readBufferSize, contentLength, readBufferSize - readBufferOffset == contentLength);

So contentLength has been set to the full length of the response (which we have fully read at this point), but neither readBufferOffset nor readBufferSize have been updated.  We then call WebConnection.Close() (lines 24 and 27).

Why has readBufferOffset not been updated?

Look at lines 8-10, this is WebConnection.cs:520, 551 and https://github.com/baulig/mono/blob/ca13d7016a160799d6bbce419c899498daf8559d/mcs/class/System/System.Net/WebConnection.cs#L592.  The last one is the interesting one: chunkedRead is true and chunkStream == null.

So we create a new ChunkStream, but do not update WebConnectionStream.ReadBufferOffset/Size.
Comment 26 Martin Baulig 2013-12-18 15:45:30 UTC
So this entire WebConnectionStream.CheckComplete() logic needs some love.
Comment 27 T.J. Purtell 2013-12-18 23:33:31 UTC
Ah, I see. The logic that provides the virtual streams per WebRequest on top of the WebConnection is not taking the correct actions as the request is completed.  As a result, the connection can not be reused, because it does not appear to be in a fully completed state.

There is quite a bit of code to peruse, but I noticed that the ServicePoint's ConnectionCount is never updated.  I have a feeling this means that any connection limitation logic may not work correctly.  It is of course possible that the connection limit logic in Mono directly uses the connection list to apply these limits, but I haven't gotten to dig that deep yet.

For reference, the IncrementConnection method:
https://github.com/mono/mono/blob/master/mcs/class/System/System.Net/ServicePoint.cs#L340

Using grep -r, I see that it is never called from anywhere.
Comment 28 Martin Baulig 2013-12-19 15:19:17 UTC
There is some code dealing with limits, don't remember exactly where that is, though.

I have just updated my get-puppy.pl script to address all the three scenarios that the server can send us:

a) The server does not send any Content-Length header.

b) The server does send a Content-Length header.

c) The server uses chunked transfer encoding.

That code path that I talked about yesterday is a) - it's a common behavior from all kinds of simple scripts (more complex / ASP.NET kind of scripts would be more likely to use chunked encoding) whereas static content should usually be returned with Content-Length.
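
The three cases differ in how the client learns where the response body ends, which in turn decides whether the connection can be reused afterwards. The sketch below classifies a response by its headers according to the general HTTP/1.1 rules (chunked encoding takes precedence over Content-Length, and a body with neither ends when the server closes the connection). It is an illustration only and not Mono's implementation; `BodyFraming` and `Framing` are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types, for illustration only.
enum BodyFraming { UntilClose, ContentLength, Chunked }

static class Framing
{
    // headers should use a case-insensitive comparer, since HTTP header
    // names are case-insensitive.
    public static BodyFraming Classify (IDictionary<string, string> headers)
    {
        string te;
        if (headers.TryGetValue ("Transfer-Encoding", out te) &&
            te.IndexOf ("chunked", StringComparison.OrdinalIgnoreCase) >= 0)
            return BodyFraming.Chunked;        // case c)
        if (headers.ContainsKey ("Content-Length"))
            return BodyFraming.ContentLength;  // case b)
        return BodyFraming.UntilClose;         // case a)
    }

    // Only a body with a known end leaves the connection usable for
    // another request.
    public static bool CanReuseConnection (BodyFraming framing)
    {
        return framing != BodyFraming.UntilClose;
    }
}
```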
Comment 29 T.J. Purtell 2013-12-20 04:01:48 UTC
AFAIK, situation A is equivalent to including a Connection: close header.  Without a Content-Length, the only way to terminate the response is to close the connection.

Generally in my team's application, everything has a Content-Length.  So I definitely fall under the situation B case.  Nonetheless I tried your test script out with ?mode=chunked.  I saw it still created new connections for me.

Another failure mode I have seen which I don't have a test case for is that a request will just hang for on the order of a hundred seconds.  These are requests for static content on S3 over SSL.  Since I can't produce a test case that triggers them, I am ignoring them for now and focusing on the connection reuse and handle leakage aspects.  But it sounds like it's possible that these are related to the issue at hand.
Comment 30 T.J. Purtell 2013-12-20 05:16:36 UTC
I tried dropping in ModernHttpClient as a replacement HttpClientHandler, and the performance boost (from connection reuse presumably) was like MAGIC.  Unfortunately, that library does not properly marshal HTTP result codes, so it isn't possible to build a fully functioning application out of it.  I am looking forward to seeing similar blistering performance from Xamarin.XYZ soon :D.
Comment 31 T.J. Purtell 2013-12-22 14:35:55 UTC
Even adapting that library to pass exceptions is not working as a temporary workaround.  It ends up causing an invalid access to a previously freed GREF.

:( :( :(
Comment 32 Jonathan Pryor 2013-12-26 11:02:24 UTC
@T.J.Purtell:
> It ends up causing an invalid access to a previously freed GREF.

Please try the XA 4.11 alpha, which contains a number of multithreading fixes and _may_ fix these issues.

If it still fails on XA 4.11, please file a _new_ bug with a repro, enable gref logging, and attach the gref log.
Comment 33 T.J. Purtell 2013-12-31 12:19:19 UTC
Any progress on this HttpWebRequest issue?

From my testing, it has seriously bad repercussions on Xamarin.iOS in addition to Xamarin.Android.  On iOS the file handle limit is much lower.  By my measurement about 78 file handles are allowed.  This leak causes strange issues there as well as bad performance.  The workarounds all lower performance or are unreliable.
Comment 34 Miguel de Icaza [MSFT] 2014-01-02 08:58:43 UTC
TJ,

Xamarin's offices were closed from the 24th to the 2nd.   Most of the team took vacations.

Miguel
Comment 36 T.J. Purtell 2014-01-16 15:06:11 UTC
I have updated that test project so that it now includes a server JavaScript file that can provide the port number for connection counting (similar to Martin's test design) and a second iOS project that uses the workaround to show the working case for comparison.  This should show that the server script is functioning properly.

The repository is here
https://bitbucket.org/tpurtell/web-request-reuse

To run the server
node port-number.js

Then run one of the projects
- web-request-resuse.sln - IosWebRequest - This uses the mono Http stack and directly calls WebRequest.Create, etc
- web-request-resuse-hack.sln - IosMicrosoftHttp - This uses a custom HttpMessageHandler based on the AFNetwork library and the async-friendly HttpClient abstraction
Comment 37 T.J. Purtell 2014-01-16 16:50:17 UTC
Created attachment 5859 [details]
iOS 6.1 Simulator result with monotouch-7.0.4.213.pkg
Comment 38 T.J. Purtell 2014-01-16 16:51:04 UTC
Created attachment 5860 [details]
iOS 6.1 Simulator result with monotouch-7.0.4.213.pkg (Using AFNetworkHandler)
Comment 39 Martin Baulig 2014-01-17 12:49:52 UTC
Thanks a lot for this test case!

I can now see why I was unable to reproduce this issue: when I launch this in the iOS simulator and start a single network worker, then it's reusing the connection perfectly fine.  Launching a second worker adds a few connections, but the total number of connections still stays stable after a few seconds.  Same when adding a few more workers, except that the number of new connections increases.  But it still stays stable after a few seconds, thus reusing all those that it opened.  However, if I keep adding new network workers, then there's a point where it actually starts leaking connections!

I need to start about 4-5 network workers before it actually starts to leak connections - and then, it keeps opening new connections.

Now I finally have something to work with :-)
Comment 40 T.J. Purtell 2014-01-17 17:10:56 UTC
Martin, that is great news!

FYI, even with the 213 package, I can reproduce 100% with one worker in the simulator.  Since the test case is using a local host server, it should eliminate a lot of variability.  Hopefully we are both actually hitting the same race just with different timing.

I am wondering if something about # of cores or OS version changes the frequency of the error.

My machine is 
  Model Name:	Mac mini
  Model Identifier:	Macmini6,2
  Processor Name:	Intel Core i7
  Processor Speed:	2.3 GHz
  Number of Processors:	1
  Total Number of Cores:	4
  L2 Cache (per Core):	256 KB
  L3 Cache:	6 MB
  Memory:	8 GB

Running this OS
  System Version:	OS X 10.8.5 (12F45)
  Kernel Version:	Darwin 12.5.0
  Boot Volume:	Macintosh HD
  Boot Mode:	Normal
Comment 42 Miguel de Icaza [MSFT] 2014-02-19 14:50:24 UTC
Update:

Martin has now rewritten the core of the engine that deals with sharing connections, fixing several design mistakes and bugs along the way.   We now:

* Properly reuse connections
* We close connections after ServicePoint.MaxIdleTime has elapsed
* We update ServicePoint.CurrentConnections and ServicePoint.IdleSince
* ServicePointManager recycles idle ServicePoints when making new connections
* We handle MaxServicePoints
* We handle MaxConnections
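
For readers who want to exercise the limits listed above, here is a minimal sketch of how an application might set them through the public System.Net API. The class name, the endpoint URL and all numeric values are made up for illustration, and it is an assumption that "MaxConnections" above corresponds to the public ConnectionLimit property.

```csharp
using System;
using System.Net;

public static class ConnectionLimitsSketch
{
    // Applies illustrative limits and returns the ServicePoint for the endpoint.
    public static ServicePoint Configure(Uri endpoint)
    {
        // Process-wide defaults; they affect ServicePoints created afterwards.
        ServicePointManager.MaxServicePoints = 16;           // hypothetical cap on ServicePoint objects
        ServicePointManager.DefaultConnectionLimit = 4;      // hypothetical per-host connection cap
        ServicePointManager.MaxServicePointIdleTime = 30000; // milliseconds

        // Per-host overrides on the ServicePoint that handles this endpoint.
        ServicePoint sp = ServicePointManager.FindServicePoint(endpoint);
        sp.ConnectionLimit = 4;   // assumed to be what "MaxConnections" refers to
        sp.MaxIdleTime = 30000;   // milliseconds an idle connection may be kept

        return sp;
    }
}
```

Logging sp.CurrentConnections and sp.IdleSince on the returned object while requests are running is one way to check whether connections are being reused or are piling up.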

In addition, we just added an async pipeline for TLS/SSL that does not clog the ThreadPool (the patch was a contribution, but we had to fix a bug in it).

Those changes are now on mono/master, and will be coming to a desktop or mobile device near you soon.
Comment 43 Martin Baulig 2014-02-20 07:53:09 UTC
Hey, yeah sorry for not updating the bug.  I started to really enjoy fixing the web stack, so I jumped right into the next problem ...

Fully async requests, reliable redirects with POST/PUT and 100-Continue are coming shortly.
Comment 44 T.J. Purtell 2014-02-20 13:39:53 UTC
Thanks Martin!  This is excellent news. :)  I look forward to deploying it ASAP.
Comment 45 T.J. Purtell 2014-02-27 13:48:16 UTC
Which releases is this available in? I am itching to try it out.
Comment 47 T.J. Purtell 2014-03-06 17:06:49 UTC
I would like to alert you to one other potential issue, which isn't a problem with the implementation.  On iOS, if your app enters the background, iOS will **close** sockets that you leave open.  This can lead to very unexpected behavior even with the native Apple networking interfaces; for example, the timeout callback for a request will never be called if the request's connection was closed by iOS.  I generally handle this by ensuring that I close all my background connections before the expiration time prescribed by iOS.

I am not quite sure how to do this with the .NET HttpClient API because the connection group name is obscured, so I can't just look up all the relevant service points and close the connection groups.  I think the approach I am going to try is to keep track of all ServicePoints engaged via my own HttpClientHandler.  Then when I need to shut down the background activity (i.e. close all persistent connections), I will iterate over these ServicePoints and set MaxIdleTime to 1ms.  Hopefully this will cause the connections to be closed out.  Then on resuming from the background, I will reset MaxIdleTime to 100s.  Failing that, I might use some reflection to find the internal connection group name and close the connections.

It's an interesting issue because, before connections were being fully reused, there was no chance of iOS closing a socket which might be reused.  However, I suppose it could have led to instances of close being called on a bad fd.
Comment 48 T.J. Purtell 2014-03-06 17:15:18 UTC
Hmm, actually it seems like messing with MaxIdleTime is not the way to go (connections aren't closed until you try to use them).  So, I'll eventually try figuring out the connection group name and see how that works.

http://msdn.microsoft.com/en-us/library/system.net.servicepoint.maxidletime(v=vs.110).aspx

"When the MaxIdleTime for a connection associated with a ServicePoint is exceeded, the connection remains open until the application tries to use the connection. At that time, the Framework closes the connection and creates a new connection to the remote host"
Comment 49 Miguel de Icaza [MSFT] 2014-03-15 22:43:22 UTC
TJ,

I do not believe that the OS performs a "close()" on the socket connections of the process, that would be very nasty.   What might be happening is that the sockets time out and the kernel flags them as having their connections severed, so any new attempt to use the file descriptor would produce an error.
Comment 50 T.J. Purtell 2014-03-16 15:44:56 UTC
Thanks Miguel.  I think you are right.

I was a bit misled by the error code I had observed (EBADF) while investigating background network connection issues in the past.  It turns out that Apple provides some information on this which I hadn't seen before.

https://developer.apple.com/library/ios/technotes/tn2277/_index.html

"Note: When your app resumes execution the actual error returned by a socket's whose resources have been reclaimed is purposely not specified here to allow for future refinements. However, in many cases the error will be EBADF, which is probably not what you were expecting! Under normal circumstances EBADF means that the app has passed an invalid file descriptor to a system call. However, in the case of a socket whose resources have been reclaimed, it does not mean that the file descriptor was invalid, just that the socket is no longer usable."

From my observations, I see some jammed network connection issues after running in the background.  A request or two will hang forever, but other requests will be able to proceed (presumably because they get new connections allocated to them).

I think this means one of two possibilities:

(1) Mono/HttpWebRequest doesn't handle EBADF correctly on some of its calls to underlying APIs.
(2) Mono/HttpWebRequest is using a notification mechanism (e.g. select/kqueue) to get events for the socket, but that interface never delivers a notification that the connection is ready for write or that an error has occurred, so some request is assigned to a connection but never proceeds.

... this is just a hypothesis ...

I think the second case is most likely. In particular, from the Apple doc it sounds as if the socket will have its "data available for read" flag set but not its "error" flag.

"If you're using an NSStream, you will receive an NSStreamEventHasBytesAvailable event. You should respond to this by reading from the stream, at which point -[NSInputStream -read:maxLength:] will return -1, indicating an error"

Since there is no pending read on a WebConnection when a new HTTP request is sent, the error state will never be observed for the socket.  Instead the request stays in limbo hung on that connection.
Comment 51 T.J. Purtell 2014-03-22 16:22:05 UTC
Since I am always using the Mono version of System.Net.Http.HttpClient to do my network requests, I use a custom HttpClientHandler as a workaround which appears to avoid jammed connections.  I still need to test against iOS < 7.1 on a device, since they could have adjusted their behavior with 7.1.

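>using System.Collections.Generic;
>using System.Net;
>using System.Net.Http;
>using System.Threading;
>using System.Threading.Tasks;
>
>// IStallable is my own interface (not shown here).
>public class MonoStallableHttpClientHandler : HttpClientHandler, IStallable
>{
>    private bool _Stall;
>    private readonly object _Lock = new object();
>    // ServicePoints used since the last stall, so their connections can be closed.
>    private HashSet<ServicePoint> _ServicePoints = new HashSet<ServicePoint>(); 
>
>    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request,
>        CancellationToken cancellationToken)
>    {
>        lock (_Lock)
>        {
>            // Block new requests while stalled. Wait on the lock object;
>            // waiting on the bool would box it and throw.
>            while (_Stall)
>                Monitor.Wait(_Lock);
>            var sp = ServicePointManager.FindServicePoint(request.RequestUri);
>            //TODO: redirects?
>            _ServicePoints.Add(sp);
>            return base.SendAsync(request, cancellationToken);
>        }
>    }
>
>    public bool Stall
>    {
>        get
>        {
>            return _Stall;
>        }
>        set
>        {
>            lock (_Lock)
>            {
>                _Stall = value;
>                if (!_Stall)
>                {
>                    // Resuming: wake every request blocked in SendAsync.
>                    Monitor.PulseAll(_Lock);
>                }
>                else
>                {
>                    // Stalling: close the persistent connections used so far.
>                    foreach (var sp in _ServicePoints)
>                        sp.CloseConnectionGroup("HttpClientHandler");
>                    _ServicePoints.Clear();
>                }
>            }
>        }
>    }
>}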
Comment 52 Koby 2014-03-23 04:45:34 UTC
Is there any news regarding this bug?
Comment 53 Dylan 2014-03-27 11:46:34 UTC
Also interested in finding out more information on this.
Comment 54 Tom_Qv 2014-04-10 16:58:46 UTC
I'm busy writing an app which downloads OpenStreetMap map tiles (PNGs).
At least 20% of my HttpWebRequests are answered with a NameResolutionFailure exception.

Any progress on this?

Tom
Comment 55 Dylan 2014-04-10 17:01:39 UTC
I've resorted to resolving the hostname via DNS once (retrying a few times) and using the IP address for all future requests. Something is definitely going on. It mainly happens for me on Android.
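
A minimal sketch of that kind of workaround, for anyone who wants to try it: resolve a host once, retry a few times on failure, and cache the resulting address. This is not the poster's actual code; the class name CachedResolver, the injectable lookup delegate, the attempt count and the delay are all hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical helper: resolves each host at most once and remembers the result.
public sealed class CachedResolver
{
    private readonly ConcurrentDictionary<string, IPAddress> _cache =
        new ConcurrentDictionary<string, IPAddress>();
    private readonly Func<string, IPAddress[]> _lookup;
    private readonly int _maxAttempts;
    private readonly TimeSpan _delay;

    // 'lookup' defaults to Dns.GetHostAddresses; it is injectable so the
    // retry logic can be exercised without touching the network.
    public CachedResolver(Func<string, IPAddress[]> lookup = null,
                          int maxAttempts = 3, TimeSpan? delay = null)
    {
        _lookup = lookup ?? new Func<string, IPAddress[]>(Dns.GetHostAddresses);
        _maxAttempts = maxAttempts;
        _delay = delay ?? TimeSpan.FromMilliseconds(200);
    }

    public IPAddress Resolve(string host)
    {
        IPAddress cached;
        if (_cache.TryGetValue(host, out cached))
            return cached; // already resolved: no further DNS traffic

        Exception last = null;
        for (int attempt = 1; attempt <= _maxAttempts; attempt++)
        {
            try
            {
                IPAddress[] addresses = _lookup(host);
                if (addresses.Length > 0)
                {
                    _cache[host] = addresses[0];
                    return addresses[0];
                }
            }
            catch (SocketException ex)
            {
                last = ex; // lookup failed; retry unless attempts are exhausted
            }
            if (attempt < _maxAttempts)
                Thread.Sleep(_delay);
        }
        throw new InvalidOperationException("Could not resolve " + host, last);
    }
}
```

Note that requesting a URL by IP address means the original host name is no longer in the URL, so HTTPS certificate checks and name-based virtual hosts may need extra handling. It also only hides the symptom: if the process has run out of file descriptors, as described at the top of this bug, other socket operations will still fail.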
Comment 56 Koby 2014-04-13 08:52:06 UTC
This bug has been open for over 6 months and is a showstopper for many people. What is going on? Xamarin, please give some feedback.
Comment 57 Miguel de Icaza [MSFT] 2014-04-14 11:48:19 UTC
The update was posted back in February, comment #42.

The change is available on the Alpha channels, and is moving to beta and then to stable.
Comment 58 Koby 2014-05-10 05:54:12 UTC
This issue still exists after updating to Xamarin for Visual Studio 1.12.278 (latest stable) and even after trying Xamarin for Visual Studio 2.00.84.0 (latest beta).

I've checked on a Galaxy S3 and a Nexus 5, both real devices (it also happens in the Genymotion simulator, checked with a Galaxy S3 on Android 4.3).
After retrying a few times it eventually works, but it forces me to sleep for a few hundred milliseconds between calls, or else it keeps throwing this error.

When it happens on critical services, it really affects app start time.
Comment 59 Andrei.N 2014-07-27 07:29:31 UTC
I am running the latest stable Xamarin Android and I just got this strange behavior too, for the first time.
It had never happened to me in the many times I ran HttpClient during the last year.

This bug was not posted in February, it dates back to August 2013: https://bugzilla.xamarin.com/show_bug.cgi?id=13750

I see the status is RESOLVED FIXED, which it shouldn't be.
Comment 60 La Nap 2014-09-08 08:44:09 UTC
The same issue was happening when trying to consume web services. I installed the latest Xamarin iOS 7.4 and Studio 5.3 hoping it was fixed, but it is still happening! Will this ever be fixed? Trying to invoke a SOAP/WSDL web service fails with 'System.net.webexception error: NameResolutionFailure'
Please help!
Comment 61 Andrei.N 2014-09-08 10:14:52 UTC
Here's the behavior I've seen on Android:
- This seems to happen only in Debug mode, and only for the first call.
- Thankfully, it doesn't happen in Release mode.

But I wonder why this is marked as resolved if people are still reporting it.