Bug 27246 - System.Diagnostics.Process and Java.Lang.Process don't mix
Summary: System.Diagnostics.Process and Java.Lang.Process don't mix
Status: ASSIGNED
Alias: None
Product: Android
Classification: Xamarin
Component: BCL Class Libraries ()
Version: 4.20.0
Hardware: Other Other
Importance: --- normal
Target Milestone: ---
Assignee: Marek Habersack
URL:
Depends on:
Blocks:
 
Reported: 2015-02-20 04:54 UTC by Rupert Rawnsley
Modified: 2016-08-31 14:48 UTC
CC List: 3 users

Tags: XATriaged
Is this bug a regression?: ---
Last known good build:


Attachments
Example project (6.66 KB, application/zip)
2015-02-20 04:54 UTC, Rupert Rawnsley


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Description Rupert Rawnsley 2015-02-20 04:54:06 UTC
Created attachment 9940
Example project

[originally posted on the forum: http://forums.xamarin.com/discussion/32987/mono-process-bug]

Calling System.Diagnostics.Process.WaitForExit() shortly after a Java.Lang.Process has run often hangs despite the process having finished.

I've attached a simple demonstration Xamarin.Android project that exhibits this behaviour. It invokes the 'id' terminal command using System.Diagnostics.Process ten times, then invokes the same command using Java.Lang.Process, then repeats the System.Diagnostics.Process invocation ten times. Pressing the button in the UI runs this sequence of commands, and the results appear in logcat or the application output window.

The 'id' command is deterministic and we would expect the process calls to terminate quickly; however, about 20% of the time, one or more of the System.Diagnostics.Process calls hangs on WaitForExit (which has a five-second timeout in this example). In all cases the command appears to have actually completed: 'ps' does not report it as active, and the input buffer contains the expected data and has been read to the end of the stream.

The failure mode only occurs if you have already run Java.Lang.Process and seems to sort itself out after a small number of failures (typically two). If you comment out the invocation of Java.Lang.Process, the problem never occurs.

Our working theory is that both methods for Process invocation rely on a common pool of resources, possibly handles, and that this pool becomes temporarily corrupted. It isn't obviously thread-related, as each invocation runs sequentially.

Other evidence of interest includes the occasional hang of Java.Lang.Process.WaitFor() under similar circumstances. It is commented out in the attached example project because the timeout available in WaitForExit makes for a cleaner demonstration. Add it back and you should see it hang about as often as WaitForExit was timing out. Similarly, removing the calls to System.Diagnostics.Process makes the problem go away, which implies it has something to do with the interplay between the two.

The problem occurs on at least five different Android devices, covering a range of processors and both Android 4.2.2 and Android 5.0.1.

It should be noted that this hang is unlikely to be the "forgot to empty all the buffer streams" problem that is a common source of process hangs, because all the input and error streams are merged and the streams are read to EOF in all cases.
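To make that distinction concrete, here is a minimal POSIX C sketch (not the attached project, which is C#) of the pattern described: stdout and stderr are merged into one pipe, the pipe is drained to EOF, and only then is the child waited for with a timeout. The function name run_merged, the 10 ms polling interval and the -1/-2 return codes are illustrative assumptions, and this is not claimed to match how Mono implements WaitForExit(). Because the parent reads until EOF before waiting, the child can never block on a full pipe, so a timeout in this pattern points at the wait itself rather than at unread output.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (searched in PATH) with stdout and stderr
   merged, reads the merged stream to EOF into buf (NUL-terminated, extra
   output discarded), then waits up to roughly timeout_ms for the child.
   Returns the exit code, -2 on timeout, -1 on any other error. */
static int run_merged(char *const argv[], char *buf, size_t cap, int timeout_ms)
{
    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: send both stdout and stderr into the pipe's write end. */
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }

    /* Parent: drain the pipe until EOF so the child never blocks writing. */
    close(fds[1]);
    size_t len = 0;
    char scratch[256];
    for (;;) {
        int keep = len < cap - 1;        /* room left in the caller's buffer? */
        ssize_t n = read(fds[0], keep ? buf + len : scratch,
                         keep ? cap - 1 - len : sizeof scratch);
        if (n == 0)
            break;                       /* EOF: all writers have closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (keep)
            len += (size_t)n;
    }
    buf[len] = '\0';
    close(fds[0]);

    /* Timed wait: poll waitpid(WNOHANG) every 10 ms until the deadline. */
    struct timespec tick = { 0, 10 * 1000 * 1000 };
    for (int waited = 0; waited <= timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(child, &status, WNOHANG);
        if (r == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r == -1 && errno != EINTR)
            return -1;                   /* e.g. ECHILD: already reaped elsewhere */
        nanosleep(&tick, NULL);
    }
    return -2;                           /* timed out; child left unreaped */
}
```

Passing an argument vector such as { "id", NULL } mirrors the reporter's invocation. On timeout the sketch simply leaves the child unreaped; a real caller would have to decide what to do with it.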

Here is the expected output:

TEST START ------------------
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS JAVA: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
TEST END ------------------

...and here is the common failure mode...

TEST START ------------------
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS JAVA: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
FAILURE MONO: TIMEOUT!
FAILURE MONO: TIMEOUT!
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
SUCCESS MONO: uid=10084(u0_a84) gid=10084(u0_a84) groups=1028(sdcard_r),3003(inet),50084(all_a84)
TEST END ------------------
Comment 1 Jonathan Pryor 2015-02-20 17:53:38 UTC
> Our working theory is that both methods for Process invocation rely on a common pool of resources

That's very probably true: they presumably both rely on the kernel's process table. ;-)

Having spent 5 minutes thinking about this...

Mono implements WaitForExit() by using waitpid(2) atop the SIGCHLD signal:

https://github.com/mono/mono/blob/9621471d3a8df35a0148164861081094c32d5c89/mono/io-layer/processes.c#L2546-2580

	pid = waitpid (-1, &status, WNOHANG);

A child process which has exited can only be waited upon *once*, ever, and using the wait4(2) family of functions is the only way to obtain process exit values (that I know of).

Which sets up our "common pool of resources" -- process identifiers!
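The "can only be waited upon once" rule is easy to check outside Mono. The following C sketch (the function name reap_twice is hypothetical) forks a child, collects it with a waitpid(-1, ...) call that, like the quoted Mono line, accepts any child, and then lets the child's "owner" call waitpid() on the same pid. On POSIX systems the second call fails with ECHILD, because the zombie and its exit status no longer exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, reaps it via waitpid(-1, ...), then
   tries to reap the same pid again. Returns the exit code seen by the first
   wait (or -1 on error) and stores the second wait's errno in *second_errno
   (0 if, unexpectedly, the second wait succeeded). */
static int reap_twice(int code, int *second_errno)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);

    /* First wait: collect *any* child, without caring who started it. */
    int status;
    pid_t first;
    do {
        first = waitpid(-1, &status, 0);
    } while (first == -1 && errno == EINTR);
    if (first != child || !WIFEXITED(status))
        return -1;
    int exit_code = WEXITSTATUS(status);

    /* Second wait by the "owner": the exit status has already been consumed. */
    int again;
    pid_t second = waitpid(child, &again, 0);
    *second_errno = (second == -1) ? errno : 0;
    return exit_code;
}
```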

Furthermore, the waitpid(2) is done within a SIGCHLD handler, and thus Mono's SIGCHLD handler will be invoked when *any* child process exits...including (especially!) java.lang.Process exits, which sets us up for this wonderful theoretical order of events:

1. create java.lang.Process
2. java.lang.Process completes, exits
3. Mono's SIGCHLD handler is raised, as the child has exited.
4. waitpid(2) is invoked, child collected
5. java.lang.Process-associated code never has a chance to collect anything regarding its own process.
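The five-step sequence above can be simulated in a single C program. This is a sketch under stated assumptions, not Mono's or Android's code: "runtime A" is modelled as a process-wide SIGCHLD handler that reaps any child with waitpid(-1, &status, WNOHANG), and "runtime B" (standing in for java.lang.Process) forks its own child and later calls waitpid() on that pid. The names reap_any_child and owner_wait_after_global_reaper, the exit code 7, and the roughly five-second polling bound are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "runtime A": a SIGCHLD handler that reaps ANY exited child,
   modelled on the waitpid (-1, &status, WNOHANG) call quoted above. */
static volatile sig_atomic_t reaped_pid = 0;
static volatile sig_atomic_t reaped_status = 0;

static void reap_any_child(int sig)
{
    (void)sig;
    int saved_errno = errno;             /* a handler must not clobber errno */
    int status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        reaped_pid = pid;
        reaped_status = status;
    }
    errno = saved_errno;
}

/* Hypothetical "runtime B": starts its own child, then waits for it.
   Returns B's waitpid() result, stores B's errno in *owner_errno and the
   exit code captured by A's handler in *stolen_code; -2 on setup failure. */
static int owner_wait_after_global_reaper(int *owner_errno, int *stolen_code)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_any_child;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -2;

    /* Steps 1-2: B creates a process, which exits. */
    pid_t child = fork();
    if (child < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -2;
    }
    if (child == 0)
        _exit(7);

    /* Steps 3-4: A's handler runs and collects the child (wait up to ~5 s). */
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 500 && reaped_pid != child; i++)
        nanosleep(&tick, NULL);

    /* Step 5: B finally asks for its own child's exit status. */
    int status;
    int rc = waitpid(child, &status, 0);
    *owner_errno = (rc == -1) ? errno : 0;

    int st = reaped_status;
    *stolen_code = WIFEXITED(st) ? WEXITSTATUS(st) : -1;
    sigaction(SIGCHLD, &old, NULL);      /* restore the previous disposition */
    return rc;
}
```

In this sketch B gets an immediate ECHILD error while the exit status ends up in A's handler. A runtime that instead blocks until its own bookkeeping reports the pid would never be woken, which would look like the hang or timeout in the report; that last step is an inference, not something the sketch demonstrates.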

Assuming the above is correct, I'm not even sure if this is *fixable*.

...so I'll punt it to @grendel. ;-)
Comment 2 Marek Habersack 2016-08-31 14:48:02 UTC
My first reaction is: not fixable. But I'll give it some thought once I get to this bug; maybe there is a way.