Bug 20 - System.Diagnostics.Process leaves zombie processes behind
Summary: System.Diagnostics.Process leaves zombie processes behind
Status: RESOLVED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: System
Version: 2.10.x
Hardware: PC Linux
Importance: --- minor
Target Milestone: Untriaged
Assignee: Mono Bugs
URL:
Depends on:
Blocks:
 
Reported: 2011-07-18 23:50 UTC by Darren
Modified: 2011-07-27 20:54 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Small process forking test project in F#/C++ (26.97 KB, application/octet-stream)
2011-07-27 20:54 UTC, Darren


Description Darren 2011-07-18 23:50:16 UTC
I recently ported a large system written in F# from Windows to Linux with very little pain, thanks to Mono. The one small but fairly crippling problem is with child process management. I typically run a few thousand batch subtasks, each taking 1-10 seconds, using System.Threading.Tasks.Task to parallelize the jobs and System.Diagnostics.Process to run them. Most of them exit correctly, but a few finish running and end up as zombies, presumably because the parent process hasn't correctly received the SIGCHLD signal.

The code calls WaitForExit(), which does complete (the process has exited), but the subsequent attempt to read ExitCode fails with InvalidOperationException. My attempts to delay and try again are shown below (in F#). I've tried waiting much longer, but the child's exit code never becomes available and the process itself stays in a zombie state. The function below is wrapped in a task factory. It might be possible to write something more compact that exercises the bug efficiently.

The whole system is pretty large, but it looks to me <SPECULATE> as though SIGCHLD is getting ignored and the kernel isn't going to release the process and give an exit code until that gets handled </SPECULATE>. I haven't found the right place in the Mono source to look, but I would be happy to experiment if someone could point me in the right direction.
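
One way to confirm the zombie state from inside the parent, rather than from ps, is to read the child's entry under /proc. The following is a minimal sketch that assumes a Linux /proc filesystem. The helper names tryProcState and isZombie are hypothetical and are not part of Mono or .NET. The pid would be a value saved from p.Id right after Start().

```fsharp
open System
open System.IO

/// Returns the one-letter state from /proc/<pid>/status
/// (for example 'Z' for a zombie), or None if it cannot be read.
let tryProcState (pid: int) : char option =
    try
        File.ReadLines(sprintf "/proc/%d/status" pid)
        |> Seq.tryFind (fun line -> line.StartsWith("State:", StringComparison.Ordinal))
        |> Option.bind (fun line ->
            // The line looks like "State:<tab>Z (zombie)"; keep only the letter.
            let value = line.Substring("State:".Length).Trim()
            if value.Length > 0 then Some value.[0] else None)
    with
    | :? IOException | :? UnauthorizedAccessException -> None

/// True only when the kernel reports the process as an unreaped zombie.
let isZombie (pid: int) : bool =
    tryProcState pid = Some 'Z'
```

Logging isZombie for the saved pid at the point where ExitCode throws would show whether the child has exited but has not yet been reaped by the parent.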

I noticed some attempts to fix this and tried 10.2.2 hoping for an improvement, but it behaves the same way.

Any pointers appreciated.

Darren


// F# code for process execution
open System.Diagnostics
open System.Threading // needed for Thread.Sleep below
let runProc2 path args =
    use p = new Process()
    //stdout.WriteLine(sprintf "runProc: path=%s args=%s" path args)
    p.StartInfo.FileName <- path
    p.StartInfo.Arguments <- args
    p.StartInfo.CreateNoWindow <- false
    p.StartInfo.RedirectStandardError <- true
    p.StartInfo.RedirectStandardOutput <- true
    //p.StartInfo.WindowStyle <- System.Diagnostics.ProcessWindowStyle.Hidden
    p.StartInfo.WindowStyle <- System.Diagnostics.ProcessWindowStyle.Maximized
    p.StartInfo.UseShellExecute <- false
    if not (p.Start()) then
        999,"","Failed to exec"
    else
        stdout.Flush()
        
        //stdout.WriteLine("Process exited")
        //stdout.Flush()
        let stdErr = p.StandardError.ReadToEnd()
        let stdOut = p.StandardOutput.ReadToEnd()
        //stdout.WriteLine("Process exited")
        //stdout.Flush()
        p.WaitForExit()

        let exitCode =
            try
                p.ExitCode
            with
                | :?  System.InvalidOperationException ->
                    printf "WARNING: InvalidOperation Exception encountered, waiting 5 seconds\n"
                    // might be a race condition with process exiting, be patient
                    Thread.Sleep(5000)
                    printf "Waiting again\n"
                    p.WaitForExit()
                    printf "Gathering exit code\n"
                    p.ExitCode
                   
        exitCode,stdOut,stdErr
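
Separately from the zombie problem, the function above reads StandardError to the end before it touches StandardOutput. If a child writes enough to stdout to fill the pipe, both sides can block. A hedged sketch of a variant that drains stderr through the ErrorDataReceived event while stdout is read synchronously follows. The name runCaptured is hypothetical, and this does not address the ExitCode failure itself.

```fsharp
open System.Diagnostics
open System.Text

/// Runs a command and returns (exitCode, stdout, stderr).
let runCaptured (path: string) (args: string) : int * string * string =
    use p = new Process()
    p.StartInfo.FileName <- path
    p.StartInfo.Arguments <- args
    p.StartInfo.UseShellExecute <- false
    p.StartInfo.RedirectStandardOutput <- true
    p.StartInfo.RedirectStandardError <- true
    let err = StringBuilder()
    // Collect stderr line by line in a callback so that pipe keeps draining
    // while stdout is read synchronously below.
    p.ErrorDataReceived.Add(fun e ->
        if not (isNull e.Data) then err.AppendLine(e.Data) |> ignore)
    if not (p.Start()) then
        failwith "process did not start"
    p.BeginErrorReadLine()
    let out = p.StandardOutput.ReadToEnd()
    // The parameterless WaitForExit also waits for the asynchronous stderr
    // reader to reach end of stream.
    p.WaitForExit()
    p.ExitCode, out, err.ToString()
```

A call such as runCaptured "/bin/sh" "-c \"echo hi; exit 3\"" returns the exit code together with both captured streams.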
Comment 1 Robert Jordan 2011-07-27 10:23:24 UTC
Try to call p.Dispose() when you're ready with "p". In C# I'd use a "using(){}" block for this.

While this doesn't fix the ExitCode issue directly, it would still help, given that you're "run[ning] a few thousand batch subtasks". Each undisposed Process object keeps an internal handle open, and they pile up non-deterministically.
Comment 2 Rolf Bjarne Kvinge [MSFT] 2011-07-27 15:46:04 UTC
Waiting for processes has already been improved greatly, but the code is not on the 2.10 branch; it will be in the upcoming 2.12.

If you want to experiment with the code have a look at the log of the file mono/io-layer/processes.c in master:

https://github.com/mono/mono/commits/master/mono/io-layer/processes.c

The relevant commits are my commits in February/March.
Comment 3 Darren 2011-07-27 20:40:00 UTC
(In reply to comment #1)
> Try to call p.Dispose() when you're ready with "p". In C# I'd use a "using(){}"
> block for this.
> 
> While this doesn't fix the ExitCode issue directly, it would still help, given
> that you're "run[ning] a few thousand batch subtasks". Each undisposed Process
> object keeps an internal handle open, and they pile up non-deterministically.

The 'use' statement in F# is the equivalent of using, so it should be disposing properly, but good point.
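
To illustrate the point about 'use', here is a small sketch with a hypothetical stand-in resource (makeResource and the events log are made up for this example, not taken from the production code). It records the order of events to show that Dispose runs when the binding goes out of scope.

```fsharp
open System

// Log of what happened, in order.
let events = ResizeArray<string>()

// Hypothetical stand-in for a resource such as Process.
let makeResource (name: string) : IDisposable =
    { new IDisposable with
        member x.Dispose() = events.Add(sprintf "disposed %s" name) }

let work () =
    use r = makeResource "p"
    events.Add("working")
    // r.Dispose() is called here, when 'r' goes out of scope,
    // including when an exception is raised earlier in the function.

work ()
```

After work () returns, the log holds "working" followed by "disposed p", which matches the behavior of a C# using block.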
Comment 4 Darren 2011-07-27 20:54:31 UTC
Created attachment 28
Small process forking test project in F#/C++

I wrote a small F# project to simulate the process forking. A primary spawning process (WaitBug) runs a smaller test process (waiter.cc or waiter.exe) to simulate different exit dynamics. Sadly, I was unable to reproduce the fault I see in my production code with this simulation, but it did throw other errors once the load was high enough (longer-running processes with more child processes), so it might still be useful as test code.

Darren