Bug 21818 - calls waitpid(-1, ...) which is incompatible with g_spawn_sync(), and catches SIGCHLD which is incompatible with other GLib things
Summary: calls waitpid(-1, ...) which is incompatible with g_spawn_sync(), and catches...
Status: RESOLVED FEATURE
Alias: None
Product: Runtime
Classification: Mono
Component: General ()
Version: 3.2.x
Hardware: PC Linux
: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2014-08-05 08:46 UTC by Simon McVittie
Modified: 2017-10-19 19:34 UTC (History)
4 users (show)

Tags: bugpool-archive
Is this bug a regression?: ---
Last known good build:


Attachments
Global SIGCHILD handler should only reap known mono subprocesses (2.07 KB, patch)
2014-08-06 11:54 UTC, Simon McVittie
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FEATURE

Description Simon McVittie 2014-08-05 08:46:31 UTC
I wanted to spawn a subprocess in a C# application that already uses GLib functions, without being bitten by command-line escaping. "Oh, that's easy", I thought, "I'll call g_spawn_sync()".

Unfortunately, since commit afb1937e561 in 2011, Mono installs a SIGCHLD handler which calls waitpid(-1, ...).

This is incompatible with anything in GLib which wants to install a SIGCHLD handler, meaning g_child_watch_source_new(), which is called by the g_spawn_async*() family. because there can only be one (it is technically possible to "stack" handlers, but it's dangerous because you don't know which calling convention they are expecting, and neither Mono nor GLib actually does this).

It is also documented to be incompatible with g_spawn_sync(), which produces this warning:

  GLib-WARNING **: In call to g_spawn_sync(), exit status of a child process was requested but ECHILD was received by waitpid().

More concretely, the problem is: suppose the process started by g_spawn_sync() is pid 12345. Before the thread that called g_spawn_sync() is about to block in waitpid(12345, ...), some other thread receives SIGCHLD, calls waitpid(-1, ...), and reaps process 12345. Now the g_spawn_sync() thread gets scheduled, calls waitpid(12345, ...) and receives ECHILD.

I think the solution for g_spawn_sync is for the SIGCHLD handler to iterate through the pids in which Mono is actually interested, and waitpid() for each individual pid. Here is an untested sketch of a patch:

-	do {
+	for (p = mono_processes; p != NULL; p = p->next) {
+		if (p->pid <= 0)
+			continue;
+
 		do {
-			pid = waitpid (-1, &status, WNOHANG);
+			pid = waitpid (p->pid, &status, WNOHANG);
 		} while (pid == -1 && errno == EINTR);
 
-		if (pid <= 0)
-			break;
-
+		if (pid > 0) {
 #if DEBUG
-		fprintf (stdout, "child ended: %i", pid);
+			fprintf (stdout, "child ended: %i", pid);
 #endif
-		p = mono_processes;
-		while (p != NULL) {
-			if (p->pid == pid) {
-				p->pid = 0; /* this pid doesn't exist anymore, clear it */
-				p->status = status;
-				MONO_SEM_POST (&p->exit_sem);
-				break;
-			}
-			p = p->next;
+			p->pid = 0; /* this pid doesn't exist anymore, clear it */
+			p->status = status;
+			MONO_SEM_POST (&p->exit_sem);
 		}
-	} while (1);
+	}


I don't think there is any solution for g_child_watch_source_new() and the g_spawn_async*() family - SIGCHLD and waitpid() are awful APIs in the presence of libraries or threads, and it isn't clear to me how Mono could do a whole lot better. If anyone has good ideas...
Comment 1 Simon McVittie 2014-08-06 11:54:51 UTC
Created attachment 7603 [details]
Global SIGCHILD handler should only reap known mono  subprocesses

From: Vivek Dasmohapatra <vivek@collabora.co.uk>

---

Here is a more tested patch based on my suggestion, from my colleague Vivek.
Comment 2 Rodrigo Kumpera 2014-08-12 20:17:16 UTC
I'm not happy with this approach as it does make child heaping significantly more expensive as the number of syscalls is proportional to the set of children and not constant.

But, said that, posix doesn't give us better alternatives.

Could you guys license the above patch under the MIT license?

Once licensing is ok, it can be merged into master.

Thanks for the patch guys.
Comment 3 Rodrigo Kumpera 2014-08-26 00:09:59 UTC
Could you guys fix the patch licensing?
Comment 4 Simon McVittie 2014-08-26 06:14:16 UTC
> Could you guys fix the patch licensing?

I hope we can, and I'd license it MIT/X11 or CC0 immediately if it was only up to me; but we need to check some contractual stuff to confirm that we can do this. Sorry for the delay, I'll remind people to act on it.
Comment 5 Simon McVittie 2014-08-26 14:23:05 UTC
Yes, MIT/X11 is fine. More formally:

Vivek and I did the patches on this bug on behalf of Collabora Ltd. They are copyright © 2014 Collabora Ltd. and may be merged under the MIT/X11 (Expat) license as described in mcs/MIT.X11.
Comment 6 Simon McVittie 2014-08-26 14:26:06 UTC
> I'm not happy with this approach as it does make child heaping 
> significantly more expensive as the number of syscalls is proportional
> to the set of children and not constant.

I'm not happy with it either, but it's the least bad thing we could think of given the constraints of POSIX process semantics, and GLib uses a similar implementation.

In practice, if you have n child processes for large n, I suspect that O(n) iteration per SIGCHLD is not going to be your performance bottleneck: the fork() is going to be more expensive than traversing linked lists.
Comment 7 Rodrigo Kumpera 2017-09-13 18:07:19 UTC
We don't plan on changing how mono process reaping code works.