Bug 34237 - Parallel.For performing degraded from 3.12 to 4.x
Summary: Parallel.For performing degraded from 3.12 to 4.x
Status: RESOLVED ANSWERED
Alias: None
Product: Installers
Classification: Mono
Component: General ()
Version: 4.2.0 (C6)
Hardware: PC Mac OS
: Normal normal
Target Milestone: ---
Assignee: Ludovic Henry
URL:
Depends on:
Blocks:
 
Reported: 2015-09-24 09:24 UTC by Chris Hamons
Modified: 2015-10-11 17:08 UTC (History)
3 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED ANSWERED

Description Chris Hamons 2015-09-24 09:24:56 UTC
See the discussion here:

http://forums.xamarin.com/discussion/comment/152874#Comment_152874

I have a full sample from the customer, who would like to keep it under wraps. Poke me for access.
Comment 1 Chris Hamons 2015-09-24 09:35:45 UTC
Possibly related / same as https://bugzilla.xamarin.com/show_bug.cgi?id=19015
Comment 2 Ludovic Henry 2015-09-30 10:09:58 UTC
Hi Chris,

I looked at the user test case, and multiple things seemed pretty obvious to me:
 - first there would seem to be a bug in the Parallel.For imported from microsoft referencesource: without a ParallelOption with MaxDegreeOfParallelism passed, it is going to use a ludicrous amount of threads.
 - second, the user code contains a bunch of threading misusage which is going to lead to scalability issue after just a few threads

I am going to work on the Parallel.For issue, but unfortunately I cannot work on the user code error.

Also, I reassigned the bug to me, as I think Rodrigo is already getting enough notifications as is :)

Thank you,

Ludovic
Comment 3 Ludovic Henry 2015-09-30 11:45:38 UTC
According to the documentation (https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), the number of parallel tasks is not bounded in the default case, so the fact that it is going to launch an unlimited/huge number of tasks (and thus of threads) in parallel, is the expected behavior.

To control the number of parallel tasks executing in parallel, simply pass a ParallelOptions with MaxDegreeOfParallelism set. The user already have it in the code (https://github.com/philstopford/pointless/blob/master/MainWindowController.cs#L269), he just doesn't pass it to Parallel.For (https://github.com/philstopford/pointless/blob/master/MainWindowController.cs#L292).

So in the end, "this is not a bug, this is a feature" ...
Comment 4 phil.stopford 2015-10-02 13:30:30 UTC
I'm curious. This is my first foray into MonoMac code, but the original code ran fine and without any apparent ill effects on the Windows side. Even with the ParallelOptions based to Parallel.For, I see no change in behavior on Windows or the issues on the Mac side.

What threading misusage is apparent in the code (just vague hints would be sufficient)?
Comment 5 phil.stopford 2015-10-02 13:30:52 UTC
*passed to Parallel.For
Comment 6 phil.stopford 2015-10-02 13:34:06 UTC
It's also unclear to me why 3.12.1 and 4.x behave so differently.
Comment 7 Ludovic Henry 2015-10-02 13:42:59 UTC
The main change between mono 3.12 and 4.x is that we started using Microsoft ReferenceSource starting at 4.0. And the Mono version (before 4.0) of Parallel.For was a bit more respectful of the user machine by limiting by default the maximum number of parallel tasks to the number of CPU on the user machine. The implementation of Microsoft does not really care about that, and simply launch as many tasks in parallel in possible, letting the OS scheduler and the ThreadPool take care of the rest.

The fact that it works better on Windows is more or less an issue with the OS scheduler, and a hint of luck with the ThreadPool. And if you are comparing to .NET on windows, then .NET uses the OS ThreadPool, while we use the runtime one on Mono, and the OS one has way more insights into the resource usage of the process.

My main advice would be to pass a ParallelOption with MaxDegreeOfParallelism set to N * the number of core, with N a acceptable number for your workload (2 for CPU heavy workload, to 20-50 for IO heavy workload)
Comment 8 phil.stopford 2015-10-02 13:49:26 UTC
Well, as noted, I added the po 'ParallelOptions' to the Parallel.For and it seemed to make no difference in the overall performance on either Windows or OS X, hence my confusion (for the large project supplied and also my basic testbed to try and understand). Were you able to get full CPU usage with 'po' in the small test case?

e.g. Parallel.For(0, numberOfCases, po, (i, loopState) =>

I only ever see ~140% CPU usage and the timer for updating fires very infrequently in both cases. This has me rather puzzled, and is also seen with the terminal (headless) behavior, where GUI aspects are quite simplified

I tried various locking strategies as well, without seeing any full usage. That's why I'm curious about the apparent threading misusage :)
Comment 9 Ludovic Henry 2015-10-02 14:01:44 UTC
The not full CPU usage comes from you algorithm which is not parallel-computation friendly: most of the time is spent in waiting for the lock at https://github.com/philstopford/pointless/blob/master/MainWindowController.cs#L299

And increasing the number of threads will only lead to more contention on this lock. By only seeing the small test case in this repository, you would certainly get better performance by having a single threaded version, and no lock.

Your lock contention comes from the fact that you have this shared `previewShape` field that is accessed by all the threads.

Another concurrency pattern that you should have a look into is the System.Threading.Barrier [0].

[0] https://msdn.microsoft.com/en-us/library/system.threading.barrier(v=vs.110).aspx
Comment 10 phil.stopford 2015-10-02 14:05:04 UTC
Well, this small case was intended to explore the issues I was seeing in the full example (that Chris has and can be poked for), where the computation time is significantly longer. This report was really in the context of that larger example, which runs pretty nicely (headless) under 3.12.1 and poorly under 4.x.

The same basic code is used in the MonoMac version (in that full example) and there the GUI performance during a run seems to be horrible.
Comment 11 phil.stopford 2015-10-11 17:08:05 UTC
More work on my side has yielded these numbers for a headless version :

Runtime			Iterations

OSX_Headless46 (Mono 4.3.0)
 1 m, 03 s		:	65000
 2 m, 31 s 	:	130000

OSX_Headless46 (Mono 4.2.1)
 1 m, 03 s		:	65000
 2 m, 31 s 	:	130000

OSX_Headless46 (Mono 3.12.1)
 1 m, 31 s 	:	65000
 3 m, 21 s 	:	130000

However, compared to Windows, something is still seriously amiss - the Mono best-case is still 2x slower than the Windows version.

 0 m, 30 s	:	65000