Notice (2018-05-24): bugzilla.xamarin.com is now in
Please join us on
Visual Studio Developer Community and in the
Mono organizations on
GitHub to continue tracking issues. Bugzilla will remain
available for reference in read-only mode. We will continue to work
on open Bugzilla bugs, copy them to the new locations
as needed for follow-up, and add the new items under Related
Our sincere thanks to everyone who has contributed on this bug
tracker over the years. Thanks also for your understanding as we
make these adjustments and improvements for the future.
Please create a new report on
GitHub or Developer Community with
your current version information, steps to reproduce, and relevant error
messages or log files if you are hitting an issue that looks similar to
this resolved bug and you do not yet see a matching new report.
Created attachment 2306 [details]
Program to read and sort the list of ints in the sample data included.
The optimizations made in https://github.com/mono/mono/commit/d97cdb0c124729152be551c421c4a11732e45fc9, appear to have introduced the opposite effect when there are lots of duplicates in the array. If running on a ThreadPool thread, it also introduces a StackOverflowException when trying to sort large arrays with lots of duplicates. If you compile and run the attached program and sample of real data from our program, there are two issues: 1) the data, which used to take 00.038 seconds to sort now takes 17 seconds to sort, orders of magnitude slower; 2) if you try to sort the data on a ThreadPool thread as we do in our application, it will result in a StackOverflowException being thrown.
Submitted pull request that reverts to previous implementation, adds back the insertion sort optimization, and only recurses on the smaller partition. https://github.com/mono/mono/pull/421
Jeff, could you look into it. I don't remember why the change was made.
committed a partial fix for this which prevents the StackOverflowExceptions
I'll look into the performance issue when I have more free time (critical issue just popped up thanks to Apple changing the file format of their PList files)
Just committed your patch to add the insertion sort fallback to the non-generic qsort() function.
Could you explain how your approach to partitioning is better than the current approach? I tried looking over it, but all of the renaming of variables makes it hard to clearly see the differences.
Okay, now that the binary plist crisis has been averted, I've taken the time to examine your patch and test it out.
I was about to suggest using while (i < k ...) and while (k > i ...) logic instead of while (i < high ...) and while (k > low ...) like I had in our current implementation, but then I noticed you updated your git repo and made that change already :-)
Come to think of it, we probably don't need those constraints at all anymore now that both loops will exit as soon as they match an item with an identical key to the pivot. Will have to test this, but I think it's another optimization we can make now...
That said, I've committed your collapse-the-walls loop fixes and performance of your test program is once again ~0.032s (which matches the performance in Mono 2.10 for me)
FWOW, here's the old thread with the reasoning behind the previous implementation in case you were interested:
Thank you for the reference; I was looking through my email trying to find the original conversation as I had remembered seeing something about QuickSort on the mailing list some time ago, but apparently did not go back far enough.
As part of my testing, I was using a Windows box running the sort code in a stand-alone program with qsort<int, int> (int,int,int,int) and its helper methods so I could compare the timing results to using the Array.Sort method from Microsoft's .NET framework. Sorting an array of ints was faster, but converting the sample data to an array of strings produced similar results to Microsoft's version.
Out of curiosity, I then ran the stand-alone program on my Mac (a new/faster machine) using Mono 2.11.2 and was surprised by the results. Running qsort with an array of ints was about the same on both machines; however, sorting an array of strings produced much worse result, sometimes 10 to 20 times slower. This seems to indicate that (at least for strings) there is a big difference in the comparison cost/performance between Microsoft and Mono.
Rodrigo suggested I implement a radix sort for strings, I just haven't gotten around to it yet. That might be one way to improve things, but from what you discovered, it sounds like you are right, that Mono's string comparison logic might be a better place to look into resolving since your tests suggest that Microsoft's Array.Sort() on strings also uses a qsort approach.
oh, I think I misread your comment above. I guess you mean you compared Mono's qsort on Mac vs the same qsort on Windows (using the .NET runtime).