Bug 37583 - MAJOR performance decrease between 4.0.5.1 and 4.2.1
Summary: MAJOR performance decrease between 4.0.5.1 and 4.2.1
Status: RESOLVED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: mscorlib ()
Version: 4.2.0 (C6)
Hardware: PC Mac OS
: --- normal
Target Milestone: Untriaged
Assignee: Marek Safar
URL:
Depends on:
Blocks:
 
Reported: 2016-01-11 18:57 UTC by Jeffrey Stedfast
Modified: 2016-01-14 13:41 UTC (History)
4 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
mbox-parser.cs (880 bytes, text/plain)
2016-01-11 18:57 UTC, Jeffrey Stedfast
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FIXED

Description Jeffrey Stedfast 2016-01-11 18:57:50 UTC
Created attachment 14539 [details]
mbox-parser.cs

I was testing performance of MimeKit's parser this weekend and I noticed a *major* decrease in performance since the last time I ran my benchmarks a year or so ago.

To narrow down the problem, I reverted my MimeKit checkout to various points between now and 1.0 and noticed that the was very little difference. At this point I was began to suspect Mono, so I downloaded Mono 4.0.5.1 to check the performance of MimeKit's parser using that.

Here are the results (Mac OS X Yosemite 10.10.5):

Mono JIT compiler version 4.0.5 ((detached/1d8d582 Tue Oct 20 15:15:33 EDT 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost playground]$ mono --llvm mbox-parser.exe larger.mbox 
Parsed 14896 messages in 1.78 seconds.
[fejj@localhost playground]$ mono mbox-parser.exe larger.mbox 
Parsed 14896 messages in 2.37 seconds.



Here are the results for the current stable release of Mono:

Mono JIT compiler version 4.2.1 (explicit/6dd2d0d Fri Nov  6 12:25:19 EST 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost playground]$ mono --llvm mbox-parser.exe larger.mbox 
Parsed 14896 messages in 5.11 seconds.
[fejj@localhost playground]$ mono mbox-parser.exe larger.mbox 
Parsed 14896 messages in 5.45 seconds.


That's a pretty significant decrease in performance.

Attached you'll find my mbox-parser.cs test program.

To get MimeKit:
git clone https://github.com/jstedfast/MimeKit.git

I can't attach the sample mbox that I was using (because it's over a gigabyte), but you can easily recreate it by concatenating multiple copies of MimeKit/UnitTests/TestData/mbox/jwz.mbox.txt into a single file (approx. 100 times). That should give you a large enough input to rule out statistical noise.
Comment 1 Zoltan Varga 2016-01-11 21:35:01 UTC
Here is what the profiler outputs:

  200089        0          1 (wrapper runtime-invoke) <Module>:runtime_invoke_void_object (object,intptr,intptr,intptr)
  200089       11          1 MboxParser.Program:Main (string[])
  199843        7      12160 MimeKit.MimeParser:ParseMessage (System.Threading.CancellationToken)
  199836       33      12160 MimeKit.MimeParser:ParseMessage (byte*)
  161829       21      11440 MimeKit.MimeParser:ConstructMultipart (MimeKit.Multipart,byte*)
  160000       59      11440 MimeKit.MimeParser:MultipartScanSubparts (MimeKit.Multipart,byte*)
  148425      417    1216769 System.OrdinalComparer:GetHashCode (string)
  148007      423    1216766 System.Globalization.TextInfo:GetHashCodeOrdinalIgnoreCase (string)
  147584      602    1216766 System.Globalization.TextInfo:GetHashCodeOrdinalIgnoreCase (string,bool,long)
  146812      430    1217122 System.Globalization.TextInfo:GetCaseInsensitiveHashCode (string,bool,long)
  146381      452    1217122 System.CultureAwareComparer:GetHashCode (string)
  145928      423    1217122 System.Globalization.CompareInfo:GetHashCodeOfString (string,System.Globalization.CompareOptions)
  145504      596    1217122 System.Globalization.CompareInfo:GetHashCodeOfString (string,System.Globalization.CompareOptions,bool,long)
  144631      445    1217122 System.Globalization.CompareInfo:GetSortKey (string,System.Globalization.CompareOptions)
  144186      421    1217122 System.Globalization.CompareInfo:CreateSortKey (string,System.Globalization.CompareOptions)
  143765      775    1217122 System.Globalization.CompareInfo:CreateSortKeyCore (string,System.Globalization.CompareOptions)
  142538      407    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,System.Globalization.CompareOptions)
  142130     1027    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,int,int,System.Globalization.CompareOptions)
  127447    11392    1217122 Mono.Globalization.Unicode.SimpleCollator:GetSortKey (string,int,int,Mono.Globalization.Unicode.SortKeyBuffer,System.Globalization.CompareOptions)
   98029      361     278400 MimeKit.HeaderList:Add (MimeKit.Header)
   83428      391      21920 MimeKit.MimeMessage:.ctor (MimeKit.ParserOptions,System.Collections.Generic.IEnumerable`1<MimeKit.Header>)
   68671    14597   11582357 Mono.Globalization.Unicode.SimpleCollator:FillSortKeyRaw (int,Mono.Globalization.Unicode.SimpleCollator/ExtenderType,Mono.Globalization.Unicode.SortKeyBuffer,System.Globalization.CompareOptions)
   46289       34      76242 MimeKit.MimeParser:Step (byte*)
   46227     1309      64080 MimeKit.MimeParser:StepHeaders (byte*)
   43171       23       9760 MimeKit.MimeParser:ConstructMessagePart (MimeKit.MessagePart,byte*)
   42115      218     343680 MimeKit.MimeParser:ParseAndAppendHeader ()
Comment 2 Jeffrey Stedfast 2016-01-11 21:52:55 UTC
I guess the problem is that the code ends up here?

        [System.Security.SecuritySafeCritical]  // auto-generated
        internal unsafe int GetCaseInsensitiveHashCode(String str, bool forceRandomizedHashing, long additionalEntropy)
        {
            // Validate inputs
            if (str==null) 
            {
                 throw new ArgumentNullException("str");
            }
            Contract.EndContractBlock();

#if MONO
            return StringComparer.CurrentCultureIgnoreCase.GetHashCode (str);
#else
            // Return our result
            return (InternalGetCaseInsHash(this.m_dataHandle, this.m_handleOrigin, this.m_textInfoName, str, forceRandomizedHashing, additionalEntropy));
#endif
        }


I'm betting that the StringComparer.CurrentCultureIgnoreCase.GetHashCode() is insanely expensive.

FWIW, GetCaseInsensitiveHashCode() is effectively called from here:

        [ResourceExposure(ResourceScope.None)]
        [ResourceConsumption(ResourceScope.Process, ResourceScope.Process)]
        internal static int GetHashCodeOrdinalIgnoreCase(String s, bool forceRandomizedHashing, long additionalEntropy)
        {
            // This is the same as an case insensitive hash for Invariant
            // (not necessarily true for sorting, but OK for casing & then we apply normal hash code rules)
            return (Invariant.GetCaseInsensitiveHashCode(s, forceRandomizedHashing, additionalEntropy));
        }


Notice that the comment says that it expects to use the InvariantCulture, not the CurrentCulture. This may make a difference. Of course, I'm not sure why they switch to Invariant seeing as how Ordinal should still be cheaper than Invariant?
Comment 3 Jeffrey Stedfast 2016-01-11 21:54:06 UTC
D'oh, never mind. I misread the code.
Comment 4 Marek Safar 2016-01-12 08:17:46 UTC
The issue should be addressed by https://github.com/mono/referencesource/commit/8382c4605d622d56354e7340bf872caaf36b972a

jeff, could you try your test with the latest mono or mono 4.3.2
Comment 5 Jeffrey Stedfast 2016-01-12 13:26:17 UTC
Unfortunately that change didn't seem to help much:

Mono JIT compiler version 4.3.2 (mono-4.3.2-branch/83e0157 Mon Jan 11 18:26:16 EST 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-master/a173357)
	GC:            sgen
[fejj@localhost perf]$ mono --llvm ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 10.819862s.
[fejj@localhost perf]$ mono ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 10.593343s.
[fejj@localhost perf]$ mono --version
Mono JIT compiler version 4.0.5 ((detached/1d8d582 Tue Oct 20 15:15:33 EDT 2015)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           normal
	SIGSEGV:       altstack
	Notification:  kqueue
	Architecture:  x86
	Disabled:      none
	Misc:          softdebug 
	LLVM:          yes(3.6.0svn-mono-(detached/a173357)
	GC:            sgen
[fejj@localhost perf]$ mono --llvm ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 5.737068s.
[fejj@localhost perf]$ mono ./mbox-parser.exe gigabyte.mbox 
Parsed 14896 messages in 6.912308s.


(testing this on my laptop which is slower, so posted timings of both 4.3.2 and 4.0.5 for comparison)
Comment 6 Marek Safar 2016-01-14 13:41:42 UTC
Fixed in master