Bug 6198 - Certain Regexes have become extremely slow
Summary: Certain Regexes have become extremely slow
Status: RESOLVED FIXED
Alias: None
Product: Class Libraries
Classification: Mono
Component: System ()
Version: master
Hardware: PC Linux
: --- normal
Target Milestone: Untriaged
Assignee: Martin Baulig
URL:
Depends on:
Blocks:
 
Reported: 2012-07-19 10:46 UTC by Weeble
Modified: 2013-01-09 15:35 UTC (History)
3 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Suggested patch (1.46 KB, patch)
2012-07-26 18:16 UTC, Robert Wilkens
Details
Revised, not as fast, but more propper (779 bytes, patch)
2012-07-27 16:02 UTC, Robert Wilkens
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FIXED

Description Weeble 2012-07-19 10:46:58 UTC
I've pasted some example code here:

http://pastebin.com/NCTa8R0h

I first observed this in YUI Compressor, which was running fine for me in Mono 2.10.5, but appeared to hang with 100% CPU when I switched to Mono 2.11.2. I've extracted and simplified the regex that was causing the problem, to something like this:

    (a|b)(c+e)

This shows terrible running time on Mono 2.11.2 on strings of the form:

    acccccccccX accccccccX acccccccX...

The thing is, I can't see any reason for it to blow up while backtracking. It looks for "a" and finds it. It then looks for "c" and finds it, then "c" or "e" repeatedly until it hits the "X", at which point it fails and has to backtrack. But there's nothing else to explore while backtracking. It should go all the way back up until it fails to find any match at position 0 and then advances to continue from position 1, quickly failing at every position until the next "a".

This equivalent Regex does not show the massive slowdown:

    ([ab])(c+e)

I tried both Mono 2.10.5 (Debian package) and Mono 2.11.2 (tarball), and for each I tried with and without MONO_NEW_RX=1. The only combination that is incredibly slow is Mono 2.11.2 *without* MONO_NEW_RX=1.

Given that this regression is recent and linked to the "|" operator, I think it's probably to do with this fix for bug 2663:

https://github.com/mono/mono/commit/60a463c6f3164784939cd9535acf865f6b33ef20
https://bugzilla.xamarin.com/show_bug.cgi?id=2663

I'm not familiar enough with the regex implementation to understand what's going on in there.
Comment 1 Robert Wilkens 2012-07-25 08:57:44 UTC
I want to fix this one, it's probably easy, but I won't be available to do this until at least august 1st (about a week away).  My Linux install is corrupt, and I am away from home.
Comment 2 Robert Wilkens 2012-07-25 10:26:16 UTC
If someone wants to back out that change, they are welcome to, or just wait till I have time to look at it.  I can put back that change with the fix for this at the same time.

Even if I can't get my Linux install back up and running again, this particular fix I should be able to fix on my mac mini which I don't have with me.
Comment 3 Robert Wilkens 2012-07-25 23:08:48 UTC
I can now reproduce this problem, and i have a source tree built where i should be able to troubleshoot it easily.  Not sure when i'll find time to troubleshoot this, but hopefully it will be soon.
Comment 4 Robert Wilkens 2012-07-26 17:54:35 UTC
I have a fix, i believe, just need time to submit it.
Comment 5 Robert Wilkens 2012-07-26 18:16:23 UTC
Created attachment 2268 [details]
Suggested patch

This patch seems to help a lot with bug 6198.
Comment 6 Robert Wilkens 2012-07-27 14:21:59 UTC
The problem probably really lies most likely in the Check Combo function..  It's supposed to detect we've already been down a particular path and prevent us from trying the same path -- but as seen in the attached patch, which illustrates that we are finding duplicates and doing the same thing multiple times, it should be clear that as written it is not stopping us from duplicating something we already did.  It should be a relatively simple function *or method if you prefer, but as i noted in original commit, i wasn't 100% convinced it was bug free and apparently, and unfortunately, it seems that i was right.  I will try to find more time to troubleshoot this next week -- this week is kind of crazy for me.  The above patch is more of a workaround which at minimum solves this problem and at worst it doesn't make the original problem in 2663 worse as it at least solves it some of the time.
Comment 7 Robert Wilkens 2012-07-27 16:02:32 UTC
Created attachment 2269 [details]
Revised, not as fast, but more propper

This is a revision, and discards the previous patch.  This patch is not as fast as the previous patch, but it is faster than the original (without it).  

The original patch for 2663 had a minor glitch in CheckComboMatch where it would allow it to repeat itself.

I cannot confirm that this gives the results you want, but it at minimum it still solves the original and should be quicker than the original, unlike the previous patch on this bug report.
Comment 8 Robert Wilkens 2012-08-01 07:03:25 UTC
If someone else wants to give this a shot, it's welcome.  I"m not sure my fix is right, and i don't have the patience at this time to start looking into this any more than i have.
Comment 9 Miguel de Icaza [MSFT] 2012-08-15 16:34:59 UTC
Martin,

Could you review this patch?
Comment 10 Robert Wilkens 2012-08-17 22:50:54 UTC
I'll try to get back on this, in addition to whatever effort Martin is making.  If something has already been done, let me know.  Otherwise, i posted a message to the mailing list tonight(mono-devel) which describes an outline for what i need to do to fix this and i think it should work.  I am familiar enough with the code that if i can get it downloaded and compiled i can probably fix it quicker than someone else just getting familiar with the code.  I just need a day when i'll have time and the memory to do this.
Comment 11 Robert Wilkens 2012-08-18 07:51:24 UTC
I've got a fix which on my slow computer reduces runtime as follows

Before:

@ time mono SlowRegEx.exe 
Done

real	0m22.364s
user	0m22.341s
sys	0m0.024s

After:

@ time mono SlowRegEx.exe 
Done

real	0m13.330s
user	0m13.261s
sys	0m0.068s

So it almost cuts the runtime in half, which is better, though admittedly this may not be fast enough for what the user wants, but i think i'm going to try to create a pull request.  I just confirmed it passes the unit tests including the ones for 2663.
Comment 12 Robert Wilkens 2012-08-18 07:58:14 UTC
Ok, here's the pull request on the above if it's 'good enough' for now:

https://github.com/mono/mono/pull/432

If nothing else, this is a major improvement -- while still not breaking the problem reported in 2663 which was fixed earlier.
Comment 13 Rodrigo Kumpera 2013-01-09 15:35:04 UTC
The current approach is flawed as it changes the asymptotic runtime of the regex engine.