Bug 24969 - Identifiers should allow formatting characters (Unicode category Cf)
Status: RESOLVED FIXED
Alias: None
Product: Compilers
Classification: Mono
Component: C# ()
Version: unspecified
Hardware: PC Windows
Importance: --- normal
Target Milestone: ---
Assignee: Marek Safar
URL:
Depends on:
Blocks:
 
Reported: 2014-12-01 17:36 UTC by Jon Skeet
Modified: 2015-05-25 09:03 UTC

Tags:
Is this bug a regression?: ---
Last known good build:

Description Jon Skeet 2014-12-01 17:36:09 UTC
This is similar to bug 18229, but for another Unicode category (Cf).
This is hard to demonstrate in ASCII due to bug 24969.

Consider the code

    string x?y = "";

where the ? is replaced with U+070F, the Syriac Abbreviation Mark. That character is in the Unicode "Other, Format" category (Cf) and should be allowed in an identifier as per section 2.4.2 of the MS C# 5 specification. csc and Roslyn compile this code, but mcs rejects it.
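
The category claim can be checked without any C# compiler. The following is a minimal sketch in Python (used purely for illustration, it is not part of mcs) that looks up U+070F in the Unicode database shipped with the interpreter and rebuilds the declaration with the real character. The names `SAM` and `declaration` are made up for this example, and the database version bundled with Python may differ from the one a given compiler uses.

```python
import unicodedata

# U+070F, written as an escape so it survives ASCII-only channels.
SAM = "\u070f"

# Character name and general category from Python's bundled Unicode database.
name = unicodedata.name(SAM)
category = unicodedata.category(SAM)

# The C# declaration from the report, with the '?' placeholder
# replaced by the actual character.
declaration = 'string x' + SAM + 'y = "";'

print(name, category)
# Show the declaration in an ASCII-safe form for inspection.
print(declaration.encode("unicode_escape").decode("ascii"))
```

Writing `declaration` to a UTF-8 file would give a test input for csc, Roslyn, and mcs.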

The error is in cs-tokenizer.cs:

https://github.com/mono/mono/blob/effa4c07ba850bedbe1ff54b2a5df281c058ebcb/mcs/mcs/cs-tokenizer.cs#L940

This handles Unicode categories Mn, Mc, Nd and Pc, but not Cf.
Note that Cf is "special" in that Cf characters are *not* compared when checking whether two identifiers are the same - so I suspect it's reasonable to skip them when parsing... but it has to be *permitted* while parsing.
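
The two rules above (Cf is permitted inside an identifier, but ignored when comparing identifiers) can be sketched as follows. This is an illustrative Python model of the rules in section 2.4.2, not the mcs tokenizer code; all function names are hypothetical, and the sketch deliberately ignores the `@` prefix, Unicode escape sequences, and keywords.

```python
import unicodedata

# Categories for letter-character in section 2.4.2 of the C# spec.
LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# identifier-part-character adds combining (Mn, Mc), decimal digit (Nd),
# connecting (Pc), and formatting (Cf) characters.
IDENTIFIER_PART = LETTER | {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_identifier_start(ch: str) -> bool:
    """An identifier starts with a letter-character or an underscore."""
    return ch == "_" or unicodedata.category(ch) in LETTER


def is_identifier_part(ch: str) -> bool:
    """Any later character may come from the wider set, including Cf."""
    return unicodedata.category(ch) in IDENTIFIER_PART


def is_identifier(text: str) -> bool:
    """Simplified check: non-empty, valid start, valid remaining characters."""
    return (
        len(text) > 0
        and is_identifier_start(text[0])
        and all(is_identifier_part(ch) for ch in text[1:])
    )


def canonical_identifier(text: str) -> str:
    """Drop Cf characters, which do not take part in identifier comparison."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def same_identifier(a: str, b: str) -> bool:
    """Two identifiers are the same if they match after Cf removal."""
    return canonical_identifier(a) == canonical_identifier(b)
```

Under this model `is_identifier("x\u070fy")` is true and `same_identifier("x\u070fy", "xy")` is also true, which matches the behaviour the report asks for: accept the character while scanning, then skip it when forming the name.
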
Comment 1 Marek Safar 2015-05-25 09:03:45 UTC
Fixed in master