Bug 30538 - Wrong System.Drawing.Rectangle.GetHashCode() implementation
Summary: Wrong System.Drawing.Rectangle.GetHashCode() implementation
Status: RESOLVED ANSWERED
Alias: None
Product: Android
Classification: Xamarin
Component: BCL Class Libraries ()
Version: 5.1
Hardware: PC Windows
: Low enhancement
Target Milestone: ---
Assignee: Marek Habersack
URL:
Depends on:
Blocks:
 
Reported: 2015-05-28 07:40 UTC by Narcís Calvet
Modified: 2017-07-06 08:06 UTC (History)
4 users (show)

Tags: XATriaged
Is this bug a regression?: ---
Last known good build:


Attachments
2 screenshots showing the problem. (1002.09 KB, application/x-zip-compressed)
2015-05-29 07:12 UTC, Narcís Calvet
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on Developer Community or GitHub with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED ANSWERED

Description Narcís Calvet 2015-05-28 07:40:18 UTC
System.Drawing.Rectangle.GetHashCode() implmenentation is wrong. Running the code snippet below returns the same hash code for all rectangles. Running the same in a Windows Forms application returns a different result for each rectangle.

for (int i = 0; i < 100; i++)
{
	System.Drawing.Rectangle rect = new System.Drawing.Rectangle(i, i, i, i);
	int hash = rect.GetHashCode();

	System.Diagnostics.Debug.WriteLine(rect, "rect ");
	System.Diagnostics.Debug.WriteLine(hash, "hash");
}

Found the implementation in Xamarin.Android is this:

public override int GetHashCode()
{
    return ((this.height + this.width) ^ (this.x + this.y));
}

The same in the Mono Framework (https://github.com/mono/mono/blob/effa4c07ba850bedbe1ff54b2a5df281c058ebcb/mcs/class/System.Drawing/System.Drawing/Rectangle.cs):

public override int GetHashCode ()
{
	return (height + width) ^ x + y;
}

So this is both a Mono base classes and Xamarin.Android problem.
Comment 2 Jonathan Pryor 2015-05-28 11:40:21 UTC
Please do not copy and paste Microsoft proprietary code into bugzilla.

github.com/Microsoft/referencesource and related is fine. reference source.microsoft.com is NOT.
Comment 3 Jonathan Pryor 2015-05-28 11:52:34 UTC
You must not rely on the GetHashCode() implementation, as the implementation can change over time. Having different values from .NET is not a bug.

https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx

> the .NET Framework does not guarantee the default implementation of the GetHashCode
> method, and the value this method returns may differ between .NET Framework versions
> and platforms...

https://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx

> The hash code itself is not guaranteed to be stable. Hash codes for identical strings
> can differ across versions of the .NET Framework and across platforms...

The existing Rectangle.GetHashCode() implementation may not be ideal (increased risk of hash collision), but it is by no means "wrong" or buggy.

YOU MUST NOT RELY ON THE GetHashCode() IMPLEMENTATION.
Comment 4 chris 2015-05-28 11:56:10 UTC
We do not RELY on the GetHashCode implementation, we USE the GetHashCode implementation. Presumably if the GetHashCode was not fit for use, it would not be present in the API.

The Xamarin implementation is not fit for use. The .NET framework code IS fit for use. This is clearly problematic, and we request that the issue be reviewed as soon as possible.

Thank you.
Comment 5 Jonathan Pryor 2015-05-28 12:55:35 UTC
> The Xamarin implementation is not fit for use.

Please elaborate on how the current implementation is not fit for use, as I'm an idiot who cannot fathom how it's not fit for use.
Comment 6 Narcís Calvet 2015-05-28 13:50:11 UTC
Jonathan, my apologize if I posted code that could not be pasted here. I obtained it using Reflector on the System.Drawing.dll assembly.

As far as the use of GetHashCode() is concerned, we store object instances into a dictionary for performance purposes. Doing so we improve performance not having to create identical object instances. Each object instance is identified in the dictionary with a custom hash code (generated using GetHashCode() among other operations) calculated from each property that the instantiated object uses. As you suggested, current implementation has a high rate of hash collision, making our code not producing the expected output. Same code base works perfectly in Windows Forms applications but fails in Xamarin platforms. That's the reason why we think GetHashCode() is not fit for use. If I'm not wrong, the aim of GetHashCode() is providing the functionality I just described: https://msdn.microsoft.com/en-us/library/system.object.gethashcode%28v=vs.110%29.aspx

If you need more information don't hesitate to let me know.
Comment 7 chris 2015-05-28 14:18:16 UTC
From the MSDN documentation here:
https://msdn.microsoft.com/en-us/library/system.object.gethashcode%28v=vs.110%29.aspx

"A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary<TKey, TValue> class, the Hashtable class, or a type derived from the DictionaryBase class."

However, this is not possible with Xamarin's implementation of Rectangle.GetHashCode() because it generates identical hashcodes for different instance objects. Running the code the OP specifies, you will see that a good number of the instance objects, despite having different values for their properties, return identical hashcodes. Given that identical key values cannot be used in a Dictionary<TKey, TValue> instance, for example, this means that different instances of Rectangle with different values for their properties cannot be added to it. By definition, then, the implementation is unfit for this specified use.
Comment 8 Jonathan Pryor 2015-05-28 14:22:48 UTC
> we store object instances into a dictionary for performance purposes.
> Doing so we improve performance not having to create identical object instances.

I don't know what this means.

Presumably you're using a Dictionary<TKey, TValue>. What is TKey? Rectangle?

> current implementation has a high rate of hash
> collision, making our code not producing the expected output.

This likewise doesn't make sense: hash collision within Dictionary<TKey, TValue> would merely result in slower runtime behavior. It would NOT result in "code not producing the expected output".

For example:

  $ csharp -r:System.Drawing:
  csharp> using System.Drawing;
  csharp> new Rectangle (1, 1, 1, 1).GetHashCode();
  0
  csharp> new Rectangle (4, 4, 4, 4).GetHashCode(); 
  0

Note hash code collisions.

  csharp> var d = new Dictionary<Rectangle, string>();
  csharp> d[new Rectangle (1, 1, 1, 1)] = "1";      
  csharp> d[new Rectangle (4, 4, 4, 4)] = "4"; 

We add two entries to a Dictionary<Rectangle, string>, with the aforementioned Rectangle values which produce hash collisions.

  csharp> d[new Rectangle (1, 1, 1,1 )];
  "1"
  csharp> d[new Rectangle (4, 4, 4, 4)]; 
  "4"

Observe that the hash collisions DO NOT MATTER. We're still able to retrieve the correct value for a given key; we're not getting unexpected and incorrect output.

GetHashCode() producing collisions is *solely* a performance problem, and at that it's an indeterministic performance problem. (How big is this dictionary? Does profiling actually suggest that it's a problem?)
Comment 9 Jonathan Pryor 2015-05-28 14:35:52 UTC
> it generates identical hashcodes for different instance objects

There is no way to avoid this behavior. Avoiding collisions is IMPOSSIBLE.

object.GetHashCode() returns an Int32. It may thus contain, at most, 2**32 values.

Meanwhile, consider System.String. It can contain FAR MORE than 2**32 values. The same is true for any type that contains int fields. Consider:

   struct Point {
       public int x, y;
   }

How many possible Point values are there? There's a set of Point values for which x varies from int.MinValue to int.MaxValue and y is 0 -- 2**32 values, same as int -- and a separate set of Point values for which x is 0 and y varies from int.MinValue to int.MaxValue -- *another* 2**32 values, same as int --  then there's every other possible combination.

Note that the set of possible Point values far larger than what can be held in an Int32.

So what *must* happen? There *must* be the potential for collisions, and the Dictionary/etc. container needs to cope with them appropriately.

See also the Pigeonhole Principle: http://en.wikipedia.org/wiki/Pigeonhole_principle

In the case of Dictionary<TKey, TValue>, this is done by using the hash code to lookup an appropriate "bucket", then checking each key in the bucket with object.Equals() to see if it's an actual match.
Comment 10 chris 2015-05-28 14:55:57 UTC
Okay, you don't see it as a problem. It remains curious that, for identical client code, Microsoft's implementation of Rectangle.GetHashCode does not give any collision problems whereas Xamarin's does.

As you state, it is not an insurmountable problem to work around.
Comment 11 Marek Habersack 2015-05-28 15:32:15 UTC
@chris, may I point out the Remarks section here - https://msdn.microsoft.com/en-us/library/system.object.gethashcode%28v=vs.110%29.aspx - and especially the Caution box. Note the second bullet. This is what Jon is trying to explain. 
Also consider the paragraph above the Caution box as well as the two bullet points.
Consider another point as well - storing a value type in a dictionary will *not* increase performance. Each time you assign a value to the key and each time you extract the stored value you will have a copy performed. It will most probably hurt performance (and memory use) instead of improving it.
Neither Mono nor Microsoft implementations of GetHashCode are bad or wrong - they are different which is in line with the documentation. Can the Mono version be improved? Yes. Will it fix your problem? No. Will the Microsoft implementation (pasting code for which makes it harder to implement the new version of the method btw) always produce the desired results for you? No, there's no guarantee whatsoever for this.
Comment 12 chris 2015-05-28 15:41:19 UTC
@Marek, thanks.

One point though - hashing instance objects where such objects are repeatedly reused does have demonstrable performance benefits in our case. So far, the only trouble we have experienced are with the Xamarin implementations of the GetHashCode method. Pure coincidence? Maybe. We wanted to avoid bucketing given that the objects we hash exist within a small range of possible values, and I'm sure bucketing would have a negative impact on the performance of our hashing technique. We will try it and see, although I am still unconvinced by the explanations we have been given as to why the code @Narcís posted produces identical hash codes for Xamarin but not for Microsoft.
Comment 13 Marek Habersack 2015-05-28 15:51:03 UTC
@chris - there is no "reuse" with value types - you constantly copy them in their entirety. Instead of bucketing you can use your own implementation of GetHashCode () (a simple extension method will do the trick) even using the code you retrieved from Microsoft's System.Drawing.

As for why the codes are different? Because they're *implementation dependent* - it's how it is designed and it is perfectly fine for them to differ. It's precisely the same kind of situation as with the C 'char' type. It is up to compiler implementation to treat it as signed or unsigned - some compilers have one, some another. And it doesn't make neither of them bad or wrong. It's documented so it's up to the developer to make *proper* use of the type (if you want to be sure that your variable is signed char then you type it as 'signed char'). The same applies to GetHashCode () - what you're doing is against the documented behavior and is, let me be blunt, incorrect.
Comment 14 chris 2015-05-28 16:02:33 UTC
@Marek "reuse" here a simile to describe the process by which classes are instantiated as instances and these instances are "reused" rather than being instantiated again in the code. That these instances are actually copied rather than "reused"? Fine; from what I understand of .NET architecture, this copying is virtually instantaneous. And yes, our hashing process does use our own implementations of GetHashCode(), although we do use the Framework type GetHashCode() method as part of the seed in some cases.

I understand that the implementation is different, but when I said I was unconvinced by this explanation I meant that I was looking for an explanation that explains why the difference in implementation causes this effect. 

No, what we are doing is in accordance with the documentation. It is not incorrect. Can it be bettered? Yes, it can, and it will be.
Comment 15 Jonathan Pryor 2015-05-28 16:27:55 UTC
> we store object instances into a dictionary for performance purposes.
> Doing so we improve performance not having to create identical object instances.

I'm still wondering what this looks like at a code level. That could go a long way to explaining why the expected output isn't produced, unless the expected output *is* the value of Rectangle.GetHashCode()...
Comment 16 chris 2015-05-28 16:44:40 UTC
@Jonathan It is a very standard implementation of hashing following MSDN documentation. What was unexpected was the return value of the Xamarin Rectangle.GetHashValue which we were using as a seed value of a hashing function. This unexpectedness is perfectly reflected in the OP code, where the Xamarin Rectangle.GetHashValue returns identical values for different instance objects with different field values. As far as I can see, we have been given no coherent explanation for this behaviour other than the fact that the implementation is different, a fact the OP posted in his first comment which was subsequently deleted. Still, not to worry. I'm very confident we'll be able to make the necessary compensatory changes to our routines, thank you.
Comment 17 Narcís Calvet 2015-05-29 07:12:37 UTC
Created attachment 11386 [details]
2 screenshots showing the problem.

Here are two screenshots that show the implications Xamarin's Rectangle.GetHashCode implementation has. 

NoCache.png shows what we'd expect to get, a regular gradient across the entire area.

WrongHashCode.png shows what the result of GetHashCode, unpredictable gradients filling in the area. 

Such gradients are created using http://developer.android.com/reference/android/graphics/LinearGradient.html. Among other parameters, this is defined by a rectangle (x0, y0, x1 & y1). Rectangle's hash code is part of the custom hash code we calculate to store LinearGradient instances into a dictonary and not having to create unnecessary instances (Xamarin Profiler showed us that was more time consuming). Inconsistent hash codes render Xamarin implementation of Rectangle.GetHashCode useless for us. We have been able to circumvent that problem using Microsoft's implementation in our custom hash code calculation.
Comment 18 Jonathan Pryor 2015-05-29 11:36:44 UTC
> Rectangle's hash code is part of the custom hash code we calculate to store
> LinearGradient instances into a dictonary and not having to create unnecessary
> instances

What is your data structure? What is the key into your Dictionary?

As far as I can tell, what you're doing is similar to this:

    var d = new Dictionary<int, LinearGradient> ();
    var r = new Rectangle (...);
    d [r.GetHashCode()] = new LinearGradient (...);

i.e. using the result of Rectangle.GetHashCode() as the dictionary key.

Assuming the above is the case, *that's the bug*: The implementation of GetHashCode() can change at any time, for any reason. Hash codes do *not* uniquely identify a value; they *can't* (see Comment #9).

The screenshots you provided show a problem. They do not indicate that the problem lies solely in the GetHashCode() implementation.
Comment 19 Narcís Calvet 2015-05-29 12:20:05 UTC
Yes, we use custom hash codes as dictionary keys as you mentioned. However, those dictionaries are cleaned and recreated at each execution, they are only valid at run-time and not stored at any data structure. Therefore, any change in the implementation of GetHashCode() will be reflected accordingly into all object instances stored in the dictionary. This works flawlessly in Windows Forms applications. We encountered problems with hash code calculated in Xamarin/Mono platform.