Bug 24268 - GREFs leak on Android Lollipop
Summary: GREFs leak on Android Lollipop
Status: VERIFIED FIXED
Alias: None
Product: Android
Classification: Xamarin
Component: Mono runtime / AOT Compiler ()
Version: 4.18.0
Hardware: PC Windows
: --- normal
Target Milestone: 4.20
Assignee: Mark Probst
URL: https://bitbucket.org/tpurtell/gref-l...
Depends on:
Blocks:
 
Reported: 2014-11-04 22:21 UTC by T.J. Purtell
Modified: 2014-11-11 20:25 UTC (History)
4 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
adb logcat output with debug.mono.log gref,gc (7.42 MB, application/x-zip-compressed)
2014-11-04 23:11 UTC, T.J. Purtell
Details
logcat for the process on ART KitKat (105.90 KB, text/plain)
2014-11-07 15:20 UTC, T.J. Purtell
Details
logcat for the process on Lollipop (27.30 KB, text/plain)
2014-11-07 15:21 UTC, T.J. Purtell
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on Developer Community or GitHub with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
VERIFIED FIXED

Description T.J. Purtell 2014-11-04 22:21:56 UTC
Doesn't happen on Android 4.4.2 with ART (Galaxy Nexus CyanogenMode), but does happen on a Nexus 5 Lollipop, latest release image.

Microsoft Visual Studio Ultimate 2013
Version 12.0.30501.00 Update 2
Microsoft .NET Framework
Version 4.5.51641

Xamarin   3.8.48.0 (ce228b21c69ea3a0feaa7b5e50e9bb19fcbe238c)
Visual Studio extension to enable development for Xamarin.iOS and Xamarin.Android.

Xamarin.Android   4.18.0.39 (0ef616ae42005ece52f6df7aa1e093ceb11998f8)
Visual Studio plugin to enable development for Xamarin.Android.
Comment 1 T.J. Purtell 2014-11-04 22:24:13 UTC
This test case shows the GREF count on screen while allocating 1000 objects between repeated garbage collection.  The count rapidly increases and then the app stalls out and jams.

There is a good chance this is related to Bug 23959.
Comment 2 Jonathan Pryor 2014-11-04 22:55:02 UTC
>         [DllImport("__Internal", EntryPoint = "_monodroid_gref_get")]
>         public static extern int GetGrefs();

Really? You could just use Runtime.GlobalReferenceCount:

http://androidapi.xamarin.com/?link=P%3aJava.Interop.Runtime.GlobalReferenceCount

The deal with Bug #23959 is that it's an emulator, so we assume that there's a 2000 GREF limit, so once we get to ~1800 GREFs we perform a GC. ART in API-21 apparently doesn't have this limit (I need to confirm), so Bug #23959 sees 6000+ GREFs, so it's *constantly* doing a GC.

What I don't know is why there are so many GREFs in the first place. I'd have to run the sample for Bug #23959.

This bug, however, is on hardware, not the emulator, so assuming we properly detect it as hardware then the GREF limit will be ~41000, so unless you're exceeding that then the cause of the slowdown will be different. (Getting 41000 GREFs would ITSELF cause performance issues; don't do that.)

Unfortunately, I don't have API-21 hardware at this time.

Could you please enable GREF logging, run your test, and attach the logcat output?

    $ adb shell setprop debug.mono.log gref
    $ adb logcat > logcat.txt
    # run your app...
    [Ctrl+C]
    # attach logcat.txt
Comment 3 T.J. Purtell 2014-11-04 23:10:49 UTC
Oh that's a handy method, I didn't know about.  Thanks.
Comment 4 T.J. Purtell 2014-11-04 23:11:56 UTC
Created attachment 8625 [details]
adb logcat output with debug.mono.log gref,gc
Comment 5 T.J. Purtell 2014-11-04 23:13:25 UTC
FYI, the Dalvik in KitKat removed the GREF limit, while ART set a 64k limit.  I haven't seen what the sources of lollipop actually do for the GREF limit.
Comment 6 Jonathan Pryor 2014-11-05 12:56:58 UTC
You have successfully created logcat output which chokes logcat-parse.exe. Congratulations!

That said, the very end of the file is interesting:

> I/monodroid-gref( 2789): -g- grefc 48 gwrefc 5489 handle 0xb500496/G from thread 'finalizer'(2857)^M

At the time that message was printed, there were 48 GREFs (yay), and 5489 weak global references. (Doh!)

Please try one of the alternate GC bridge implementations. Do they improve performance for you?

http://blog.xamarin.com/xamarin.android-garbage-collection-improvements/
Comment 7 Jonathan Pryor 2014-11-05 12:57:32 UTC
@Mark: Here's a GC bridge perf example for you. ;-)
Comment 8 T.J. Purtell 2014-11-05 13:10:49 UTC
The problem isn't the performance per se, if I run the sample app for a minute the GREF count climbs to 40k+ because references are leaking.  As you mentioned, it does look like these are somehow weak references.  At that point, the app is basically completely hung.

The object graph here is very simple.  The only allocated objects are single Java.Lang.Object with no out or in references.  As you mentioned, it does look like the there is some problem in the GC bridge because the logs in just GC log mode (vs GC+GREF) log mode I noticed that the messages were bouncing back and forth between showing a high weak global ref count and a high global ref count.

I can try the other GC bridges for you to see if one has a bug that causes the leak, but I think its unlikely as the same sample worked fine on an Android 4.4.2 with ART enabled.

I remember from the release notes that you mentioned you had to work around an ART issue with global refs to null being legal and that you conditioned that behavior based on the Android version...
Comment 9 T.J. Purtell 2014-11-05 13:23:08 UTC
I added the android environment file and tried all 3 bridges which have the same result.  # of grefs climbs, test app slows down to a crawl within a minute.

If I let it run for maybe 2 minutes I see the count gets very high

> I/monodroid-gc(14155): GC cleanup summary: 44770 objects tested - resurrecting 4
4770.

And the monodroid warnings about gref count getting high have kicked in.  

> I/monodroid-gc(14155): 46810 outstanding GREFs. Performing a full GC!

I presume eventually it will crash.
Comment 10 T.J. Purtell 2014-11-05 13:54:10 UTC
So you can see the leak in action...

https://www.dropbox.com/s/50x3opasoagnw33/Video%20Nov%2005%2C%2010%2045%2031%20AM.mov?dl=0

Lollipop on the left, KitKat ART on the right
Comment 11 T.J. Purtell 2014-11-06 21:35:33 UTC
If I connect the android monitor tool to the test app for heap monitoring, the heap statistics are supposed to update whenever a Java GC happens.  But they never update.  If I click the "force update heap" button, then the gref count drops.

I turned on verbose GC logging (I think) with this command

adb shell setprop art.vm.extra-opts "-verbose:gc"

And it looks like mono GC is not triggering a Java GC for the magic weak/strong reference for each GC op.  It worked for the first one... but this would explain why the grefs grow until a point of DOOOM.

> I/ActivityManager(  750): Displayed App1.App1/app1.MainActivity: +1s393ms
> I/wpa_supplicant(19754): wlan0: CTRL-EVENT-SCAN-STARTED 
> I/mono-stdout( 3869): Will gc 1, 101, 101

> I/art     ( 3869): Explicit concurrent mark sweep GC freed 5320(261KB) AllocSpace objects, 2(93KB) LOS objects, 40% free, 15MB/25MB, paused 1.358ms total 39.723ms
> I/monodroid-gc( 3869): GC cleanup summary: 102 objects tested - resurrecting 1.
> D/Mono    ( 3869): GC_OLD_BRIDGE num-objects 102 num_hash_entries 103 sccs size 103 init 0.00ms df1 0.96ms sort 0.05ms dfs2 2.25ms setup-cb 0.16ms free-data 0.21ms links 1/1/1/1 dfs passes 206/104
> D/Mono    ( 3869): GC_MAJOR: (user request) pause 12.55ms, total 13.98ms, bridge 51.63ms major 720K/240K los 0K/0K
> D/Mono    ( 3869): DllImport searching in: '__Internal' ('(null)').
> D/Mono    ( 3869): Searching for '_monodroid_gref_get'.
> D/Mono    ( 3869): Probing '_monodroid_gref_get'.
> D/Mono    ( 3869): Found as '_monodroid_gref_get'.

> I/mono-stdout( 3869): Will gc 2, 104, 205
> I/monodroid-gc( 3869): GC cleanup summary: 101 objects tested - resurrecting 101.
> D/Mono    ( 3869): GC_OLD_BRIDGE num-objects 101 num_hash_entries 102 sccs size 102 init 0.00ms df1 0.29ms sort 0.01ms dfs2 0.39ms setup-cb 0.04ms free-data 0.04ms links 1/1/1/1 dfs passes 204/103
> D/Mono    ( 3869): GC_MAJOR: (user request) pause 3.62ms, total 4.34ms, bridge 1.46ms major 720K/720K los 0K/0K

> I/mono-stdout( 3869): Will gc 3, 104, 309
> I/monodroid-gc( 3869): GC cleanup summary: 205 objects tested - resurrecting 205.
> D/Mono    ( 3869): GC_OLD_BRIDGE num-objects 205 num_hash_entries 206 sccs size 206 init 0.00ms df1 0.38ms sort 0.02ms dfs2 0.69ms setup-cb 0.07ms free-data 0.07ms links 1/1/1/1 dfs passes 412/207
> D/Mono    ( 3869): GC_MAJOR: (user request) pause 9.07ms, total 12.10ms, bridge 4.36ms major 704K/720K los 0K/0K

> I/mono-stdout( 3869): Will gc 4, 104, 413
> I/monodroid-gc( 3869): GC cleanup summary: 309 objects tested - resurrecting 309.
> D/Mono    ( 3869): GC_OLD_BRIDGE num-objects 309 num_hash_entries 310 sccs size 310 init 0.00ms df1 0.45ms sort 0.06ms dfs2 0.55ms setup-cb 0.11ms free-data 0.08ms links 1/1/1/1 dfs passes 620/311
> D/Mono    ( 3869): GC_MAJOR: (user request) pause 3.75ms, total 4.03ms, bridge 13.19ms major 704K/704K los 0K/0K

> I/mono-stdout( 3869): Will gc 5, 50, 463
> D/Mono    ( 3869): GC_BRIDGE waiting for bridge processing to finish
> I/monodroid-gc( 3869): GC cleanup summary: 368 objects tested - resurrecting 368.
> D/Mono    ( 3869): GC_OLD_BRIDGE num-objects 368 num_hash_entries 369 sccs size 369 init 0.00ms df1 0.85ms sort 0.10ms dfs2 0.89ms setup-cb 0.22ms free-data 0.17ms links 1/1/1/1 dfs passes 738/370
> D/Mono    ( 3869): GC_MAJOR: (user request) pause 4.50ms, total 4.95ms, bridge 7.38ms major 720K/704K los 0K/0K
Comment 12 T.J. Purtell 2014-11-06 21:54:06 UTC
Sometimes doing the force heap updates causes a deadlock.. so that might be interesting as well...
Comment 13 T.J. Purtell 2014-11-07 13:01:36 UTC
Any thoughts on this?  This is such a simple test case (just allocates java objects that should immediately be eligible for collection) that its a near certainty the underlying issue will affect all Xamarin.Android apps running on Lollipop.  It doesn't depend on which API level is targeted, and happens with the stable/beta releases, which means that when the Lollipop updates roll out next week sadness will come to pass.  The garbage men will slow to a crawl, and the street will fill with stinky performance. :(
Comment 14 Mark Probst 2014-11-07 13:03:28 UTC
Is it possible that Lollipop doesn't do a GC when we ask it to?
Comment 15 T.J. Purtell 2014-11-07 14:30:30 UTC
Well maybe if someone has a pending GetPrimitiveArrayCritical (etc) there will be a problem?  The compacting collector is disabled.

https://android.googlesource.com/platform/art/+/android-5.0.0_r2/runtime/gc/heap.cc @  1556

Seems like Mono does some of these, so maybe some are unmatched or something like that... its weird that the debugger triggered GC was able to work.  That may mean mono is holding the critical references during the collection process.
Comment 16 T.J. Purtell 2014-11-07 14:32:27 UTC
1635 says this...

>       // GC can be disabled if someone has a used GetPrimitiveArrayCritical but not yet released.
Comment 17 T.J. Purtell 2014-11-07 14:48:43 UTC
Actually, I don't know that mono calls these for sure, I just grepped and found it in Mono.Android.dll, but it seems like a working theory...
Comment 18 T.J. Purtell 2014-11-07 15:20:51 UTC
Created attachment 8661 [details]
logcat for the process on ART KitKat
Comment 19 T.J. Purtell 2014-11-07 15:21:19 UTC
Created attachment 8662 [details]
logcat for the process on Lollipop
Comment 20 T.J. Purtell 2014-11-07 15:46:39 UTC
Lollipop does this, not sure if it is relevant
> 11-07 15:19:19.876: I/art(24118): Late-enabling -Xcheck:jni

KitKat ART does this...
> 11-07 12:12:01.205: W/monodroid-gc(4591): JNI weak global refs are broken on Android with the ART runtime.
> 11-07 12:12:01.205: W/monodroid-gc(4591): Trying to use java.lang.WeakReference instead, but this may fail as well.
> 11-07 12:12:01.205: W/monodroid-gc(4591): App stability may be compromised.
> 11-07 12:12:01.205: W/monodroid-gc(4591): See: https://code.google.com/p/android/issues/detail?id=63929

While Lollipop does this...
11-07 15:19:20.000: I/monodroid-gc(24118): environment supports jni NewWeakGlobalRef

Otherwise, I don't reliably see any logs from ART on either OS version...  yet the KitKat ART keeps the GREFs low.

On KitKat ART, if I attached the java heap tracing, the stats update automatically (they update when a GC occurs).
On Lollipop, if I attached the java heap tracing, the stats never update (presumably this means a GC isn't called).

...

I assume the reason the two behave dramatically differently is that they are using two different weak reference techniques.  Maybe some other new Lollipop GC features/implementation details mean the NewWeakGlobalRef technique doesn't work???
Comment 21 T.J. Purtell 2014-11-07 19:13:39 UTC
OK, got a winner here I think.  This is so ludicrous...

JavaSystem.Gc() => No GC
Java.Lang.Runtime.GetRuntime().Gc() => Performs an explicit GC.

That seems utterly insane to me, but it shows in the log with the second form, but not the first.
Comment 22 T.J. Purtell 2014-11-07 19:25:01 UTC
https://android.googlesource.com/platform/libcore/+/android-5.0.0_r2/luni/src/main/java/java/lang/System.java#642

Apparently this is so sort of effort to avoid running a GC if there was recently some finalization...

Changed v KitKat

https://android.googlesource.com/platform/libcore/+/android-4.4.2_r2/luni/src/main/java/java/lang/System.java#199
Comment 23 Craig 2014-11-10 15:10:16 UTC
Xamarin can you please provide an update?  Thanks.
Comment 24 Jonathan Pryor 2014-11-11 15:58:19 UTC
T.J.: THANK YOU THANK YOU THANK YOU.

Fixed in monodroid/42fb5811 and monodroid-4.20-series/812ca135.

This should be included in the forthcoming Xamarin.Android 4.20 beta.
Comment 25 Peter Collins 2014-11-11 19:37:20 UTC
This fix has been released to the Beta channel.
Comment 26 T.J. Purtell 2014-11-11 20:25:13 UTC
Sweet! Thanks.