Bug 25443 - [Lollipop] Instability in Android app on older devices which have been upgraded to Lollipop, possibly related to .Handle is IntPtr.Zero before being disposed.
Summary: [Lollipop] Instability in Android app on older devices which have been upgrad...
Status: RESOLVED FIXED
Alias: None
Product: Android
Classification: Xamarin
Component: General ()
Version: 4.20.0
Hardware: PC Mac OS
: Normal normal
Target Milestone: ---
Assignee: Jonathan Pryor
URL:
Depends on:
Blocks:
 
Reported: 2014-12-16 20:35 UTC by Kent Green [MSFT]
Modified: 2017-05-28 02:32 UTC (History)
6 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
crash & IDE log files from customer (163.19 KB, application/zip)
2014-12-16 20:35 UTC, Kent Green [MSFT]
Details


Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on Developer Community or GitHub with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FIXED

Description Kent Green [MSFT] 2014-12-16 20:35:13 UTC
Created attachment 9094 [details]
crash & IDE log files from customer

---Overview---
From this desk case:
https://xamarin.desk.com/agent/case/111662

Customer reports that they are seeing instability in their Android app on older devices which have been upgraded to Lollipop, possibly related to .Handle is IntPtr.Zero before being disposed. They also included IDE logs from Visual Studio & crash logs from the devices that had crashed.

---Build Information---
Microsoft Visual Studio 2010
Version 10.0.40219.1 SP1Rel
Microsoft .NET Framework
Version 4.5.50938 SP1Rel

Installed Version: Ultimate

Microsoft Office Developer Tools 01019-532-2002102-70099
Microsoft Office Developer Tools

Microsoft Visual Basic 2010 01019-532-2002102-70099
Microsoft Visual Basic 2010

Microsoft Visual C# 2010 01019-532-2002102-70099
Microsoft Visual C# 2010

Microsoft Visual C++ 2010 01019-532-2002102-70099
Microsoft Visual C++ 2010

Microsoft Visual F# 2010 01019-532-2002102-70099
Microsoft Visual F# 2010

Microsoft Visual Studio 2010 Architecture and Modeling Tools 01019-532-2002102-70099
Microsoft Visual Studio 2010 Architecture and Modeling Tools

UML® and Unified Modeling Language™ are trademarks or registered trademarks of the Object Management Group, Inc. in the United States and other countries.

Microsoft Visual Studio 2010 Code Analysis Spell Checker 01019-532-2002102-70099
Microsoft Visual Studio 2010 Code Analysis Spell Checker

Portions of International CorrectSpell™ spelling correction system © 1993 by Lernout & Hauspie Speech Products N.V. All rights reserved.

The American Heritage® Dictionary of the English Language, Third Edition Copyright © 1992 Houghton Mifflin Company. Electronic version licensed from Lernout & Hauspie Speech Products N.V. All rights reserved.

Microsoft Visual Studio 2010 Team Explorer 01019-532-2002102-70099
Microsoft Visual Studio 2010 Team Explorer

Microsoft Visual Web Developer 2010 01019-532-2002102-70099
Microsoft Visual Web Developer 2010

Critical Update for Microsoft Visual Studio 2010 Ultimate - ENU (KB2938807) KB2938807
This critical update is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this critical update will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2938807.

Crystal Reports Templates for Microsoft Visual Studio 2010
Crystal Reports Templates for Microsoft Visual Studio 2010

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2529927) KB2529927
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2529927.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2542054) KB2542054
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2542054.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2548139) KB2548139
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2548139.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2549864) KB2549864
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2549864.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2635973) KB2635973
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2635973.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB2890573) KB2890573
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2890573.

Hotfix for Microsoft Visual Studio 2010 Ultimate - ENU (KB3002340) KB3002340
This hotfix is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this hotfix will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/3002340.

Microsoft Visual Studio 2010 SharePoint Developer Tools 10.0.40219
Microsoft Visual Studio 2010 SharePoint Developer Tools

Microsoft Visual Studio 2010 Ultimate - ENU Service Pack 1 (KB983509) KB983509
This service pack is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this service pack will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/983509.

NuGet Package Manager 2.2.31210.9045
NuGet Package Manager in Visual Studio. For more information about NuGet, visit http://docs.nuget.org/.

Security Update for Microsoft Visual Studio 2010 Ultimate - ENU (KB2644980) KB2644980
This security update is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this security update will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2644980.

Security Update for Microsoft Visual Studio 2010 Ultimate - ENU (KB2645410) KB2645410
This security update is for Microsoft Visual Studio 2010 Ultimate - ENU.
If you later install a more recent service pack, this security update will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/2645410.

Xamarin 3.8.150.0 (10cfd178d55287f09c85f5a1e604dfe20889a40f)
Visual Studio extension to enable development for Xamarin.iOS and Xamarin.Android.

Xamarin.Android 4.20.0.28 (ba9bbbdd44cfdc4bf485e8885bd2ad24fba525f7)
Visual Studio plugin to enable development for Xamarin.Android.

Xamarin.iOS 8.4.0.0 (840a925103a0bf4a856507f13d5eaee3c1579c2f)
Visual Studio extension to enable development for Xamarin.iOS.
Comment 1 Jonathan Pryor 2014-12-17 15:19:16 UTC
Unfortunately, we need more information, like a complete repro, which doesn't exist. :-(

What devices is this observed on?

> 12-16 14:44:39.762: E/art(1962): JNI ERROR (app bug): accessed deleted global reference 0xc4013b2

That's not a good sign; it implies something broke somewhere.

> 12-16 08:44:23.658: I/MonoDroid(31916): UNHANDLED EXCEPTION:
> 12-16 08:44:23.675: I/MonoDroid(31916): System.InvalidOperationException: Trying to dispose handle of type 'Invalid' which is not supported.
> 12-16 08:44:23.675: I/MonoDroid(31916):   at Java.Lang.Object.Dispose (System.Object instance, System.IntPtr& handle, IntPtr key_handle, JObjectRefType handle_type) [0x00000] in <filename unknown>:0 
> 12-16 08:44:23.675: I/MonoDroid(31916):   at Java.Lang.Object.Dispose () [0x00000] in <filename unknown>:0 

That's *really* not a good sign; I have no idea how Object.handle_type is being set to JObjectRefType.Invalid (0).

We need to be able to reproduce this locally. :-(
Comment 2 Tim Dawson 2014-12-17 16:09:05 UTC
Hello Jon, the good news is this should be fairly easy for you to reproduce. Just download our app. It's called SkyDemon and it's in the Play Store. Run it, create a free trial and log in, download the UK chart then press Simulate to start a simulated flight. After a while, it will crash.

This seems to happen on all sorts of devices but anecdotally, I think it's not-current-generation devices running Lollipop.
Comment 3 Jonathan Pryor 2014-12-18 16:04:20 UTC
@Tim: How long does it normally take to see the crash? I'm running it on a Nexus 5 for the past hour, and it hasn't crashed yet.

I've also enabled GREF logging, and it's currently 6252 GREFs...
Comment 4 Jonathan Pryor 2014-12-18 17:16:25 UTC
2+ hours in, the app hasn't crashed yet, there are currently 15384 GREFs, and the app isn't working quite right.

Specifically, the screen has turned off, and when I hit the Power button then unlock the screen, the screen promptly locks itself again.

If I start the Phone app, I'm able to return to the Home screen, and then switch back to SkyDemon through the multitasking button. (Now at 16237 GREFs.)
Comment 5 Tim Dawson 2014-12-18 17:33:02 UTC
Judging from the customer feedback I'd expect a crash after an hour or two.

Having just reviewed the reports so far, it seems the Nexus 7 (2013) is the most prolific offender running Android 5.0.1. I hope you see the crash on your Nexus 5 but if not, and you have a Nexus 7 handy, it might be useful to try that for attempt two. Also, just to make absolutely sure, install the latest beta from the link below. It actually crashes less than the latest public version (the one I pointed to you) but the latest beta is the one which produced the logs in my original report.

http://www.skydemon.aero/start/android.aspx

We've had some reports of the screen-sleep-prevention not working under Lollipop but haven't followed those up yet; the instability is much more of a concern. We just set that property on View which is supposed to prevent the screen turning off, and that has always worked for us.

Do you now have decent tooling available for tracking potential objects that are being kept around? Your messages seem to indicate you think the number of GREFs is unusual (and climbing).

Tim
Comment 6 Jonathan Pryor 2014-12-19 12:06:18 UTC
> Judging from the customer feedback I'd expect a crash after an hour or two.

I let it run overnight. It didn't crash, last least not as described in Comment #1.

It did *exit*:

> I/ActivityManager(  749): Killing 11718:aero.skydemon.skydemonandroid/u0a910 (adj 15): empty for 2117s

I have no idea what that means. :-(

I also ran this on Android 5.0, not 5.0.1. I'll need to upgrade and try again.

> Do you now have decent tooling available for tracking potential objects that
> are being kept around?

The no-longer-available Xamarin.Android 4.99 alpha contained a tool named logcat-parse which can help with this. I described it at Evolve:

https://www.youtube.com/watch?v=Gpqc2XZIQfU#t=1870

I threw logs from ~1-2 hours of GREF execution at it consisting of 1.1G of data; it took an eternity to load, and is now consuming 1.78GB of RAM, and is probably wrong:

> csharp> grefs.AlivePeers.Count();
> 531754

There should be 18k, not 53k... :-/

Those "53k" GREFs are from 44 types:

> csharp> grefs.GetAliveJniTypeCounts().Count();
> 44

> csharp> foreach (var info in grefs.GetAliveJniTypeCounts().OrderByDescending (e => e.Value)) Console.WriteLine ("{0}: {1}", info.Value, info.Key);
> 531353: 
> 128: android/graphics/Path
> 120: android/graphics/Rect
> ...

...at which point this becomes useless, because most of the types have no type information.

#facepalm

Looks like I have more bugs to fix... :-(
Comment 7 Tim Dawson 2014-12-19 12:59:19 UTC
Thanks, I appreciate your investigations so far. I have no idea why Path/Rect instances could be hanging around so long, though we do cache a few for performance reasons. Those could potentially build up if you never zoomed in or out but just flew on forever, but I can't see how you'd end up with THAT many.

If you do have access to a Nexus 7 (older model) with 5.0.1 on it and can install the SkyDemon beta I pointed to you before, that does sound like the most sure-fire way of getting this to happen.

I imagine these sort of bugs are a nightmare to fix, but once fixed are an instant stability win for everybody.

Cheers

Tim
Comment 8 Jonathan Pryor 2014-12-19 14:55:35 UTC
@Tim: Good news: The beta on my Nexus 5 with Android 5.0.1 crashed!

> F/art     ( 6004): art/runtime/check_jni.cc:65] JNI DETECTED ERROR IN APPLICATION: java_object == null
> F/art     ( 6004): art/runtime/check_jni.cc:65]     in call to GetObjectRefType

We're somehow calling JNIEnv::GetObjectRefType() with a null value, and ART is aborting. Unfortunately, that's an ART bug; NULL is *explicitly° called out as a "valid" value in the docs!

http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetObjectRefType

I've filed this with Google as: https://code.google.com/p/android/issues/detail?id=83194

In the meantime, a question arises: why/how are we passing NULL to JNIEnv::GetObjectRefType()?

(Un)fortunately, that's easier to answer: we invoke JNIEnv::GetObjectRefType() as part of GREF logging (oof) to determine the type of JNI handle that was returned from a previous JNI call, and NULL is a perfectly valid value for other JNI calls to return...

The Bad news (as if the above weren't bad enough?) is that the crash I'm seeing is *not* the error reported in Comment #1, meaning while I'm seeing a crash I'm not seeing the *right* crash. :-(
Comment 9 Tim Dawson 2014-12-23 05:58:38 UTC
Thanks Jon. What's the next step here? This is obviously not only something that's beyond my expertise but something over which I have little if any control. Any suggestions for us, any debugging code I can add to help you, or do we need to leave this entirely to you?

Tim
Comment 10 Tim Dawson 2015-01-06 06:05:44 UTC
Could we have a status update on this please? This is affecting a larger and larger cross-section of our customers who are understandably upset that the app is crashing at random times during flight.

Tim
Comment 11 Tim Dawson 2015-01-13 07:19:39 UTC
Another week has slipped by without input from you. Can we have a status update please?

Tim
Comment 12 Tim Dawson 2015-01-28 05:41:47 UTC
You should be ashamed of yourselves. This is a massive issue well beyond our control, significantly affecting the stability of our app and no doubt many others. And you can't even be bothered to keep us posted on progress.

Tim
Comment 13 Tim Dawson 2015-01-28 05:46:45 UTC
For clarify, we are seeing more and more customers getting the app aborting with the following dumped to logcat:

JNI ERROR (app bug): accessed deleted global reference 0x600d82

Now. You need to help us with this, because it's clearly a platform issue. I can provide you with debug builds of the app if you like, with stuff to help you trace it. But we won't get anywhere if you keep ignoring it.

Tim
Comment 14 Miguel de Icaza [MSFT] 2015-01-29 16:02:38 UTC
Hello Tim,

It would be very helpful to have the source code.

We can arrange for an NDA to be put in place.
Comment 15 Tim Dawson 2015-01-30 07:44:13 UTC
I agree it would, but as in previous occasions, that just isn't possible I'm afraid.

As I have done in the past in this situation, I would like to offer to fly myself, at my expense, to any city in the world to meet with a Xamarin engineer (with my source code) and mutually debug the issue. Unfortunately my partner is ill at the moment and I can't leave the country, so perhaps we could fly a Xamarin engineer to the UK instead.

Or perhaps we make a deal that whoever the bug turns out to belong to pays.

Alternatively, I can attempt to help your engineers reproduce the problem with some guidance. I'm sure someone would succeed with a couple of hours trying on a suite of appropriate devices with Lollipop. The issue is critical because a LOT of people are experiencing these crashes (while flying their aircraft) and for hopefully obvious reasons that is unacceptable. It's the weeks without any of your attention whatsoever to this obviously critical bug which is completely unacceptable.

Please; get in touch. You know my email address, I am responsive and I want to help you fix this. We will not be the only complex and memory-intensive app which is suffering here.

Tim
Comment 16 Tim Dawson 2015-02-02 10:05:11 UTC
Our own development Nexus 7 has now been upgraded to Lollipop and we are now experiencing massive instability ourselves. This is a good thing (kind of) because previously we were relying on customer reports.

So, I can insert whatever debugging information you may deem appropriate.

I also reiterate that it should be quite easy for you to reproduce this on any older Nexus device which has been upgraded. This one is running 5.0.2. So far I have tried only debug builds, though further up this thread you can see the instructions for reproducing the crash on release (beta) builds.

The output I get in DDMS is confusing; I haven't seen anything like it. Here's some below. At the beginning of this short excerpt everything is fine, but then the process terminates without apparent explanation.

02-02 14:58:18.060: I/mono-stdout(23071): whole render: 884
02-02 14:58:18.778: I/mono-stdout(23071): whole render: 380
02-02 14:58:19.700: E/WifiStateMachine(527): WifiStateMachine CMD_START_SCAN source -2 txSuccessRate=6.69 rxSuccessRate=10.44 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-55
02-02 14:58:19.708: I/wpa_supplicant(1660): wlan0: CTRL-EVENT-SCAN-STARTED 
02-02 14:58:19.867: W/SoundPool(23071):   sample 0 not READY
02-02 14:58:19.892: W/art(23071): JNI RegisterNativeMethods: attempt to register 0 native methods for divelements.skydemon.mapping.NavigationVirtualRadar
02-02 14:58:20.382: W/InputMethodManagerService(527): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@2a7db242 attribute=android.view.inputmethod.EditorInfo@4e13253, token = android.os.BinderProxy@502660
02-02 14:58:20.636: I/mono-stdout(23071): whole render: 458
02-02 14:58:21.019: I/WindowState(527): WIN DEATH: Window{1c11c719 u0 aero.skydemon.skydemonandroid/divelements.skydemon.Activity1}
02-02 14:58:21.136: I/Zygote(142): Process 23071 exited due to signal (11)
02-02 14:58:21.305: I/ActivityManager(527): Process aero.skydemon.skydemonandroid (pid 23071) has died
02-02 14:58:21.306: W/ActivityManager(527): Force removing ActivityRecord{12a14a06 u0 aero.skydemon.skydemonandroid/divelements.skydemon.Activity1 t9}: app died, no saved state
02-02 14:58:21.312: D/AndroidRuntime(23295): >>>>>> AndroidRuntime START com.android.internal.os.RuntimeInit <<<<<<
02-02 14:58:21.317: D/AndroidRuntime(23295): CheckJNI is OFF
02-02 14:58:21.365: W/InputMethodManagerService(527): Got RemoteException sending setActive(false) notification to pid 23071 uid 10161
02-02 14:58:21.506: E/memtrack(23295): Couldn't load memtrack module (No such file or directory)
02-02 14:58:21.506: E/android.os.Debug(23295): failed to load memtrack module: -2
02-02 14:58:21.569: D/AndroidRuntime(23295): Calling main entry com.android.commands.am.Am
02-02 14:58:21.576: I/ActivityManager(527): Force stopping aero.skydemon.skydemonandroid appid=10161 user=0: from pid 23295
02-02 14:58:21.577: D/AndroidRuntime(23295): Shutting down VM
02-02 14:58:21.579: I/art(23295): Debugger is no longer active
02-02 14:58:21.891: W/OpenGLRenderer(2201): Incorrectly called buildLayer on View: ShortcutAndWidgetContainer, destroying layer...
02-02 14:58:21.891: W/OpenGLRenderer(2201): Incorrectly called buildLayer on View: ShortcutAndWidgetContainer, destroying layer...
Comment 17 Tim Dawson 2015-02-02 12:03:08 UTC
I don't actually know whether the two are related, but the debug build (as mentioned above) crashes almost immediately and the crash is very easy to reproduce. The release (beta) build crashes much less often but still frequently enough to be a huge nuisance to our pilots.

If you can give me instructions as to how to produce a debug build for you (I've only ever packaged up a Release build for distribution) I am happy to send it to you.

Tim
Comment 18 Jonathan Pryor 2015-02-03 19:29:17 UTC
By any chance can you please try one of the following:

1. Build with Xamarin.Android 5.0, or
2. Run the following adb command against your Android 5.0 target:

    adb shell setprop debug.mono.wref jni

I think this is an Android bug with java.lang.ref.WeakReference and JNI Global References...

Because ART had issues with JNI Weak References [0], we changed the default behavior when running under ART to use the same GC bridge mechanism as was used prior to Android v2.2: java.lang.ref.WeakReference instances. That appeared to work for out tests.

Xamarin.Android 5.0 has an improved check, and will use JNIEnv::NewWeakGlobalRef() instead of java.lang.ref.WeakReference instances on Android 5.0 (because ART fixed the weak reference bug).

Alternatively, setting the debug.mono.wref system property will allow explicitly overriding the default value, which should cause your app to opt-in to using JNIEnv::NewWeakGlobalRef() on Android 5.0.

Do either of these allow your app to run more stably?

---

Discussion:

Comment #6 mentioned logcat-parse seeing over 53000 "unknown" references -- handles for which no JNI type information could be found. So I worked on improving logcat-parse, on the assumption that surely these handles have some type information, and I just need to improve deduction from the available information.

While deduction can be improved, I don't think that this is the problem here.

From my 1G+ logcat run, let's grab a handle at random:

> csharp> var u = grefs.AlivePeers.Where(p => p.JniType=="").ElementAt(50000);
> csharp> u.Handles;
> { "0x64028b6", "0x460301e" }

So, where does the first handle come from?

> $ grep 0x64028b6 skydemon.txt 
> I/monodroid-gref(11718): +w+ grefc 2887 gwrefc 2977 obj-handle 0x64028b6/G -> new-handle 0x460301e/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): -g- grefc 2887 gwrefc 2977 handle 0x64028b6/G from thread 'finalizer'(12906)

In English, the first +w+ line means "we're creating a JNI global reference from JNI handle 0x64028b6, and creating a new weak global reference with value 0x460301e."

But where did 0x64028b6 come from? Uh....I don't know.

Worse, as per our logging code there *should* be a corresponding "*take_weak_2_1" message with the same handle values (there's some duplication in our logging).

Except there isn't such a take_weak_2_1 message. It's "missing".

For example, the immediately preceding log is:

> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9ab75410 -> wref=0x18052d6 handle=0x450301e
> I/monodroid-gref(11718): take_weak_global_ref_2_1_compat
> I/monodroid-gref(11718): -g- grefc 2888 gwrefc 2976 handle 0x450301e/G from thread 'finalizer'(12906)

We have a take_weak_2_1 message for 0x450301e being referenced by 0x18052d6, and then 0x450301e is deleted.

The greater context, merging things together:

> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9ab75410 -> wref=0x18052d6 handle=0x450301e
> I/monodroid-gref(11718): take_weak_global_ref_2_1_compat
> I/monodroid-gref(11718): -g- grefc 2888 gwrefc 2976 handle 0x450301e/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): +w+ grefc 2887 gwrefc 2977 obj-handle 0x64028b6/G -> new-handle 0x460301e/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): -g- grefc 2887 gwrefc 2977 handle 0x64028b6/G from thread 'finalizer'(12906)

Now, the scary bit: remember way back when ART first came out, and everything broke? http://forums.xamarin.com/discussion/comment/38456/#Comment_38456

Useful quote:

> within ART, a JNI handle is a 32-bit value which contains three parts: 
> a serial number, a table index, and the handle type (see ToIndirectRef()
> on indirect_reference_table.h line 337). This means there's structure to the handle

Looking at the last 6 digits of the handle values, what do we get? 50301e, which appears multiple times!

> I/monodroid-gref(11718): -g- grefc 2888 gwrefc 2976 handle 0x450301e/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): +w+ grefc 2887 gwrefc 2977 obj-handle 0x64028b6/G -> new-handle 0x460301e/G from thread 'finalizer'(12906)

0x450301e and 0x460301e should be the *same* JNI handle value, yet we're seeing it in two different places.

How does that happen?! What does it mean?

Building upon that observation, while there is no preceding use of handle value 0x64028b6, there are several uses of the same JNI handle index 4028b6 [1]. The initial +g+, Dispose, and -g- makes sense. After that, it doesn't make sense.

In theory, for every +g+ with a new-handle HANDLE value, there should be a corresponding -g- for the same handle value. +w+ and -w- should behave the same way. Yet this isn't what we see, because we see plenty of +w+'s, but no -w-s; instead, they're -g-s. Similarly, after the initial creation, there are more -g-s than +g+s. 

Things are mismatched; things are inconsistent.

[0]: https://code.google.com/p/android/issues/detail?id=63929

[1]: $ grep 4028b6 skydemon.txt 
> I/monodroid-gref(11718): +g+ grefc 2316 gwrefc 0 obj-handle 0xff50001d/L -> new-handle 0x4028b6/G from thread '(null)'(1)
> I/monodroid-gref(11718): handle 0x4028b6; key_handle 0x34f94457: Java Type: `android/graphics/DashPathEffect`; MCW type: `Android.Graphics.DashPathEffect`
> I/monodroid-gref(11718): Disposing handle 0x4028b6
> I/monodroid-gref(11718): -g- grefc 2316 gwrefc 0 handle 0x4028b6/G from thread '(null)'(1)
> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9ab6f288 -> wref=0x1c01426 handle=0x14028b6
> I/monodroid-gref(11718): +w+ grefc 680 gwrefc 1775 obj-handle 0x14028b6/G -> new-handle 0x1c01426/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 680 gwrefc 1775 handle 0x14028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): +w+ grefc 509 gwrefc 2591 obj-handle 0x34028b6/G -> new-handle 0x8c01c3e/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 509 gwrefc 2591 handle 0x34028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): +w+ grefc 581 gwrefc 4207 obj-handle 0x54028b6/G -> new-handle 0xac01c3e/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 581 gwrefc 4207 handle 0x54028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): +w+ grefc 2887 gwrefc 2977 obj-handle 0x64028b6/G -> new-handle 0x460301e/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): -g- grefc 2887 gwrefc 2977 handle 0x64028b6/G from thread 'finalizer'(12906)
> I/monodroid-gref(11718): +g+ grefc 6271 gwrefc 4356 obj-handle 0xd09ce2/G -> new-handle 0xd4028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9b0ca5b8 -> wref=0x1a0a8f2 handle=0xe4028b6
> I/monodroid-gref(11718): +w+ grefc 5592 gwrefc 6108 obj-handle 0xe4028b6/G -> new-handle 0x1a0a8f2/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 5592 gwrefc 6108 handle 0xe4028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9b0e9d98 -> wref=0x2a0a8f2 handle=0xf4028b6
> I/monodroid-gref(11718): +w+ grefc 6057 gwrefc 6656 obj-handle 0xf4028b6/G -> new-handle 0x2a0a8f2/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9c3799d8 -> wref=0x12d00852 handle=0x114028b6
> I/monodroid-gref(11718): +w+ grefc 8643 gwrefc 6027 obj-handle 0x114028b6/G -> new-handle 0x12d00852/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 8643 gwrefc 6027 handle 0x114028b6/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): *take_weak_2_1 obj=0x9de30a60 -> wref=0x13d00852 handle=0x124028b6
> I/monodroid-gref(11718): +w+ grefc 9047 gwrefc 6315 obj-handle 0x124028b6/G -> new-handle 0x13d00852/G from thread 'finalizer'(11718)
> I/monodroid-gref(11718): -g- grefc 9047 gwrefc 6315 handle 0x124028b6/G from thread 'finalizer'(11718)
Comment 19 Tim Dawson 2015-02-04 05:49:10 UTC
I'd like to try setting the environment variable on a device but since our own device isn't experiencing the same crashes as our users are reporting (yet) I can't do it. I will try again later because it was crashing a lot with debug builds, but that may be unrelated.

I wasn't aware there was a Xamarin.Android 5.0. I haven't been offered it. I just tried changing to Beta/Alpha in Visual Studio and checking for updates and I'm offered "Xamarin 3.9" and I've no idea whether that includes the "Xamarin.Android 5.0" you're talking about. As an aside, your versioning system is rather confusing.

I'd like to be able to send a build to our testers using the tools you mentioned but you'll need to please tell me where I download them, and how I can revert to stable tools afterwards.

-

To continue your discussion. The beta I asked you to try before had a caching mechanism which saved instances of drawing primitives (mainly Paths) until a set point in the future to avoid recreating them, for performance reasons. This probably explains the many thousands of them you were seeing stick around. Since then I've determined the performance benefit was pretty small compared to the obvious risk of keeping the things around (I'm just not convinced mono is good at long-lived objects) so I've removed the caching mechanism. All drawing objects used in map rendering (the vast majority of work in our app) are now created and destroyed during the paint cycle.

If you therefore do another test using the latest beta (same URL) you should no longer see as many references building up. When I made this change our testers reported an increase in stability. They're still seeing the same crashes, but fewer of them. It seems like the more objects you have sitting around in memory, the more chance that one of them will be silently boobytrapped by ART/mono/whatever.
Comment 20 Tim Dawson 2015-02-04 05:56:40 UTC
Alternatively, is there any way we can produce a build from our current tools which forces the new weak reference behaviour?
Comment 21 Jonathan Pryor 2015-02-04 09:03:19 UTC
Regarding Xamarin.Android 5.0, it's not a "stable" release; it's a preview-only release:

http://developer.xamarin.com/guides/android/wear/installation/
http://developer.xamarin.com/releases/android/xamarin.android_5/xamarin.android_5.0/

> Alternatively, is there any way we can produce a build from our current tools
> which forces the new weak reference behaviour?

Yes:

http://developer.xamarin.com/guides/android/advanced_topics/environment/

You can add a file to your project with a Build action of AndroidEnvironment, with the contents:

debug.mono.wref=jni

This will force just of JNIEnv::NewWeakGlobalRef() everywhere. Unfortunately, this means it'll be used on Android 4.4 w/ ART enabled, where it's known to not work (see previously mentioned Android bug), so you wouldn't want to push this to the Play Store, but for one-off testing builds it should be fine.
Comment 22 Tim Dawson 2015-02-04 12:31:21 UTC
That's great, thanks. This will only be used by beta testers running Lollipop so I will get that produced and sent to them tomorrow and get back to you.
Comment 23 Jonathan Pryor 2015-02-04 21:18:34 UTC
More investigation, and I have a possible explanation for the bizarro things I saw in Comment #18:

`adb logcat` appears to be unreliable, e.g.:

https://gist.github.com/jonpryor/d59a0fc24c812249168d

TL;DR: while trying to create a repro ("surely this is an Android bug, right?"), I found that if I dump "too much" data to `adb logcat` on my Nexus 5 with Android 5.0.2, not everything that my app writes to logcat is visible in logcat. Messages are "lost", undelivered.

Which could explain why JNI handle values would "come out of nowhere" (e.g. 0x64028b6) -- where they came from was skipped, lost in the aether.

Unfortunately, if this is actually the case -- and current signs suggest it is -- then the current GREF logging code is also unreliable, as it is built upon logcat, which in turn means I need to fix GREF logging. :-/
Comment 24 Tim Dawson 2015-02-05 06:35:07 UTC
I have created a build with the debug.mono.wref=jni environment variable set. I note that we are already setting MONO_GC_PARAMS=bridge-implementation=tarjan in the same file; please let me know if that will cause any conflicts.

I will report back with what our testers say. Hopefully they'll say it improves matters.
Comment 25 Tim Dawson 2015-02-05 12:25:17 UTC
Unfortunately it appears that even with the environment variable set, the app still crashes. Here's the logcat output:

[ 02-05 13:38:29.133  3276: 3276 E/art      ]

JNI ERROR (app bug): accessed deleted global reference 0xb01432



[ 02-05 13:38:29.133  3276: 3276 F/art      ]

art/runtime/indirect_reference_table.cc:61] JNI ERROR (app bug): see above.
Comment 26 Jonathan Pryor 2015-02-05 12:47:42 UTC
I'd ask for GREF logs, but there's no point when `adb logcat` won't log everything...

I need to implement an alternate handle logging mechanism, something that will log to a file instead of logcat. That should be more reliable, and allow me to figure out what's going on.
Comment 27 Tim Dawson 2015-02-05 14:28:00 UTC
Does it log everything when it's started before the app is started?

We have at least one customers with the debugging tools installed and for whom the crash invariably happens very soon after launching the app. So if I can produce a build with GREF logging on and give it to him, and you think it would be helpful, I can certainly do that.
Comment 28 Jonathan Pryor 2015-02-05 14:40:57 UTC
> Does it log everything when it's started before the app is started?

I don't understand the question.

> So if I can produce a build with GREF logging on and give it to him,
> and you think it would be helpful, I can certainly do that.

It might be helpful. It might not be.

As per Comment #23, `adb logcat` is unreliable on Android 5.0, and "skips" messages if there are lots of them generated...like there are during a GC, which is when we most need them.

If his crash happens soon after the app starts, them presumably there will be fewer GREFs in use, and thus presumably -- *hopefully*! -- logcat won't lose anything, and things will make sense.

But if logcat skips anything...it won't help.

I'm working on revamping the GREF logging code to write to a file instead of logcat. I think it may be saner to wait until that's complete, provide you a build of that, and rebuild with it. That should provide a more reliable logging mechanism.
Comment 29 Tim Dawson 2015-02-06 06:02:09 UTC
My question was whether logcat only "misses" log entries if you start logcat after your app has run (so it's reaching back in time for past log entries) or if it misses them even if it's already running before you start your app. It sounds like this is a detail that isn't very relevant anyway.

If you're already working on that logging mechanism and can have a build to me sometime in the next few days then I'm happy to await that as it sounds like it'll be the most robust method of testing. Otherwise I'm happy to have my tester produce a build with logcat GREF output and we can hope for the best.
Comment 30 Tim Dawson 2015-02-12 07:08:29 UTC
Can I have an update on this please? We are obviously keen to press ahead with any diagnostics we can do to help you get to the bottom of this bug.
Comment 31 Tim Dawson 2015-02-13 05:31:37 UTC
Thanks for the email with the hotfix build. I am currently downloading it and will produce a build for our testers right away (assuming it works).

And not to worry, the output files shouldn't get too large. Perhaps I haven't stated the seriousness of this issue sufficiently: our app crashes within minutes or seconds, for many of our customers. We'll get the log in no time.
Comment 32 Jonathan Pryor 2015-02-13 06:27:30 UTC
@Tim: I forgot to mention, we'll still need the `adb logcat` output so that we get the error message about which invalid JNI handle was accessed. We don't need all output, just the end with the ART crash information and debuggerd output.
Comment 33 Tim Dawson 2015-02-13 07:02:17 UTC
After installing your hotfix I couldn't build; it said I needed Android NDK installed. I found out what that was and installed it, and managed to get the project to build (I tried a Release build). I then did a Start without Debugging from Visual Studio, but the app will no longer start on our device (Nexus 7 1st gen). Here's the logcat output when trying to start it:

02-13 11:59:18.794: I/ActivityManager(479): START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=aero.skydemon.skydemonandroid/md5ad594e2b1091d5d0d7f032db3667fc81.SplashActivity (has extras)} from uid 10031 on display 0
02-13 11:59:18.871: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5ad594e2b1091d5d0d7f032db3667fc81.SplashActivity: pid=23761 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-13 11:59:19.012: W/monodroid-gref(23761): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-13 11:59:19.014: W/monodroid-gref(23761): *jonp* open_file: f=0x40177688
02-13 11:59:19.016: W/monodroid(23761): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-13 11:59:19.018: W/(23761): counters not enabled
02-13 11:59:19.018: W/(23761): counters not enabled
02-13 11:59:19.018: W/(23761): counters not enabled
02-13 11:59:19.018: W/(23761): counters not enabled
02-13 11:59:19.089: W/monodroid-gc(23761): GREF GC Threshold: 46080
02-13 11:59:19.241: I/monodroid-gref(23761): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-13 11:59:19.385: I/monodroid-gref(23761): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-13 11:59:19.393: I/monodroid-gref(23761): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-13 11:59:19.420: I/monodroid-gref(23761): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-13 11:59:19.585: I/monodroid-gref(23761): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:19.707: I/monodroid-gref(23761): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:19.708: I/monodroid-gref(23761): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-13 11:59:19.735: A/monodroid-assembly(23761): Could not load assembly 'SkyDemon' during startup registration.
02-13 11:59:19.735: A/monodroid-assembly(23761): This might be due to an invalid debug instalation.
02-13 11:59:19.735: A/monodroid-assembly(23761): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-13 11:59:19.844: I/Zygote(142): Process 23761 exited cleanly (13)
02-13 11:59:19.913: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 23761) has died
02-13 11:59:19.944: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5ad594e2b1091d5d0d7f032db3667fc81.SplashActivity: pid=23780 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-13 11:59:20.022: W/monodroid-gref(23780): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-13 11:59:20.022: W/monodroid-gref(23780): *jonp* open_file: f=0x40177688
02-13 11:59:20.022: W/monodroid(23780): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-13 11:59:20.025: W/(23780): counters not enabled
02-13 11:59:20.025: W/(23780): counters not enabled
02-13 11:59:20.025: W/(23780): counters not enabled
02-13 11:59:20.025: W/(23780): counters not enabled
02-13 11:59:20.068: W/monodroid-gc(23780): GREF GC Threshold: 46080
02-13 11:59:20.148: I/monodroid-gref(23780): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-13 11:59:20.260: I/monodroid-gref(23780): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-13 11:59:20.269: I/monodroid-gref(23780): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-13 11:59:20.296: I/monodroid-gref(23780): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-13 11:59:20.458: I/monodroid-gref(23780): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:20.576: I/monodroid-gref(23780): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:20.578: I/monodroid-gref(23780): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-13 11:59:20.603: A/monodroid-assembly(23780): Could not load assembly 'SkyDemon' during startup registration.
02-13 11:59:20.603: A/monodroid-assembly(23780): This might be due to an invalid debug instalation.
02-13 11:59:20.603: A/monodroid-assembly(23780): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-13 11:59:20.664: I/Zygote(142): Process 23780 exited cleanly (13)
02-13 11:59:20.773: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 23780) has died
02-13 11:59:20.803: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5ad594e2b1091d5d0d7f032db3667fc81.SplashActivity: pid=23798 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-13 11:59:20.900: W/monodroid-gref(23798): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-13 11:59:20.900: W/monodroid-gref(23798): *jonp* open_file: f=0x40177688
02-13 11:59:20.900: W/monodroid(23798): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-13 11:59:20.902: W/(23798): counters not enabled
02-13 11:59:20.902: W/(23798): counters not enabled
02-13 11:59:20.902: W/(23798): counters not enabled
02-13 11:59:20.902: W/(23798): counters not enabled
02-13 11:59:20.944: W/monodroid-gc(23798): GREF GC Threshold: 46080
02-13 11:59:21.022: I/monodroid-gref(23798): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-13 11:59:21.134: I/monodroid-gref(23798): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-13 11:59:21.142: I/monodroid-gref(23798): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-13 11:59:21.167: I/monodroid-gref(23798): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-13 11:59:21.328: I/monodroid-gref(23798): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:21.444: I/monodroid-gref(23798): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-13 11:59:21.446: I/monodroid-gref(23798): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-13 11:59:21.471: A/monodroid-assembly(23798): Could not load assembly 'SkyDemon' during startup registration.
02-13 11:59:21.471: A/monodroid-assembly(23798): This might be due to an invalid debug instalation.
02-13 11:59:21.471: A/monodroid-assembly(23798): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-13 11:59:21.514: I/Zygote(142): Process 23798 exited cleanly (13)
02-13 11:59:21.563: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 23798) has died
02-13 11:59:21.564: W/ActivityManager(479): Force removing ActivityRecord{5068053 u0 aero.skydemon.skydemonandroid/md5ad594e2b1091d5d0d7f032db3667fc81.SplashActivity t33}: app died, no saved state
02-13 11:59:21.575: W/WindowManager(479): Failed looking up window
02-13 11:59:21.575: W/WindowManager(479): java.lang.IllegalArgumentException: Requested window android.view.ViewRootImpl$W@2360d68e does not exist
02-13 11:59:21.575: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.windowForClientLocked(WindowManagerService.java:8412)
02-13 11:59:21.575: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.windowForClientLocked(WindowManagerService.java:8403)
02-13 11:59:21.575: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.removeWindow(WindowManagerService.java:2558)
02-13 11:59:21.575: W/WindowManager(479): 	at com.android.server.wm.Session.remove(Session.java:186)
02-13 11:59:21.575: W/WindowManager(479): 	at android.view.ViewRootImpl.dispatchDetachedFromWindow(ViewRootImpl.java:2920)
02-13 11:59:21.575: W/WindowManager(479): 	at android.view.ViewRootImpl.doDie(ViewRootImpl.java:5390)
02-13 11:59:21.575: W/WindowManager(479): 	at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:3223)
02-13 11:59:21.575: W/WindowManager(479): 	at android.os.Handler.dispatchMessage(Handler.java:102)
02-13 11:59:21.575: W/WindowManager(479): 	at android.os.Looper.loop(Looper.java:135)
02-13 11:59:21.575: W/WindowManager(479): 	at android.os.HandlerThread.run(HandlerThread.java:61)
02-13 11:59:21.575: W/WindowManager(479): 	at com.android.server.ServiceThread.run(ServiceThread.java:46)
02-13 11:59:21.647: W/InputMethodManagerService(479): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@b622245 attribute=null, token = android.os.BinderProxy@3c693b7b
02-13 11:59:22.168: W/OpenGLRenderer(1114): Incorrectly called buildLayer on View: ShortcutAndWidgetContainer, destroying layer...
02-13 11:59:26.625: E/WifiStateMachine(479): WifiStateMachine CMD_START_SCAN source -2 txSuccessRate=0.00 rxSuccessRate=0.01 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-58
02-13 11:59:26.626: E/WifiStateMachine(479): WifiStateMachine starting scan for "SkyDemon-DrayTek"WPA_PSK with 2437
02-13 11:59:26.634: I/wpa_supplicant(707): wlan0: CTRL-EVENT-SCAN-STARTED
Comment 34 Jonathan Pryor 2015-02-13 08:44:00 UTC
@Tim:

> it said I needed Android NDK installed. 

That's odd; the NDK shouldn't be needed unless it's actually used. :-(

> [I] managed to get the project to build (I tried a Release build)...

The cause of death is:

> 02-13 11:59:19.735: A/monodroid-assembly(23761): Could not load assembly 'SkyDemon' during startup registration.
> 02-13 11:59:19.735: A/monodroid-assembly(23761): This might be due to an invalid debug instalation.
> 02-13 11:59:19.735: A/monodroid-assembly(23761): A common cause is to 'adb install' the app directly instead of doing from the IDE.

That looks like it's not a Release build, or -- somehow -- the Use shared Mono runtime and Fast assembly deployment options were enabled for your Release build.

In short, something is screwy (The joys of alpha software, though I haven't seen this before...). :-(

Could you please try this command from a Visual Studio command prompt?

    MSBuild /t:Install /p:Configuration=Release /v:diag > build.txt

then attach build.txt?
Comment 36 Jonathan Pryor 2015-02-13 11:56:48 UTC
> Was I supposed to specify our Android app on the command line? If so, this happens

My apologies; I should have been more specific. MSBuild requires that a solution or project file be specified on the command line, and the Install target requires a .csproj, not a .sln:


> MSBuild /t:Install /p:Configuration=Release /v:diag Path\To\Your\App.csproj > build.txt
Comment 38 Tim Dawson 2015-02-14 12:18:34 UTC
I have run that as requested and uploaded the result as an attachment (private).
Comment 39 Jonathan Pryor 2015-02-15 21:13:44 UTC
Does the installed app work?

The build log shows:

>  InstallPackageAssemblies Task (TaskId:318)
...
>     EmbedAssemblies: True (TaskId:318)
...
>     UseSharedRuntime: False (TaskId:318)

so the assemblies *should* be embedded into the .apk. (For example, if you rename the .apk to a .zip, you should be able to use Windows Explorer to view within the .apk, and there should be an "assemblies" folder which contains the assemblies.)

Furthermore, installation succeeded:

>   TASK: InstallPackage pm install "/data/local/tmp/aero.skydemon.skydemonandroid-Signed.apk" completed?:        pkg: /data/local/tmp/aero.skydemon.skydemonandroid-Signed.apk
>   Success

so as far as I can tell, the app should work. :-(
Comment 40 Tim Dawson 2015-02-16 11:33:38 UTC
No, nothing built with the hotfix version you made available to me will run. The app begins to launch but immediately dies before any managed code is run. It's pretty much the same output in logcat:

02-16 16:31:45.810: I/art(1114): Background sticky concurrent mark sweep GC freed 46893(2MB) AllocSpace objects, 10(265KB) LOS objects, 4% free, 16MB/17MB, paused 1.914ms total 108.067ms
02-16 16:31:45.928: I/art(1114): Background partial concurrent mark sweep GC freed 25145(1840KB) AllocSpace objects, 9(392KB) LOS objects, 40% free, 15MB/25MB, paused 5.617ms total 77.006ms
02-16 16:31:48.631: I/ActivityManager(479): START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=aero.skydemon.skydemonandroid/md5c98e1e2f620028c1a15b809310f1310a.SplashActivity (has extras)} from uid 10031 on display 0
02-16 16:31:48.720: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5c98e1e2f620028c1a15b809310f1310a.SplashActivity: pid=28372 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-16 16:31:48.754: I/art(142): Explicit concurrent mark sweep GC freed 707(30KB) AllocSpace objects, 0(0B) LOS objects, 40% free, 5MB/8MB, paused 344us total 36.235ms
02-16 16:31:48.778: I/art(142): Explicit concurrent mark sweep GC freed 7(240B) AllocSpace objects, 0(0B) LOS objects, 39% free, 5MB/8MB, paused 260us total 17.064ms
02-16 16:31:48.805: I/art(142): Explicit concurrent mark sweep GC freed 5(160B) AllocSpace objects, 0(0B) LOS objects, 39% free, 5MB/8MB, paused 266us total 20.218ms
02-16 16:31:48.914: W/monodroid-gref(28372): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-16 16:31:48.915: W/monodroid-gref(28372): *jonp* open_file: f=0x40177688
02-16 16:31:48.915: W/monodroid(28372): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-16 16:31:48.917: W/(28372): counters not enabled
02-16 16:31:48.917: W/(28372): counters not enabled
02-16 16:31:48.917: W/(28372): counters not enabled
02-16 16:31:48.917: W/(28372): counters not enabled
02-16 16:31:48.970: W/monodroid-gc(28372): GREF GC Threshold: 46080
02-16 16:31:49.133: I/monodroid-gref(28372): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-16 16:31:49.274: I/monodroid-gref(28372): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-16 16:31:49.282: I/monodroid-gref(28372): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-16 16:31:49.310: I/monodroid-gref(28372): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-16 16:31:49.474: I/monodroid-gref(28372): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:49.598: I/monodroid-gref(28372): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:49.600: I/monodroid-gref(28372): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-16 16:31:49.630: A/monodroid-assembly(28372): Could not load assembly 'SkyDemon' during startup registration.
02-16 16:31:49.630: A/monodroid-assembly(28372): This might be due to an invalid debug instalation.
02-16 16:31:49.630: A/monodroid-assembly(28372): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-16 16:31:49.690: I/Zygote(142): Process 28372 exited cleanly (13)
02-16 16:31:49.730: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 28372) has died
02-16 16:31:49.758: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5c98e1e2f620028c1a15b809310f1310a.SplashActivity: pid=28390 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-16 16:31:49.843: E/WifiStateMachine(479): WifiStateMachine CMD_START_SCAN source -2 txSuccessRate=0.06 rxSuccessRate=0.68 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-52
02-16 16:31:49.843: E/WifiStateMachine(479): WifiStateMachine starting scan for "SkyDemon-DrayTek"WPA_PSK with 2437
02-16 16:31:49.851: I/wpa_supplicant(707): wlan0: CTRL-EVENT-SCAN-STARTED 
02-16 16:31:49.863: W/monodroid-gref(28390): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-16 16:31:49.863: W/monodroid-gref(28390): *jonp* open_file: f=0x40177688
02-16 16:31:49.863: W/monodroid(28390): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-16 16:31:49.865: W/(28390): counters not enabled
02-16 16:31:49.865: W/(28390): counters not enabled
02-16 16:31:49.865: W/(28390): counters not enabled
02-16 16:31:49.865: W/(28390): counters not enabled
02-16 16:31:49.913: W/monodroid-gc(28390): GREF GC Threshold: 46080
02-16 16:31:49.993: I/monodroid-gref(28390): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-16 16:31:50.105: I/monodroid-gref(28390): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-16 16:31:50.113: I/monodroid-gref(28390): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-16 16:31:50.141: I/monodroid-gref(28390): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-16 16:31:50.304: I/monodroid-gref(28390): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:50.424: I/monodroid-gref(28390): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:50.425: I/monodroid-gref(28390): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-16 16:31:50.452: A/monodroid-assembly(28390): Could not load assembly 'SkyDemon' during startup registration.
02-16 16:31:50.453: A/monodroid-assembly(28390): This might be due to an invalid debug instalation.
02-16 16:31:50.453: A/monodroid-assembly(28390): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-16 16:31:50.490: I/Zygote(142): Process 28390 exited cleanly (13)
02-16 16:31:50.560: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 28390) has died
02-16 16:31:50.598: I/ActivityManager(479): Start proc aero.skydemon.skydemonandroid for activity aero.skydemon.skydemonandroid/md5c98e1e2f620028c1a15b809310f1310a.SplashActivity: pid=28408 uid=10165 gids={50165, 9997, 1028, 1015, 3003} abi=armeabi-v7a
02-16 16:31:50.684: W/monodroid-gref(28408): *jonp* open_file: path=/data/data/aero.skydemon.skydemonandroid/files/.__override__/grefs.txt
02-16 16:31:50.684: W/monodroid-gref(28408): *jonp* open_file: f=0x40177688
02-16 16:31:50.684: W/monodroid(28408): Trying to load sgen from: /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-16 16:31:50.686: W/(28408): counters not enabled
02-16 16:31:50.686: W/(28408): counters not enabled
02-16 16:31:50.686: W/(28408): counters not enabled
02-16 16:31:50.686: W/(28408): counters not enabled
02-16 16:31:50.731: W/monodroid-gc(28408): GREF GC Threshold: 46080
02-16 16:31:50.811: I/monodroid-gref(28408): +g+ grefc 1 gwrefc 0 obj-handle 0x400055/L -> new-handle 0x1003de/G from thread '(null)'(1)
02-16 16:31:50.926: I/monodroid-gref(28408): +g+ grefc 2 gwrefc 0 obj-handle 0x500059/L -> new-handle 0x1003e2/G from thread '(null)'(1)
02-16 16:31:50.934: I/monodroid-gref(28408): +g+ grefc 3 gwrefc 0 obj-handle 0x600051/L -> new-handle 0x1003e6/G from thread '(null)'(1)
02-16 16:31:50.962: I/monodroid-gref(28408): +g+ grefc 4 gwrefc 0 obj-handle 0x800051/L -> new-handle 0x1003ea/G from thread '(null)'(1)
02-16 16:31:51.127: I/monodroid-gref(28408): +g+ grefc 5 gwrefc 0 obj-handle 0xa00055/L -> new-handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:51.251: I/monodroid-gref(28408): -g- grefc 5 gwrefc 0 handle 0x1003ee/G from thread '(null)'(1)
02-16 16:31:51.253: I/monodroid-gref(28408): +g+ grefc 5 gwrefc 0 obj-handle 0xc00051/L -> new-handle 0x2003ee/G from thread '(null)'(1)
02-16 16:31:51.282: A/monodroid-assembly(28408): Could not load assembly 'SkyDemon' during startup registration.
02-16 16:31:51.282: A/monodroid-assembly(28408): This might be due to an invalid debug instalation.
02-16 16:31:51.282: A/monodroid-assembly(28408): A common cause is to 'adb install' the app directly instead of doing from the IDE.
02-16 16:31:51.410: I/Zygote(142): Process 28408 exited cleanly (13)
02-16 16:31:51.460: I/ActivityManager(479): Process aero.skydemon.skydemonandroid (pid 28408) has died
02-16 16:31:51.460: W/ActivityManager(479): Force removing ActivityRecord{1c0dd20e u0 aero.skydemon.skydemonandroid/md5c98e1e2f620028c1a15b809310f1310a.SplashActivity t41}: app died, no saved state
02-16 16:31:51.468: W/WindowManager(479): Failed looking up window
02-16 16:31:51.468: W/WindowManager(479): java.lang.IllegalArgumentException: Requested window android.view.ViewRootImpl$W@2ebfaa3c does not exist
02-16 16:31:51.468: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.windowForClientLocked(WindowManagerService.java:8412)
02-16 16:31:51.468: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.windowForClientLocked(WindowManagerService.java:8403)
02-16 16:31:51.468: W/WindowManager(479): 	at com.android.server.wm.WindowManagerService.removeWindow(WindowManagerService.java:2558)
02-16 16:31:51.468: W/WindowManager(479): 	at com.android.server.wm.Session.remove(Session.java:186)
02-16 16:31:51.468: W/WindowManager(479): 	at android.view.ViewRootImpl.dispatchDetachedFromWindow(ViewRootImpl.java:2920)
02-16 16:31:51.468: W/WindowManager(479): 	at android.view.ViewRootImpl.doDie(ViewRootImpl.java:5390)
02-16 16:31:51.468: W/WindowManager(479): 	at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:3223)
02-16 16:31:51.468: W/WindowManager(479): 	at android.os.Handler.dispatchMessage(Handler.java:102)
02-16 16:31:51.468: W/WindowManager(479): 	at android.os.Looper.loop(Looper.java:135)
02-16 16:31:51.468: W/WindowManager(479): 	at android.os.HandlerThread.run(HandlerThread.java:61)
02-16 16:31:51.468: W/WindowManager(479): 	at com.android.server.ServiceThread.run(ServiceThread.java:46)
02-16 16:31:51.533: W/InputMethodManagerService(479): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@bfb2e28 attribute=null, token = android.os.BinderProxy@3c693b7b
02-16 16:32:00.054: W/OpenGLRenderer(1114): Incorrectly called buildLayer on View: ShortcutAndWidgetContainer, destroying layer...
Comment 41 Tim Dawson 2015-02-18 06:14:00 UTC
I've uploaded an APK for you, so you can take a look.
Comment 43 Jonathan Pryor 2015-02-18 10:40:06 UTC
@Tim: Thank you for uploading the .apk.

It *does* contain assemblies:

> $ unzip -l aero.skydemon.skydemonandroid-Aligned.xa5.apk  | grep assemblies
>   2105856  02-18-15 10:48   assemblies/SkyDemon.dll
> ...

Furthermore, when I install and run it on my Nexus 5, it starts without error.

Unfortunately I'm not able to run the app, as my trial has expired.
Comment 44 Tim Dawson 2015-02-19 04:34:34 UTC
How very strange. I've deleted your trial so you can create another one if you like.
Comment 45 Jonathan Pryor 2015-02-19 14:14:44 UTC
@Tim:

> $ adb shell run-as aero.skydemon.skydemonandroid ls
> run-as: Package 'aero.skydemon.skydemonandroid' is not debuggable

Translation: remember those instructions I provided to get grefs.txt off the device? Well, uh, they require that the app have /manifest/application/@android:debuggable=true within AndroidManifest.xml.

Which isn't the case for the file in Comment #42.

/me cries.

Setting android:debuggable=true does *not* require a Debug build. It "just" requires explicitly providing it within your AndroidManifest.xml:

    <manifest
        ...
     >
	<application
                android:label="SkyDemon"
                android:debuggable="true"
            >
            ...

Alternatively, within AssemblyInfo.cs add:

    [assembly:Application (Debuggable=true)]

The resulting .apk should contain a debuggable app.

Also, were you able to install and run the file from Comment #42? It sounded like it didn't work for you in Comment #40...
Comment 46 Tim Dawson 2015-02-20 05:46:00 UTC
Ok, I had no idea the build had to be debuggable. You can't rely on me knowing all this stuff; if I did, I probably wouldn't be using Xamarin.

I had to roll back to a stable build but this morning I will go back to the hotfix and try another build. I never did get anything built with your hotfix to run on our development device (2012 Nexus 7). It occurs to me, though, that it might be a long-standing problem with Xamarin we've had (but not reported) where (particularly when switching from Debug to Release) the app simply will not start until it's completely uninstalled THEN re-deployed. I will try this to see if it helps, but the symptoms are slightly different so I suspect it will not.

Hopefully I'll report back in a few minutes.
Comment 47 Tim Dawson 2015-02-20 06:00:27 UTC
Interesting. The splash screen does now appear (after doing a full uninstall of the other build) but I can proceed no further. But it's a different apparent error, though, which I will paste below. I will create an APK from this and upload it for you.

02-20 10:58:49.593: A/(4757): * Assertion: should not be reached at /Users/builder/data/lanes/monodroid-mavericks-master/58d28ca9/source/mono/mono/metadata/sgen-scan-object.h:101
02-20 10:58:49.593: A/libc(4757): Fatal signal 6 (SIGABRT), code -6 in tid 4757 (skydemonandroid)
02-20 10:58:49.695: I/DEBUG(133): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
02-20 10:58:49.695: I/DEBUG(133): Build fingerprint: 'google/nakasi/grouper:5.0.2/LRX22G/1649326:user/release-keys'
02-20 10:58:49.695: I/DEBUG(133): Revision: '0'
02-20 10:58:49.695: I/DEBUG(133): ABI: 'arm'
02-20 10:58:49.695: I/DEBUG(133): pid: 4757, tid: 4757, name: skydemonandroid  >>> aero.skydemon.skydemonandroid <<<
02-20 10:58:49.695: I/DEBUG(133): signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
02-20 10:58:49.715: I/DEBUG(133): Abort message: '* Assertion: should not be reached at /Users/builder/data/lanes/monodroid-mavericks-master/58d28ca9/source/mono/mono/metadata/sgen-scan-object.h:101
02-20 10:58:49.715: I/DEBUG(133): '
02-20 10:58:49.715: I/DEBUG(133):     r0 00000000  r1 00001295  r2 00000006  r3 00000000
02-20 10:58:49.715: I/DEBUG(133):     r4 40191114  r5 00000006  r6 00000016  r7 0000010c
02-20 10:58:49.715: I/DEBUG(133):     r8 ffffffff  r9 6919acd0  sl 00000001  fp 6919acd0
02-20 10:58:49.715: I/DEBUG(133):     ip 00001295  sp bef46140  lr 400fc3c5  pc 40122648  cpsr 600f0010
02-20 10:58:49.715: I/DEBUG(133): backtrace:
02-20 10:58:49.715: I/DEBUG(133):     #00 pc 0003c648  /system/lib/libc.so (tgkill+12)
02-20 10:58:49.715: I/DEBUG(133):     #01 pc 000163c1  /system/lib/libc.so (pthread_kill+52)
02-20 10:58:49.715: I/DEBUG(133):     #02 pc 00016fdf  /system/lib/libc.so (raise+10)
02-20 10:58:49.715: I/DEBUG(133):     #03 pc 00013819  /system/lib/libc.so
02-20 10:58:49.715: I/DEBUG(133):     #04 pc 00011fe8  /system/lib/libc.so (abort+4)
02-20 10:58:49.715: I/DEBUG(133):     #05 pc 001fbca4  /data/app/aero.skydemon.skydemonandroid-1/lib/arm/libmonosgen-2.0.so
02-20 10:58:50.212: I/DEBUG(133): Tombstone written to: /data/tombstones/tombstone_00
Comment 49 Jonathan Pryor 2015-02-20 16:27:35 UTC
> Ok, I had no idea the build had to be debuggable.

I didn't either until I tried to extract files. :-(

Which brings us to the new .apk in Comment #48: it "works" for me on a Nexus 5, in that it doesn't crash. It also doesn't actually work: I hit the "Welcome to SkyDemon" splash screen, tap I Agree, then a Charts "dialog" appears with a Continue button and a spinning logo.

...and it never progresses. I've left the spinning logo spinning for over an hour, and it never stopped spinning. Tapping Continue while the logo is spinning results in the message:

> The Charts window cannot be shown at the moment because another window is open.

Uh...okay?

Fortunately, QA has a 2012 Nexus 7. Doubly fortunately, it also crashes. Unfortunately, it crashes with a *different* message: https://gist.github.com/pjcollins/90aeae493fb8fd7a5572

> I/DEBUG   (  135): Abort message: '* Assertion at /Users/builder/data/lanes/monodroid-mavericks-master/58d28ca9/source/mono/mono/metadata/class.c:8474, condition `ac->rank' not met

We are investigating...
Comment 50 Tim Dawson 2015-02-23 09:38:13 UTC
Since that build runs for you (it still doesn't for me) I will send it to one tester and ask if he can make it crash right away. If he can, I will send you the relevant log(s).

How are your investigations going?
Comment 51 Tim Dawson 2015-02-23 12:16:12 UTC
My tester is encountering the same assertion failure as you pasted above. They encounter it after pressing the Agree button before the map has had a chance to be drawn.

I'm glad that you're able to reproduce this one, and look forward to hearing your diagnosis.
Comment 52 Tim Dawson 2015-03-03 13:02:58 UTC
Ok, you've been reproducing the crash for 10 days now... are you any closer to a fix?

Tim
Comment 53 Tim Dawson 2015-03-11 12:59:46 UTC
Thank you for your email with an updated hotfix build. I have built a beta version of our app with it, and observe no crash myself (not that I have tried particularly hard) and will send it out for our testers who have been experiencing the worst stability problems to have a play with. I will report back.

Your problem with the Charts window is very strange. We've literally never seen anyone have that problem before, and we have tens of thousands of subscribers. I wonder if a network issue could be preventing the window from populating itself, but you'd think it would time out eventually.
Comment 54 Tim Dawson 2015-03-12 07:40:25 UTC
Our testers are still playing with the build from yesterday.

In the meantime I am attempting to debug the issue in the Charts window, which I have now reproduced. The problem is, I am unable to actually run a build under the debugger. When I do, as soon as I attempt to proceed past the disclaimer screen VS breaks with no source available, and the stack trace below. No other information is actually given.

>	0x0 in Android.Runtime.JNIEnv.WaitForBridgeProcessing at /Users/builder/data/lanes/monodroid-mavericks-monodroid-5.1-series/60a61668/source/monodroid/src/Mono.Android/src/Runtime/JNIEnv.cs:256,4	C#
 	0x0 in object.2a24b6ac-f51c-4a2f-9d2a-2cdc692e1ec8	C#

The DDMS output around the time of the break is below, but it doesn't seem very helpful. Any ideas?

03-12 11:37:13.271: W/chromium(29229): [WARNING:data_reduction_proxy_settings.cc(328)] SPDY proxy OFF at startup
03-12 11:37:13.290: W/art(29229): Attempt to remove local handle scope entry from IRT, ignoring
03-12 11:37:13.295: W/AwContents(29229): onDetachedFromWindow called when already detached. Ignoring
03-12 11:37:13.355: W/art(29229): JNI RegisterNativeMethods: attempt to register 0 native methods for md5f803af79f8a44689edc26e40cb6e2efe.PlateListScreen
03-12 11:37:13.414: W/art(29229): JNI RegisterNativeMethods: attempt to register 0 native methods for md56af54d060a53300a2fcdb80eed01d519.FlightDetailsScreen
03-12 11:37:13.494: E/chromium(29229): ### WebView Version 40 (1779353-arm) (code 422101)
03-12 11:37:18.425: E/WifiStateMachine(481): WifiStateMachine CMD_START_SCAN source -2 txSuccessRate=0.01 rxSuccessRate=0.07 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-48
03-12 11:37:18.426: E/WifiStateMachine(481): WifiStateMachine starting scan for "SkyDemon-DrayTek"WPA_PSK with 2437
03-12 11:37:18.431: I/wpa_supplicant(707): wlan0: CTRL-EVENT-SCAN-STARTED 
03-12 11:37:18.442: D/WifiService(481): acquireWifiLockLocked: WifiLock{NlpWifiLock type=2 binder=android.os.BinderProxy@3fa68d33}
03-12 11:37:18.445: E/WifiStateMachine(481): WifiStateMachine CMD_START_SCAN source 10019 txSuccessRate=0.01 rxSuccessRate=0.07 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-48
03-12 11:37:18.491: E/WifiStateMachine(481): WifiStateMachine CMD_START_SCAN source 10019 txSuccessRate=0.01 rxSuccessRate=0.07 targetRoamBSSID=00:50:7f:5e:ff:88 RSSI=-48
03-12 11:37:18.493: I/wpa_supplicant(707): wlan0: CTRL-EVENT-SCAN-STARTED 
03-12 11:37:19.756: D/WifiService(481): releaseWifiLockLocked: WifiLock{NlpWifiLock type=2 binder=android.os.BinderProxy@3fa68d33}
03-12 11:37:19.766: D/libgps(481): proxy_gps_inject_location()
03-12 11:37:19.901: W/ActivityManager(481): Launch timeout has expired, giving up wake lock!
Comment 55 Tim Dawson 2015-03-16 06:24:40 UTC
One of our testers has had a chance to try the new build and he reports it is still crashing frequently. Here's some sample logcat output:

[ 03-13 07:51:43.455 16343:16343 E/art      ]
JNI ERROR (app bug): accessed deleted global reference 0xa00b3e

[ 03-13 07:51:43.456 16343:16343 F/art      ]
art/runtime/indirect_reference_table.cc:61] JNI ERROR (app bug): see above.

[ 03-13 07:51:44.911 16343:16343 F/art      ]
art/runtime/runtime.cc:284] Runtime aborting...

[ 03-13 07:51:44.911 16343:16343 F/art      ]
art/runtime/runtime.cc:284] Aborting thread:

So it looks like the original bug that we are familiar with.
Comment 56 Jonathan Pryor 2015-03-16 09:58:21 UTC
Yay!

Now, can we get the grefs.txt file? :-)

> adb shell run-as aero.skydemon.skydemonandroid cat files/.__override__/grefs.txt > grefs.txt
Comment 57 Tim Dawson 2015-03-17 11:40:34 UTC
I'm working on it, but hampered by the issue in comment #54. Can you shed any light on that? Is this hotfix build broken for debugging by design?

I've been trying to debug the issue which you have seen, and we are also now seeing on our development device, where the Charts window appears but then nothing happens. Debugging is tricky because of the issue above but by resorting to dumping output to logcat, I have determined that our call to WebClient.OpenReadAsync never triggers the WebClient.OpenReadCompleted event which it should.

Did you break WebClient in this build?
Comment 58 Tim Dawson 2015-03-17 13:54:26 UTC
Success! I've emailed crash log and gref dump to you. Hopefully this will tell you what's going wrong.
Comment 59 Tim Dawson 2015-03-17 14:47:51 UTC
I've been looking at this a little myself. I'm no expert at this gref stuff but I can see the relevant part of the log, where things go wrong. What's worrying is that several of the stack traces shown for the creation and deletion of references are impossible.

I can see that the problematic object is disposed twice, the first time not from my method. The stack trace shows this:

   at Java.Lang.Object.Dispose()
   at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(DrawingContext context, System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraints)

But that is very suspicious as there should be an intermediate frame in there with the actual .net Dispose method. The gref log shows the object in question is a Typeface instance, and there is no Dispose called from the MeasureText method which could possibly end up Disposing a Typeface.

A little further down is this:

   at Java.Lang.Object.Dispose()
   at Divelements.SkyDemon.Drawing.DrawingContext.Dispose()
   at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraint)

But again, there is no call to DrawingContext.Dispose in MeasureText (not even with a using).

Further on we have this:

   at Android.Graphics.Paint..ctor()
   at Divelements.SkyDemon.Drawing.DrawingContext..ctor(Android.Graphics.Canvas canvas)
   at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraint)

But there is no creation of a DrawingContext from within the MeasureText method. These stack traces are impossible.

This looks to me like it's happening because two threads are calling into the same code at the same time, but everything is properly synchronised, or at least it should be. Because I suspect it will be of interest to you, here's the source for the MeasureText method which is surfacing itself so heavily in this log:

public static System.Drawing.Size MeasureText(DrawingContext context, string text, Font font, System.Drawing.SizeF constraints)
{
	lock (sync)
	//using (TextPaint textPaint = new TextPaint(context.Paint))
	{
		textPaint.SetTypeface(font.Typeface);
		textPaint.TextSize = font.ScaledSize;
		using (StaticLayout layout = new StaticLayout(text, textPaint, (int)constraints.Width, Layout.Alignment.AlignNormal, 1, 0, false))
		{
			int width = 0;
			for (int i = 0; i < layout.LineCount; i++)
				width = Math.Max(width, (int)Math.Ceiling(layout.GetLineWidth(i)));
			return new System.Drawing.Size(width, Math.Min(layout.Height, (int)constraints.Height));
		}
	}
}

We use a shared synchronised TextPaint in this method because it prevents us from having hundreds of TextPaint allocations during a paint cycle.
Comment 60 Tim Dawson 2015-03-23 06:23:38 UTC
It's frustrating that you've gone quiet on this again, now that we've got into a situation where we've given you the data you've been asking for. It's required quite a bit of work on my part to get you that data.

This issue was reported to you in December. Since then, although I've been chasing you every few weeks, the clamouring of our customers has not stopped, and nor would I expect it to. Can you make this a priority please?
Comment 61 Jonathan Pryor 2015-03-23 16:56:57 UTC
The problem is figuring out what the problem is. We know what it is -- an invalid handle value is used -- but we don't know *why* that is, especially when all available evidence suggests that it shouldn't be happening.

Which is aggravating for all parties involved. (It's hard to ask for more details when I can't even think of what more to ask for, yet things are still failing. I'm obviously overlooking *something*, but I don't know what.)

Thank you for Comment #59.

> What's worrying is that several of the stack traces shown for the
> creation and deletion of references are impossible.

There are three possible reasons for this:

1. Interface method invocation; or
2. Method inlining; or
3. The methods in the stack trace aren't what you think they are.

When an interface method is invoked, the interface method doesn't show up in the stack trace. Instead, the implementing method is shown.

When methods are inlined, they may not be reliably "un-inlined" when producing the stack trace. This can explain some missing methods. (None here, but theoretically speaking that can explain some missing methods...)

Finally, when reading stack traces, some care needs to be taken to ensure we're thinking about the right method overload.

Consider:

>   at Java.Lang.Object.Dispose()
>   at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(DrawingContext context, System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraints)

This is the method you provided in Comment #59, which contains:

    using (var layout = new StaticLayout (...))
    {
        ...
    }

At the end of the `using` block, layout.Dispose() is invoked, which is implemented as Java.Lang.Object.Dispose(). There isn't a "missing" method in the above stack trace; it is correct and complete.

> A little further down is this:
> 
>    at Java.Lang.Object.Dispose()
>    at Divelements.SkyDemon.Drawing.DrawingContext.Dispose()
>    at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraint)
> 
> But again, there is no call to DrawingContext.Dispose in MeasureText (not even
> with a using).

What we have here is an *overload* of the MeasureText() provided in Comment #59: it's not MeasureText(DrawingContext, System.String, Font, SizeF), this is instead MeasureText(string, Font, SizeF) (note the different parameter list). This overload *does* create and Dispose() of a DrawingContext instance; the stack trace is not only not impossible; it is correct and expected.

Finally,

>    at Android.Graphics.Paint..ctor()
>    at Divelements.SkyDemon.Drawing.DrawingContext..ctor(Android.Graphics.Canvas canvas)
>    at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraint)
> 
> But there is no creation of a DrawingContext from within the MeasureText
> method. These stack traces are impossible.

Again, this is a different MeasureText(string, Font, SizeF) overload, which does create a DrawingContext instance. The stack trace is correct and expected.
Comment 62 Tim Dawson 2015-03-24 06:25:42 UTC
I see what you mean about the MeasureText overload. You're right, it does create a dummy DrawingContext which itself creates a dummy Paint which is then Disposed.

The question is, why the Typeface object is being disposed twice. That's why I claimed the first stack trace above was impossible. This one:

   at Java.Lang.Object.Dispose()
   at Divelements.SkyDemon.Drawing.IndependentText.MeasureText(DrawingContext context, System.String text, Divelements.SkyDemon.Drawing.Font font, SizeF constraints)

Is from the GREF log output. It's from the first time the object with ref 0x5800a02 is disposed. Since I can scroll up and observe 0x5800a02 is a Typeface, that made me assume that the object whose Dispose method was being called was the Typeface. Is that wrong? The stack trace suggests that it must actually be a TextPaint or a StaticLayout that is being disposed. And yet your GREF output tells us that object 0x5800a02 is being cleaned up.

Further down, there's the stack trace where the Typeface SHOULD be being cleaned up:

   at Java.Lang.Object.Dispose()
   at Divelements.SkyDemon.Drawing.Font.Dispose()
   at Divelements.SkyDemon.Drawing.IndependentText.CreateFontToFitHeight(DrawingContext context, System.String fontFamily, TypefaceStyle style, Single height, System.String text)

Which is presumably where things are crashing and burning because it's already been cleaned up.

It's obvious that something to do with threading and fonts is going wrong here. We need to be rendering stuff on multiple threads, and inevitably that's going to involve fonts, and that in itself can't be so unusual. I can try scattering lots of synchronisation code around to ensure nothing with fonts happens at the same time, but there must be a better way to solve the actual problem.

Is there no debugging code we can insert which will actually tell us why the object in question is being disposed twice? Or can't you check before you dispose whether it would result in this exception?
Comment 63 Jonathan Pryor 2015-03-27 14:34:18 UTC
(Hopefully) "Fixed"/addressed in documentation/25831c6a:

http://developer.xamarin.com/guides/android/advanced_topics/garbage_collection/#Disposing_of_Peer_instances

After additional private email, we think we've figured out the issue. (I hope.)

Background: Xamarin.Android supports referential equality: if you call two Java methods and those return the same Java instance (as determined by `a == b` in Java), then calling the bindings of those same Java methods from e.g. C# should return the same wrapper instance. (We felt it would be very confusing if different instances were returned.) This is done by maintaining a mapping between Java instances and their corresponding managed wrapper instances, with a 1:1 mapping between them.

Furthermore, since each wrapper instance consumes a GREF, and GREFs have a cost associated with them, our wrapper types implement IDisposable.Dispose() to free the GREF, deleting the mapping between between the Java instance and the wrapper instance.

The problem with this is that referential equality support introduces "shared state" between threads -- by design! If two Java methods return the same instance (even across threads), and the bindings of those Java methods return the same wrapper instance (even across threads), then changes to the instance from one thread will be visible from the other thread.

...which sounds obvious, but isn't always.

Which brings us to Typeface.create(), which doesn't create new instances. Instead, it maintains a cache of previously created Typeface instances, and may return one of those previously created instances:

https://github.com/android/platform_frameworks_base/blob/117cbebe/graphics/java/android/graphics/Typeface.java#L133-165

Tim's app would call Typeface.Create(), store the Typeface, use it, then Dispose() of the instance, from multiple threads.

Since Typeface.create() didn't always create new instances, this resulted in "unintentional sharing" between those threads, and Dispose()ing of the shared Typeface instance from one thread would invalidate that instance from other threads referencing the same instance.

The "fix" (at least for now)? Don't Do That™. Specifically, don't call Dispose() on Typeface instances returned from Typeface.Create(). This advice is likely true for other methods as well, thus making things somewhat more confusing.

I've updated the documentation to hopefully make it clearer when Dispose() can be called on instances.
Comment 64 Tim Dawson 2015-03-27 14:59:29 UTC
Isn't this bug the "accessed deleted global reference" issue though, which is different?
Comment 65 Jonathan Pryor 2015-03-27 15:08:47 UTC
> Isn't this bug the "accessed deleted global reference" issue though, which is different?

I don't think so; as per the March 17 2015 email:

> > [ 03-17 17:39:51.928 27676:27676 E/art      ]
> > JNI ERROR (app bug): accessed deleted global reference 0x5800a02
> 
> So we head over to grefs.txt, looking for handle value 0x5800a02, and find:
...
> ..but why is 0x5800a02 being used?
> 
> Is there *any* possibility that the Typeface created here could be shared with another thread?
> Can you please review your source code to check that?
> 
> My only guess is that there's a "timing bug", in which one thread creates the
> Typeface instance, another thread invokes a method on that instance, and *right after*
> the handle value has been looked up but *before* JNIEnv.Call*Method() has been
> invoked the first thread Dispose()s of the Typeface instance.

As per the followup emails, I believe the above guess has been verified, at least in theory: Typeface.create() returns cached instances, and thus we can have two threads inadvertently sharing that instance, so depending on the hardware (multi-core?), it is plausible (if rare) that the Handle value would be looked up *right before* the instance is disposed on another thread, then the (CPU register cached and now invalid) handle is used.

It's the same "bug" -- inadvertent sharing between threads -- that spawns multiple failure modes: invalid JNI handles, or ArgumentExceptions, or JNIEnv::GetObjectRefType() errors, or...