Bug 29365 - Xamarin.Android signal handling is corrupted on Android 5.x devices with 4/23 update of Android System WebView
Status: RESOLVED UPSTREAM
Alias: None
Product: Android
Classification: Xamarin
Component: Mono runtime / AOT Compiler
Version: 4.20.0
Hardware: PC Windows
Importance: Highest normal
Target Milestone: ---
Assignee: Jonathan Pryor
URL:
Duplicates: 28919 28985
Depends on:
Blocks:
 
Reported: 2015-04-23 20:28 UTC by T.J. Purtell
Modified: 2015-08-10 14:23 UTC
CC List: 12 users

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
A minimal code sample that shows the SIGNAL being swallowed by Android System WebView (978 bytes, text/plain)
2015-04-23 20:28 UTC, T.J. Purtell
Affected Version of Android System WebView screenshot (133.94 KB, image/png)
2015-04-23 20:35 UTC, T.J. Purtell



Description T.J. Purtell 2015-04-23 20:28:08 UTC
Created attachment 10888
A minimal code sample that shows the SIGNAL being swallowed by Android System WebView

We started getting reports from users today about our app crashing out in places where we have ample checking for errors.  It seemed isolated to Galaxy S5 devices at first, but as the day went on we found it even started to occur on our Nexus 5 test devices.  Instead of getting a C# stack trace or a native stack trace or a debuggerd log, we get this...

> W/google-breakpad(32239): -----BEGIN BREAKPAD MICRODUMP-----
> W/google-breakpad(32239): O A arm 08 aarch64 3.10.61-4351282 #1 SMP PREEMPT Fri Mar 27 02:44:10 KST 2015
> W/google-breakpad(32239): S 0 DEAFF740 DEAFF000 00001000
> W/google-breakpad(32239): S DEAFF000 58F3F2F45076BFF39872C0F3606FCCEF00E3F5F42331FBDF29000000813400004A1801009872C0F30057C0F3000000000020CFEF00B8CEEF00000000D60B000A000000000000000000000000000000003CF4AFDEC06098DE0000000010AF50DD0000000000000000010000004A180100FB000000B46ABFF3A0EB92DE8800000008DC2FDCE889E8F3501F82D6002692D300000000001F82D6773CEADF001F82D63F000000F0FA2EDC3CF4AFDEF0BCB2F3773CEADF001F82D63F000000000000003CF4AFDEF0BCB2F3D46F10F703000000070000000000000008DC2FDCB8DC2FDC3E000002288AAAF302000000002692D3D46F10F700000000206198DED41DAEF300000000002692D3F8DB2FDC00000000C06098DE03000000386CCEF3002692D308DC2FDCE889E8F3D46F10F73B000000D46F10F700000000D46F10F719380AF70000000000ED23DDD46F10F700000000FFFFFFFF201000000010000020100000B06DC0F376010000A0FE21DC00010000002692D300000000002692D300000000
> W/google-breakpad(32239): S DEAFF180 000000000C000000F0EA2FDCF0EA2FDC2000000019380AF7000000005C58A6F300000000100000009CF3C2F38A0100000000000000000000FFFFFFFF000000000000000020902EDC000000001C8CCCF34000000000ED23DD00A693DE98F4AFDE00E723DDF8DD2FDC0000000000EA51DDA084C0F39C83C0F30300000010000000FFFFFFFFFFFFFFFF20696920000000000000000000000000753CEADF0C000000C0B0D0D3C0B0D0D3F88928DC05000000F88928DC00000000002692D3C0B0D0D30F000000E0000000F88928DC070000000400000019380AF70000000000D2BFF380B0D0D36C98A8F3F0FA2EDCF0F32EDC1C8CCCF34A000000C0FE21DC58BDC0F3C0FE2FDC70BDC0F3882792D330BDC0F3C0B0D0D30000000000000000E889E8F300000000005B9CDE0200000000000000002692D3001F82D6002692D37601000040C6F5F402000000F09ECCF3F0BCB2F3002692D37601000040C6F5F400000000F09ECCF360EC2FDC60EC2FDC3495BFF330F82EDCC06098DE00000000B0F3AFDE

After grepping all the libraries on a Galaxy S6 Edge device, we found breakpad embedded in the Android System WebView apk.  After uninstalling updates to this "app", the crashes go away and Xamarin exception handling works correctly.

Note: this occurs with 4.20.2/5.1 as long as you or some component in your app has referenced WebView in a way that runs its static initializers.
Comment 1 T.J. Purtell 2015-04-23 20:35:20 UTC
Created attachment 10889
Affected Version of Android System WebView screenshot

This shows the version number of the update that injects its crash handler and squashes Mono's crash handler.  If you uninstall updates, the problem is 'fixed' until the Play Store updates you again.
Comment 2 T.J. Purtell 2015-04-23 20:46:18 UTC
FYI: relevant issue regarding another fancy runtime compiler where chromium screwed things up.

https://code.google.com/p/chromium/issues/detail?id=477444
Comment 3 T.J. Purtell 2015-04-23 23:52:08 UTC
Additional issue where another developer has issues with their tools and the chromium signal handlers.  

https://code.google.com/p/chromium/issues/detail?id=476831

From the sound of it, the implementation is broken.
Comment 4 Jonathan Pryor 2015-04-24 15:17:01 UTC
As mentioned in Comment #2 and Comment #3, the problem is that the Android System WebView (updated through Google Play) and its related types are clobbering existing SIGSEGV handlers, which screws things up for all native code that relies on signal handlers (e.g. Houdini).
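
To illustrate what "clobbering" means here, the following is a minimal C sketch, not Mono's or Breakpad's actual code. It assumes a POSIX system and uses SIGUSR1 as a harmless stand-in for SIGSEGV. All function names (`runtime_handler`, `clobbering_handler`, `chaining_handler`, `install`, and so on) are hypothetical. A "runtime" installs its handler first, then a "library" installs its own on top, either discarding the previous handler or saving it and forwarding to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Counts how often the "runtime" handler actually ran. */
static volatile sig_atomic_t runtime_hits = 0;

/* Disposition that was in place before the "library" installed its handler. */
static struct sigaction previous_action;

/* Hypothetical stand-in for the handler a managed runtime installs first. */
static void runtime_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    runtime_hits++;
}

/* Clobbering: the library handles the signal and never forwards it. */
static void clobbering_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
}

/* Chaining: the library forwards the signal to whatever was installed before. */
static void chaining_handler(int sig, siginfo_t *info, void *ctx) {
    if (previous_action.sa_flags & SA_SIGINFO) {
        previous_action.sa_sigaction(sig, info, ctx);
    } else if (previous_action.sa_handler != SIG_DFL &&
               previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(sig);
    }
}

typedef void (*siginfo_fn)(int, siginfo_t *, void *);

/* Installs fn for sig; the old disposition is stored in *old when old != NULL. */
static void install(int sig, siginfo_fn fn, struct sigaction *old) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fn;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(sig, &sa, old);
    assert(rc == 0);
    (void)rc;
}

/* Returns 1 if runtime_handler is still the installed handler for sig. */
static int runtime_handler_installed(int sig) {
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0) return 0;
    return (cur.sa_flags & SA_SIGINFO) && cur.sa_sigaction == runtime_handler;
}

/* The runtime installs first, the library second. Returns how many times
 * the runtime handler ran for a single raised signal. */
static int runtime_hits_after_signal(int library_chains) {
    runtime_hits = 0;
    install(SIGUSR1, runtime_handler, NULL);
    install(SIGUSR1, library_chains ? chaining_handler : clobbering_handler,
            &previous_action);
    raise(SIGUSR1);
    return (int)runtime_hits;
}
```

Calling `runtime_hits_after_signal(0)` models the behavior described in this bug: the runtime's handler never sees the signal. With `runtime_hits_after_signal(1)` the handler still runs because the library forwards to the saved disposition. `runtime_handler_installed` shows one way a process could check whether its handler has been replaced.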

As a workaround, it is possible to remove the need for a SIGSEGV handler for NullReferenceException raising, by setting the MONO_DEBUG variable to contain the value "explicit-null-checks":

http://docs.go-mono.com/?link=man%3amono(1)

This can be done by adding a file to your App project with a Build action of AndroidEnvironment:

http://developer.xamarin.com/guides/android/advanced_topics/environment/

Then adding the following line to the file:

MONO_DEBUG=explicit-null-checks
Comment 5 T.J. Purtell 2015-04-24 17:29:06 UTC
Thanks Jon, that is a helpful workaround.

I suppose in terms of getting our developers to be able to debug (debugger involves signal handling), we will just uninstall the updates to the Google system web view.
Comment 6 Jonathan Pryor 2015-04-24 18:01:51 UTC
To fix the debugger, use:

MONO_DEBUG=explicit-null-checks,soft-breakpoints

This should allow debugging apps which use WebView.
Comment 7 T.J. Purtell 2015-04-25 02:07:44 UTC
:)  Cool, that is sweet.

One interesting thing I observed is that I never created a WebView in our main project and I observed this error.  There certainly are places we use them, but they are deep in the app flows.  I suspect that some library we are using initialized some part of the WebView stack.  Could the Xamarin.Android runtime itself be initializing it?  Of note is that our app had a custom WebView class, which I have since removed as it was unnecessary.  Perhaps the type registrar causes this to initialize regardless of whether it is actually used in a particular session.
Comment 8 T.J. Purtell 2015-04-25 03:21:42 UTC
Also of interest here...

https://breakpad.appspot.com/1804002/

Breakpad developers doing some odd stuff w.r.t. an Android signal chaining library bug on L+?

This is the crash reporting tool that seems to be involved in the intercept process based on the output logs.  

It is unclear to me where in the Chromium sources this is actually registered, or whether all of the code actually used in the Android System WebView is public.
Comment 9 T.J. Purtell 2015-04-26 21:23:36 UTC
Filed an issue with Chromium about the signal handling problem, using a pure Java/C test app.

http://code.google.com/p/chromium/issues/detail?id=481420
Comment 10 Neil Boyd 2015-04-28 09:05:03 UTC
What does explicit-null-checks do?  Are there any side effects?
Comment 11 Ben Beckley 2015-04-28 11:38:35 UTC
*** Bug 28919 has been marked as a duplicate of this bug. ***
Comment 12 Jonathan Pryor 2015-04-28 12:03:28 UTC
@Neil:

> What does explicit-null-checks do?

Normally, in "sane" environments, memory address 0x0 is not readable or writable, and when a process attempts to read or write to that address "something" happens, e.g. the SIGSEGV signal is raised on Linux, or a STATUS_ACCESS_VIOLATION exception is thrown on Windows.

By default, Mono hooks into the SIGSEGV handler, and when a SIGSEGV signal is raised, Mono will check to see if managed code was being executed. If it is, then the SIGSEGV is "swallowed" and a NullReferenceException is raised. This allows (nearly) zero overhead for null reference checking -- any overhead is only incurred if the null reference is actually used. No other checking is required.

However, on some environments this approach isn't possible, e.g. AIX can allow processes to read from address 0x0 without raising SIGSEGV, or the SIGSEGV signal handler may be replaced (as is the case here).

This is where `explicit-null-checks` comes in: instead of the default behavior of relying on the SIGSEGV handler to generate NullReferenceExceptions, all variable access is instead explicitly checked for validity before accessing the variable.

As per the documentation:

http://docs.go-mono.com/?link=man%3amono(1)

> explicit-null-checks
> Makes the JIT generate an explicit NULL check on variable dereferences instead of
> depending on the operating system to raise a SIGSEGV or another form of trap event
> when an invalid memory location is accessed.

> Are there any side effects?

Yes: explicit null checks are now used instead of "implicit" checks. This will presumably "bloat" the generated code with explicit null checks.

However, at this time we have no actual idea of what the large-scale impacts are. We don't know how much this will increase generated code size, or how much it will slow down the resulting code, and I'm not even sure it's possible to answer those questions "in general." (What would an "average" app look like?)
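
The trade-off between the two strategies can be sketched in plain C. This is only an analogy under stated assumptions, not what Mono's JIT emits. It assumes a POSIX system where reading address 0 raises SIGSEGV, and the names `read_implicit`, `read_explicit`, and `segv_handler` are hypothetical. The "implicit" version does no check and recovers from the fault in a SIGSEGV handler; the "explicit" version pays for a comparison on every access but needs no signal handler at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Where the SIGSEGV handler jumps back to after a faulting read. */
static sigjmp_buf recovery_point;

/* Plays the role of the runtime's SIGSEGV handler: instead of letting the
 * process die, it unwinds to the recovery point. */
static void segv_handler(int sig) {
    (void)sig;
    siglongjmp(recovery_point, 1);
}

/* Implicit check: dereference without testing the pointer and rely on the
 * SIGSEGV handler to turn a fault into an error result.
 * Returns 1 and stores the value in *out on success, 0 on a null pointer. */
static int read_implicit(const int *p, int *out) {
    struct sigaction sa, old;
    /* Volatile so the compiler cannot reason about the pointer's value. */
    const int *volatile vp = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    if (sigsetjmp(recovery_point, 1) != 0) {
        /* Reached via siglongjmp from segv_handler: the read faulted. */
        sigaction(SIGSEGV, &old, NULL);
        return 0;
    }
    *out = *vp;                      /* faults here when vp is NULL */
    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    return 1;
}

/* Explicit check: one extra comparison per access, no signal handler needed. */
static int read_explicit(const int *p, int *out) {
    if (p == NULL) return 0;
    *out = *p;
    return 1;
}
```

If something else replaces the SIGSEGV handler after it is installed, `read_implicit` has no way to recover from the null read, while `read_explicit` keeps working. That is the property the `explicit-null-checks` workaround relies on.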
Comment 13 Jonathan Pryor 2015-04-28 12:12:47 UTC
@TJ: As per the activity on Chromium issue #481420 (see Comment #9), it looks like the Chromium team is taking care of this, in that they're altering their code so that SIGSEGV is only handled for "webview" processes.

Thus, it may be plausible to close this bug as UPSTREAM...maybe.

That said, I'm slightly worried about their patch:

https://chromium.googlesource.com/chromium/src.git/+/5db4beded0f78f77207790cd9a66bf7fcbfc4fc3%5E%21/#F0

It uses `process_type` to determine behavior, and I don't know if that will have a "sane" default value; it could be that we'll still need to do "something" once this Chromium fix is released to support our default SIGSEGV-based NullReferenceException raising.
Comment 14 T.J. Purtell 2015-04-28 12:59:25 UTC
From talking with the Breakpad team, I learned that Chromium tries to hide the Play Store crash reporting popup when a renderer process fails.  So if a webpage causes an internal crash, the Chrome browser on Android is supposed to swallow that and report it to Google only via Breakpad.  I suppose this is useful because the Chrome browser runs background processes to host individual pages.  If the browser UI process crashes, they retained the normal behavior, i.e. chain to the installed handler and don't use Breakpad.

If I understand this correctly, it means their patch disables breakpad reporting for webview in addition to the core browser process.

The test case I provided them checks that debuggerd receives a signal, so I think that in resolving it they will resolve this issue.  Time will tell for sure I suppose.

As the interwebs are reporting, actually using a WebView is likely to cause other crashes because of apparent GPU driver bugs.  Hopefully the original app devs will see these once this fix rolls out.  I think this issue has probably stopped Google from seeing the extent to which WebView crashes.  So perhaps it will give them visibility into the issues too.
Comment 15 Sam Pollock 2015-04-30 04:24:41 UTC
Just a note: I have tried the fix, but it seems to fail badly on a Nexus 9. The app simply won't deploy.

Hopefully a real fix will be available soon.
Comment 16 mor 2015-04-30 05:04:02 UTC
"An environment file is a Unix-formatted plain-text file with a Build action of AndroidEnvironment."

Are you sure that the file is Unix-formatted and the Build Action is set to AndroidEnvironment?

If I create the file on Windows in Visual Studio, it doesn't work, but if I create the same file in Xamarin Studio on Mac, it works on a Nexus 9.
Comment 17 Sam Pollock 2015-04-30 05:28:27 UTC
Build action is AndroidEnvironment. However, I created the file through Xamarin on Mac as plain text, so it may not have been Unix-formatted. I am away from the office for a few days but will try next week. Thanks for the heads up.

I was getting no errors on the Nexus 7 running KitKat though (downgraded to solve the issues I was having), only on the Nexus 9 with Lollipop.
Comment 18 Peter Collins 2015-04-30 12:38:16 UTC
I have consistently seen an immediate SIGABRT when attempting to debug on a Nexus 9 from either XS or VS with this workaround in place. I've filed this as Bug #29609. Perhaps this is the same issue you are seeing?
Comment 19 T.J. Purtell 2015-04-30 14:29:37 UTC
The Android System WebView beta channel should have a fix for the original issue now.  I haven't verified it personally.
Comment 20 T.J. Purtell 2015-05-01 14:22:34 UTC
https://code.google.com/p/chromium/issues/detail?id=483399

Relevant follow-up fixes for WebView signal propagation, FYI.
Comment 21 Ben Beckley 2015-05-21 17:30:07 UTC
I am no longer experiencing the issue with Android System WebView swallowing the SIGNAL error as of the 43.0.2357.76 beta. Also, I am able to successfully debug with breakpoints without crashing while using a WebView.

Here is the patch note that pertains to this:
> Applications which use custom SIGSEGV handling logic should no longer be disrupted by WebView's breakpad crash handler.
Comment 22 Ben Beckley 2015-06-10 14:09:47 UTC
*** Bug 28985 has been marked as a duplicate of this bug. ***
Comment 23 Ben Beckley 2015-08-10 14:23:15 UTC
This has been resolved upstream by Google and the fix is in the stable version of Android System WebView. If anyone encounters this issue again, please reopen the bug.