Bug 13521 - SIGSEGV while executing native code
Summary: SIGSEGV while executing native code
Status: RESOLVED FIXED
Alias: None
Product: Android
Classification: Xamarin
Component: Mono runtime / AOT Compiler ()
Version: 4.8.x
Hardware: PC Windows
: --- normal
Target Milestone: ---
Assignee: Jonathan Pryor
URL:
Depends on:
Blocks:
 
Reported: 2013-07-26 13:24 UTC by Goncalo Oliveira
Modified: 2016-06-15 01:51 UTC
4 users

Tags:
Is this bug a regression?: ---
Last known good build:

Description Goncalo Oliveira 2013-07-26 13:24:04 UTC
Hi,

I'm having a strange crash in my app and I'm not sure what's causing it. Here's the exception log:

07-26 18:10:05.842: E/mono-rt(2519): Stacktrace:
07-26 18:10:05.855: E/mono-rt(2519):   at <unknown> <0xffffffff>
07-26 18:10:05.855: E/mono-rt(2519):   at (wrapper managed-to-native) object.wrapper_native_0x8154d635 (intptr,string) <0xffffffff>
07-26 18:10:05.856: E/mono-rt(2519):   at Android.Runtime.JNIEnv.FindClass (string) <0x0005b>
07-26 18:10:05.857: E/mono-rt(2519):   at Android.Runtime.JNIEnv.CreateInstance (string,string,Android.Runtime.JValue[]) <0x00027>
07-26 18:10:05.858: E/mono-rt(2519):   at Java.Lang.Thread/RunnableImplementor..ctor (System.Action,bool) <0x00063>
07-26 18:10:05.859: E/mono-rt(2519):   at Android.OS.Handler.Post (System.Action) <0x0002f>
07-26 18:10:05.859: E/mono-rt(2519):   at JunkyApp.SygicActivity.OnEvent (int,string) <0x002a3>
07-26 18:10:05.859: E/mono-rt(2519):   at Sygic.Sdk.Api.IApiCallbackInvoker.n_OnEvent_ILjava_lang_String_ (intptr,intptr,int,intptr) <0x0005f>
07-26 18:10:05.859: E/mono-rt(2519):   at (wrapper dynamic-method) object.3041ce90-9c5d-4cfc-a92c-5b0c0bf55bb7 (intptr,intptr,int,intptr) <0x0004b>
07-26 18:10:05.859: E/mono-rt(2519):   at (wrapper native-to-managed) object.3041ce90-9c5d-4cfc-a92c-5b0c0bf55bb7 (intptr,intptr,int,intptr) <0xffffffff>
07-26 18:10:05.859: E/mono-rt(2519): =================================================================
07-26 18:10:05.859: E/mono-rt(2519): Got a SIGSEGV while executing native code. This usually indicates
07-26 18:10:05.859: E/mono-rt(2519): a fatal error in the mono runtime or one of the native libraries
07-26 18:10:05.859: E/mono-rt(2519): used by your application.
07-26 18:10:05.859: E/mono-rt(2519): =================================================================

What's strange is where this is hitting. Here's the code:

handler.Post( () => Toast.MakeText( this, "null", ToastLength.Short ).Show() );


I'm using the current stable Xamarin.Android, 4.8.

I don't know if it's relevant, but the project references a binding library built with embedded native libraries, embedded JARs, and embedded reference JARs.
Comment 1 Jonathan Pryor 2013-07-26 14:57:55 UTC
Dalvik's JNIEnv::FindClass() function is blowing up (or something has gone horribly wrong when we attempted to call it). Unfortunately, I can't glean anything else from that stack trace.

Which device is this on? Does this happen with the default app template? Can you provide a complete test case?

Thanks,
 - Jon
Comment 12 Goncalo Oliveira 2013-07-30 13:57:52 UTC
Weird... If I dig into ApiCallback.class in the JAR, the method signature matches:

onEvent (ILjava/lang/String;)V
Comment 13 Jonathan Pryor 2013-07-30 23:04:42 UTC
The method signature is correct; that's not the problem.

I need to investigate further (read: I need to read MOAR Dalvik source), but it currently appears that Xamarin.Android may be too smart for its own good, though I don't understand why it would be breaking in this case.

Let me elaborate on "too smart for its own good". Consider a "normal" JNI function as called from C:

http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/functions.html#wp16027
> jclass FindClass(JNIEnv *env, const char *name);

Compare to our C# binding:

http://androidapi.xamarin.com/?link=M%3aAndroid.Runtime.JNIEnv.FindClass(System.String)
> public static IntPtr FindClass(string name);

Notice that the `JNIEnv *env` parameter is missing. To "simplify" things, we instead provide a JNIEnv.Handle static property which is used everywhere for the `JNIEnv*` value. JNIEnv.Handle is a per-thread value ([ThreadStatic]), and whenever a new thread starts executing we get the appropriate JNIEnv* value from JNIInvokeInterface::GetEnv().

http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/invocation.html#GetEnv

In this fashion nobody needs to care about the exact JNIEnv* value, as every thread will always have a known-valid value.
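The per-thread caching described above can be sketched outside the runtime as a `thread_local` variable that is filled on first use. This is a toy model under stated assumptions, not Xamarin.Android's code: `EnvHandle` is a hypothetical stand-in for `JNIEnv*`, and `vm_get_env()` is a hypothetical stand-in for JNIInvokeInterface::GetEnv() that simply hands out a new number on each call.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Opaque stand-in for a JNIEnv* value (hypothetical; not the JNI type).
using EnvHandle = int;

std::atomic<int> g_vm_queries{0};  // how often the toy "VM" was asked
std::atomic<int> g_next_env{1};    // next handle the toy VM hands out

// Hypothetical stand-in for JNIInvokeInterface::GetEnv().
EnvHandle vm_get_env() {
    ++g_vm_queries;
    return g_next_env++;
}

// Analogue of the [ThreadStatic] JNIEnv.Handle; 0 means "not cached yet".
thread_local EnvHandle t_env = 0;

// Call sites read the env through this accessor instead of receiving
// it as a parameter, mirroring the C# binding's missing `env` argument.
EnvHandle current_env() {
    if (t_env == 0) {
        t_env = vm_get_env();  // first use on this thread: ask the VM once
    }
    return t_env;
}
```

Each thread asks the VM only once and reuses the cached value afterwards. The model also makes the weak spot visible: nothing here notices if the VM's value for a thread later changes while the cached copy does not.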

What's _possibly_ happening (entirely untested and unverified, which is why I need to read the Dalvik source) is that Dalvik may have the concept of JNIEnv*-specific object references, in which a jobject handle can only be used with the "owning" JNIEnv* value.

This conjecture makes NO sense; it means that an instance couldn't be used across threads in a multi-threaded manner.

Just because that doesn't make any sense doesn't invalidate it. :-/
Comment 14 Goncalo Oliveira 2013-07-31 06:13:37 UTC
In theory, nothing prevents creating an instance on one thread and using it on another. However, references start out as local references, so it's necessary to turn those local references into global references, something like:

   jclass lObj = env->FindClass( "MyClass" );
   // NewGlobalRef() returns a jobject, so C++ needs a cast back to jclass
   jclass gObj = static_cast<jclass>( env->NewGlobalRef( lObj ) );

At least, that's what the Android documentation says:

Every argument passed to a native method, and almost every object returned by a JNI function is a "local reference". This means that it's valid for the duration of the current native method in the current thread. Even if the object itself continues to live on after the native method returns, the reference is not valid.

http://developer.android.com/training/articles/perf-jni.html#local_and_global_references

But I'm assuming that is already a given...
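The local-versus-global distinction quoted above can be illustrated with a small model. `RefTables` is a hypothetical toy, not the JNI API: each simulated native call gets its own frame of local references, all of which become invalid when the call returns, while a reference promoted with the `NewGlobalRef()`-like `new_global()` stays valid until it is explicitly deleted.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical reference handle; real JNI uses jobject/jclass pointers.
using Ref = int;

// Toy model of JNI reference lifetimes; not the real JNI API.
struct RefTables {
    std::vector<std::vector<Ref>> frames;  // one frame per native call
    std::unordered_set<Ref> live_locals;
    std::unordered_set<Ref> globals;
    Ref next = 1;

    void enter_native() { frames.emplace_back(); }

    // Returning from the native method invalidates every local in its frame.
    void leave_native() {
        for (Ref r : frames.back()) live_locals.erase(r);
        frames.pop_back();
    }

    // E.g. the result of FindClass(): only valid within the current frame.
    Ref new_local() {
        Ref r = next++;
        frames.back().push_back(r);
        live_locals.insert(r);
        return r;
    }

    // NewGlobalRef() analogue: a new reference that outlives the frame.
    Ref new_global(Ref local) {
        assert(live_locals.count(local) != 0 && "source must still be live");
        Ref r = next++;
        globals.insert(r);
        return r;
    }

    // DeleteGlobalRef() analogue: globals live until explicitly released.
    void delete_global(Ref r) { globals.erase(r); }

    bool is_valid(Ref r) const {
        return live_locals.count(r) != 0 || globals.count(r) != 0;
    }
};
```

After `enter_native()`, `new_local()`, `new_global(local)`, and `leave_native()`, the local reference is no longer valid but the global one is, until `delete_global()` releases it.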
Comment 15 Goncalo Oliveira 2013-08-01 06:10:06 UTC
Damn... this is hitting me in other applications too, and not in a third-party library this time. I'm going to need to revert to 4.6.0.

08-01 11:13:08.990: E/mono-rt(3015): Stacktrace:
08-01 11:13:08.990: E/mono-rt(3015):   at <unknown> <0xffffffff>
08-01 11:13:08.990: E/mono-rt(3015):   at (wrapper managed-to-native) object.wrapper_native_0xaca44f59 (intptr,intptr,intptr) <IL 0x00027, 0xffffffff>
08-01 11:13:08.990: E/mono-rt(3015):   at Android.Runtime.JNIEnv.CallObjectMethod (intptr,intptr) [0x00005] in /Users/builder/data/lanes/monodroid-mlion-monodroid-4.8.0-branch/3f1c339b/source/monodroid/src/Mono.Android/src/Runtime/JNIEnv.g.cs:129
08-01 11:13:08.990: E/mono-rt(3015):   at Android.Net.UriInvoker.ToString () [0x0002d] in /Users/builder/data/lanes/monodroid-mlion-monodroid-4.8.0-branch/3f1c339b/source/monodroid/src/Mono.Android/platforms/android-14/src/generated/Android.Net.Uri.cs:1392
08-01 11:13:08.990: E/mono-rt(3015):   at (wrapper runtime-invoke) <Module>.runtime_invoke_object__this__ (object,intptr,intptr,intptr) <IL 0x00050, 0xffffffff>
08-01 11:13:08.990: E/mono-rt(3015):   at <unknown> <0xffffffff>
08-01 11:13:08.990: E/mono-rt(3015):   at FMobile.Views.Forms.MediaPickerActivity/PictureInfoAdapter.OnCreateView (Android.Views.View,FMobile.Views.Forms.PictureInfo,Android.Views.ViewGroup) [0x00025] in c:\Projects\trunk\FrotcomMobile\Android\FMobile\Views\Forms\MediaPickerActivity.cs:340
08-01 11:13:08.990: E/mono-rt(3015):   at FMobile.Adapters.CustomAdapter`1.GetView (int,Android.Views.View,Android.Views.ViewGroup) [0x0002f] in c:\Projects\trunk\FrotcomMobile\Android\FMobile\Adapters\CustomAdapter.cs:90
08-01 11:13:08.990: E/mono-rt(3015):   at Android.Widget.BaseAdapter.n_GetView_ILandroid_view_View_Landroid_view_ViewGroup_ (intptr,intptr,int,intptr,intptr) [0x00019] in /Users/builder/data/lanes/monodroid-mlion-monodroid-4.8.0-branch/3f1c339b/source/monodroid/src/Mono.Android/platforms/android-14/src/generated/Android.Widget.BaseAdapter.cs:454
08-01 11:13:08.990: E/mono-rt(3015):   at (wrapper dynamic-method) object.b4dd859c-5c27-4cba-8d08-56fdf0af81a6 (intptr,intptr,int,intptr,intptr) <IL 0x00023, 0x0005f>
08-01 11:13:08.990: E/mono-rt(3015):   at (wrapper native-to-managed) object.b4dd859c-5c27-4cba-8d08-56fdf0af81a6 (intptr,intptr,int,intptr,intptr) <IL 0x00028, 0xffffffff>
08-01 11:13:08.990: E/mono-rt(3015): =================================================================
08-01 11:13:08.990: E/mono-rt(3015): Got a SIGSEGV while executing native code. This usually indicates
08-01 11:13:08.990: E/mono-rt(3015): a fatal error in the mono runtime or one of the native libraries 
08-01 11:13:08.990: E/mono-rt(3015): used by your application.
08-01 11:13:08.990: E/mono-rt(3015): =================================================================
Comment 16 Goncalo Oliveira 2013-08-01 07:13:40 UTC
Update: the previous exception (in the other application) might not be related, as it still occurs with the previous stable release, 4.6.08.

08-01 12:18:57.726: I/mono(1272): Stacktrace:
08-01 12:18:57.727: I/mono(1272):   at Android.Runtime.JNIEnv.NewString (string) <0x0006b>
08-01 12:18:57.727: I/mono(1272):   at Android.Util.Log.Warn (string,string) <0x0008b>
08-01 12:18:57.727: I/mono(1272):   at JunkyApp.SygicActivity.OnEvent (int,string) <0x00073>
08-01 12:18:57.727: I/mono(1272):   at Sygic.Sdk.Api.IApiCallbackInvoker.n_OnEvent_ILjava_lang_String_ (intptr,intptr,int,intptr) <0x0005f>
08-01 12:18:57.727: I/mono(1272):   at (wrapper dynamic-method) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0x0004b>
08-01 12:18:57.727: I/mono(1272):   at (wrapper native-to-managed) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0xffffffff>
08-01 12:18:57.807: E/mono(1272): Unhandled Exception:
08-01 12:18:57.807: E/mono(1272): System.NullReferenceException: Object reference not set to an instance of an object
08-01 12:18:57.807: E/mono(1272): at Android.Runtime.JNIEnv.ExceptionOccurred () <0x00037>
08-01 12:18:57.807: E/mono(1272): at Android.Runtime.AndroidEnvironment.GetExceptionForLastThrowable () <0x0000b>
08-01 12:18:57.807: E/mono(1272): at Android.Runtime.JNIEnv.GetStaticMethodID (intptr,string,string) <0x0005f>
08-01 12:18:57.807: E/mono(1272): at Android.Util.Log.Info (string,string) <0x0006f>
08-01 12:18:57.807: E/mono(1272): at Android.Runtime.AndroidEnvironment.UnhandledException (System.Exception) <0x00047>
08-01 12:18:57.807: E/mono(1272): at (wrapper dynamic-method) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0x0009f>
08-01 12:18:57.807: E/mono(1272): at (wrapper native-to-managed) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0x00057>
08-01 12:18:57.808: I/mono(1272): [ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
08-01 12:18:57.808: I/mono(1272): at Android.Runtime.JNIEnv.ExceptionOccurred () <0x00037>
08-01 12:18:57.808: I/mono(1272): at Android.Runtime.AndroidEnvironment.GetExceptionForLastThrowable () <0x0000b>
08-01 12:18:57.808: I/mono(1272): at Android.Runtime.JNIEnv.GetStaticMethodID (intptr,string,string) <0x0005f>
08-01 12:18:57.808: I/mono(1272): at Android.Util.Log.Info (string,string) <0x0006f>
08-01 12:18:57.808: I/mono(1272): at Android.Runtime.AndroidEnvironment.UnhandledException (System.Exception) <0x00047>
08-01 12:18:57.808: I/mono(1272): at (wrapper dynamic-method) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0x0009f>
08-01 12:18:57.808: I/mono(1272): at (wrapper native-to-managed) object.4afd9d1b-56d6-4501-aef5-12edcfc731f6 (intptr,intptr,int,intptr) <0x00057>
08-01 12:18:59.370: I/ActivityThread(1301): Pub com.frotcom.junkyapp.mono.MonoRuntimeProvider.__mono_init__: mono.MonoRuntimeProvider
Comment 17 Jonathan Pryor 2013-08-01 11:42:32 UTC
I have a hypothesis.

I still can't create a repro which would confirm or deny the hypothesis. :-(

As noted earlier, things crash when the n_* method is invoked with a `jnienv` parameter value which doesn't match the JNIEnv.Handle value, where JNIEnv.Handle is a thread-specific value.

What _appears_ to be happening is:

1. A thread is created.
2. Execution of (1) enters XA, JNIEnv.Handle is set via JNIInvokeInterface::AttachCurrentThread(), then JNIEnv* is cached in TLS data.
3. The thread (1) is dissociated from Dalvik "as if" via JNIInvokeInterface::DetachCurrentThread(), and the JNIEnv* value for thread (1) is invalidated.
4. The thread (1) calls JNIInvokeInterface::AttachCurrentThread(), and a new and different JNIEnv* value is created.
5. The thread (1) re-enters XA, and XA tries to use the (now invalid) JNIEnv.Handle value.
6. We crash.
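The six steps above can be replayed against a toy VM to show how a cached per-thread value goes stale. This is a simplified model under stated assumptions: `ToyVm` and `enter_managed_with_cache()` are hypothetical, and the toy VM always hands out a new handle on re-attach, whereas the naive attach/detach/attach repro described in this comment left the real `JNIEnv*` unchanged.

```cpp
#include <cassert>
#include <set>

// Opaque stand-in for JNIEnv* (hypothetical).
using EnvHandle = int;

// Toy VM: every attach creates a fresh env, every detach invalidates it.
struct ToyVm {
    std::set<EnvHandle> live;
    EnvHandle next = 1;

    EnvHandle attach() {  // AttachCurrentThread() analogue
        EnvHandle e = next++;
        live.insert(e);
        return e;
    }
    void detach(EnvHandle e) {  // DetachCurrentThread() analogue
        live.erase(e);
    }
    bool valid(EnvHandle e) const { return live.count(e) != 0; }
};

// Analogue of the [ThreadStatic] JNIEnv.Handle; 0 means "not cached yet".
thread_local EnvHandle t_cached_env = 0;

// Models an n_* entry point that ignores its jnienv argument after the
// first call and trusts the cache. Returns false where the real process
// would use an invalid JNIEnv* (step 6).
bool enter_managed_with_cache(const ToyVm& vm, EnvHandle jnienv) {
    if (t_cached_env == 0) {
        t_cached_env = jnienv;  // step 2: cache on first entry
    }
    return vm.valid(t_cached_env);
}
```

Calling `attach()`, `enter_managed_with_cache()`, `detach()`, `attach()`, and `enter_managed_with_cache()` in that order makes the second entry return `false`: the cache still holds the first handle, which the toy VM no longer considers live.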

I've done lots of dirty patches to JNIEnv._monodroid_get_identity_hash_code() to poke at private Dalvik data; Dalvik's JNIEnvEx structure has a `self` pointer (which refers to the current Dalvik Thread*), and right before the process crashes -- when the `jnienv` pointer and JNIEnv.Handle differ -- the JNIEnv.Handle's JNIEnvEx::self pointer is NULL, which certainly explains why it breaks.

The problem is that I can't manually provoke (3) and (4); when I try to do the obvious repro (create a new thread which does AttachCurrentThread(), DetachCurrentThread(), AttachCurrentThread()), the JNIEnv* pointer is unchanged, so I can't tell if this is what's actually happening or if something else is involved.

More troubling still, the native library is not invoking DetachCurrentThread() itself, so I don't know why the JNIEnv* pointer is being changed for the thread at all; the JNIEnv* value seems to be constant for the lifetime of the associated Thread*, and the only obvious spot to clean up the Thread* is dvmDetachCurrentThread() (unless I'm missing something, which I very probably am...).
Comment 18 Jonathan Pryor 2013-08-01 21:39:30 UTC
I am now able to provoke (3) and (4); it just requires that the worker thread from (1) pause execution after calling DetachCurrentThread() while a Java-side GC is performed from the main thread. Once the GC is complete, when the worker thread calls AttachCurrentThread(), the JNIEnv* value is different.

Which means I now have a minimal test case which reproduces the situation of the SIGSEGV: two different JNIEnv* values for the same thread.

(What it doesn't do is give me a NULL pointer for JNIEnvEx::self, but hopefully the "two different JNIEnv* values for the same thread" part is the important bit. It's also unfortunate that I need to explicitly use DetachCurrentThread() instead of some other mechanism, but it works...)
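The timing-dependent repro described above can be modeled with two threads and a condition variable. This is again a hypothetical toy, not Dalvik and not the actual test case: `ToyVm` keeps a detached thread's env record, and reuses it on re-attach, until `gc()` reclaims it. That assumption is one simple way to reproduce the observed effect that the value only changes when a GC runs while the worker is detached.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

using EnvHandle = int;  // opaque stand-in for JNIEnv* (hypothetical)

// Toy VM: a detached thread's env record is kept and reused on re-attach
// until gc() reclaims it. An assumption-laden model, not Dalvik's code.
class ToyVm {
  public:
    EnvHandle attach(std::thread::id tid) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = records_.find(tid);
        if (it != records_.end()) {
            it->second.attached = true;  // record survived: same env again
            return it->second.env;
        }
        EnvHandle env = next_++;
        records_.emplace(tid, Record{env, true});
        return env;
    }
    void detach(std::thread::id tid) {
        std::lock_guard<std::mutex> lock(mu_);
        records_.at(tid).attached = false;
    }
    void gc() {  // reclaim the records of detached threads
        std::lock_guard<std::mutex> lock(mu_);
        for (auto it = records_.begin(); it != records_.end();) {
            if (it->second.attached) ++it; else it = records_.erase(it);
        }
    }

  private:
    struct Record { EnvHandle env; bool attached; };
    std::mutex mu_;
    std::map<std::thread::id, Record> records_;
    EnvHandle next_ = 1;
};

// Replays the scenario on a fresh worker thread and returns the env values
// the worker saw before detaching and after re-attaching.
std::pair<EnvHandle, EnvHandle> run_repro(ToyVm& vm, bool gc_while_detached) {
    std::mutex m;
    std::condition_variable cv;
    bool detached = false, gc_done = false;
    EnvHandle before = 0, after = 0;

    std::thread worker([&] {
        const auto tid = std::this_thread::get_id();
        before = vm.attach(tid);
        vm.detach(tid);
        std::unique_lock<std::mutex> lock(m);
        detached = true;
        cv.notify_all();
        cv.wait(lock, [&] { return gc_done; });  // pause while detached
        lock.unlock();
        after = vm.attach(tid);
    });

    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return detached; });
    }
    if (gc_while_detached) vm.gc();  // the "Java-side GC" on the main thread
    {
        std::lock_guard<std::mutex> lock(m);
        gc_done = true;
    }
    cv.notify_all();
    worker.join();
    return {before, after};
}
```

Without the GC the worker gets the same handle back; with the GC in between it gets a different one, which is the "two different JNIEnv* values for the same thread" condition.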
Comment 19 Jonathan Pryor 2013-08-01 22:59:31 UTC
The scenario in Comment #17 and Comment #18 is fixed in monodroid/18545560.

Note that this fix will require that you regenerate your binding assembly. There will be a new Java.Lang.Object.GetObject<T>(IntPtr, IntPtr, JniHandleOwnership) method that the n_*() methods will use, and binding assemblies will need to be regenerated to use this method.
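Presumably the extra IntPtr parameter lets the n_*() methods pass along the `jnienv` value the VM supplied for the current call instead of relying on the cached JNIEnv.Handle. The toy below illustrates only that idea, replaying the detach/re-attach sequence from Comment 17. `ToyVm` and `enter_managed()` are hypothetical, and this is not the actual Xamarin.Android implementation of GetObject<T>().

```cpp
#include <cassert>
#include <set>

using EnvHandle = int;  // opaque stand-in for JNIEnv* (hypothetical)

// Toy VM: re-attach always yields a new env, as in the Comment 17 hypothesis.
struct ToyVm {
    std::set<EnvHandle> live;
    EnvHandle next = 1;

    EnvHandle attach() {
        EnvHandle e = next++;
        live.insert(e);
        return e;
    }
    void detach(EnvHandle e) { live.erase(e); }
    bool valid(EnvHandle e) const { return live.count(e) != 0; }
};

// Per-thread cache, analogue of JNIEnv.Handle; 0 means "not cached yet".
thread_local EnvHandle t_cached_env = 0;

// Hypothetical n_*-style entry point: the env passed in for this call wins,
// and a stale per-thread cache is refreshed before managed code uses it.
EnvHandle enter_managed(EnvHandle jnienv) {
    if (t_cached_env != jnienv) {
        t_cached_env = jnienv;
    }
    return t_cached_env;
}
```

Replaying attach, enter, detach, attach, enter now yields a live handle on both entries, because the second call replaces the stale cached value with the one the VM just passed in.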

Please file a separate bug with a test case for Comment #15; at a guess, I think Dalvik is aborting the process (which is frequently the case with NullReferenceExceptions):

http://docs.xamarin.com/guides/android/troubleshooting#15.unexpected-nullreferenceexceptions
Comment 20 Goncalo Oliveira 2013-08-02 07:03:15 UTC
Awesome!

Jon, that's good news. Now it's just a matter of when this will be available on the stable channel. I'll stick with 4.6.0 until then.

Regarding Comment #15 I'll open up a new bug, but it'll take me some time to prepare a test case.

Thanks.
Comment 21 Jonathan Pryor 2013-08-02 10:04:43 UTC
I don't know why 4.6.0 would be any better in this respect, and I know that someone reported a virtually identical issue with 4.6.x (the forums thread you can't read), which has the same fundamental "two different JNIEnv* pointer values for the same thread" scenario.
Comment 22 Goncalo Oliveira 2013-08-02 10:51:05 UTC
Jon,

Regarding this particular issue, namely Comment #17 and Comment #18, it's not; I have the same problem with both versions. But the issue in Comment #15, which is now reported as bug 13707, only happens with 4.8. With 4.6.0 it's not a problem, which is why I have to stay on 4.6.0.
Comment 24 Jonathan Pryor 2013-10-18 10:51:04 UTC
The need to regenerate binding assemblies is now explicitly called out in the XA 4.8.2 release notes:

http://docs.xamarin.com/releases/android/xamarin.android_4/xamarin.android_4.8#Xamarin.Android_4.8.2
Comment 25 John Hardman 2016-06-14 12:53:10 UTC
Using XF 2.2.0.45, I am seeing something very similar. I've put a large comment containing the stack trace in the following code, at the point where the exception is caught. This only seems to be a problem on Android (I have tested on WinRT, UWP, and iOS as well).

                                Task.Factory.StartNew(() =>
                                {
                                    try
                                    {
                                        // if the navigated event has not fired, we have timed out
                                        if ((!_pageDisappeared) && (!_navigationComplete))
                                        {
                                            // UI interactions need to be on the main thread
                                            Device.BeginInvokeOnMainThread(async () =>
                                            {
                                                try
                                                {
                                                    if ((!_pageDisappeared) && (!_navigationComplete))
                                                    {
                                                        // hide the status label
                                                        _statusLabel.IsVisible = false;

                                                        // tell the user there has been a timeout,
                                                        // then wait for the user to ack the message
                                                        await DisplayAlert("Web Page",
                                                            String.Format("Timed out opening web page\r\n{0}", _url),
                                                            "Ok")
                                                            .ConfigureAwait(true);

                                                        // we use an exit gate to ensure only one piece 
                                                        // of code pops from the navigation stack
                                                        await PopThisPageAsync();
                                                    }
                                                }
                                                catch (Exception ex3)
                                                {
                                                    var v = ex3;
                                                }

                                            }); // Device.BeginInvokeOnMainThread(async () =>

                                        } // if (!_navigationComplete)
                                    }
                                    catch (System.NullReferenceException nre)
                                    {
                                        if (Device.OS == TargetPlatform.Android)
                                        {
                                            // The following exception is sometimes thrown on Android
                                            // Object reference not set to an instance of an object at Android.Runtime.AndroidEnvironment.GetExceptionForLastThrowable()[0x00000]
                                            // in /Users/builder/data/lanes/3053/a94a03b5/source/monodroid/src/Mono.Android/src/Runtime/AndroidEnvironment.cs:61 
                                            // Android.Runtime.JNIEnv.CallVoidMethod (IntPtr jobject, IntPtr jmethod, Android.Runtime.JValue* parms) [0x00057] in /Users/builder/data/lanes/3053/a94a03b5/source/monodroid/src/Mono.Android/src/Runtime/JNIEnv.g.cs:569 
                                            // Android.App.Activity.RunOnUiThread (IRunnable action) [0x00044] in /Users/builder/data/lanes/3053/a94a03b5/source/monodroid/src/Mono.Android/platforms/android-23/src/generated/Android.App.Activity.cs:5727 
                                            // Android.App.Activity.RunOnUiThread (System.Action action) [0x00000] in /Users/builder/data/lanes/3053/a94a03b5/source/monodroid/src/Mono.Android/src/Android.App/Activity.cs:23 
                                            // Xamarin.Forms.Forms+AndroidPlatformServices.BeginInvokeOnMainThread (System.Action action) [0x0000e] in C:\BuildAgent2\work\aad494dc9bc9783\Xamarin.Forms.Platform.Android\Forms.cs:247 
                                            // Xamarin.Forms.Device.BeginInvokeOnMainThread (System.Action action) [0x00000] in C:\BuildAgent2\work\aad494dc9bc9783\Xamarin.Forms.Core\Device.cs:48 
                                        }
                                        else
                                        {
                                            var v = nre; // just to help with debugging
                                        }
                                    }
                                    catch (Exception ex)
                                    {
                                        var v = ex;  // just to help with debugging
                                    }

                                }); // Task.Factory.StartNew(() =>
Comment 26 Jonathan Pryor 2016-06-15 01:51:37 UTC
@John Hardman: That's most certainly a different bug. The original bug was about a SIGSEGV (process crash) while managed code was executing after being called from Java code, as signified by the presence of a stack frame with an `n_`-prefixed method.

That is not the case in Comment #25; if it *were* the case, you couldn't catch the NullReferenceException, as your process would be gone.

Please file a new bug and provide a repro.