Bug 4120 - Crash when using TestFlight API and SOAP
Summary: Crash when using TestFlight API and SOAP
Status: RESOLVED ANSWERED
Alias: None
Product: iOS
Classification: Xamarin
Component: General
Version: 5.2
Hardware: Macintosh Mac OS
Importance: --- enhancement
Target Milestone: Untriaged
Assignee: Bugzilla
URL:
Duplicates: 9129
Depends on:
Blocks:
 
Reported: 2012-03-28 05:41 UTC by Maxim
Modified: 2016-02-15 14:35 UTC
CC List: 5 users

Tags:
Is this bug a regression?: ---
Last known good build:

Description Maxim 2012-03-28 05:41:21 UTC
Hi!

I integrated the TestFlight API into my app. It builds successfully, but now it crashes every time I call a SOAP method.

I tried https://github.com/anujb/monotouch-bindings/tree/master/TestFlight: I downloaded it, built the DLL, linked it into the app, and called TestFlight.TakeOff, then TestFlight.PassCheckpoint. Messages come to the SDK Debugger, but not to 

I also tried https://github.com/ayoung/monotouch-testflight, which has more complicated instructions, with exactly the same results. Of course, I added the mtouch parameters -nosymbolstrip -nostrip -cxx -gcc_flags "-lgcc_eh -L${ProjectDir} -ltestflight -ObjC".

Testing on the emulator or on a device makes no difference.

Please help me.
Comment 1 Maxim 2012-03-28 06:25:43 UTC
I thought the problem was that the TestFlight class is used in a static class, but moving it into a different class doesn't change the behavior.
Comment 2 Rolf Bjarne Kvinge [MSFT] 2012-03-28 08:21:01 UTC
The problem is likely that a NullReferenceException is thrown in the base class libraries when you call a SOAP method (and then caught by a try-catch clause).

Unfortunately TestFlight doesn't let mono handle NullReferenceExceptions at all, they're treated automatically as crashes. The only short-term fix is to disable crash reporting with TestFlight.

We're working on a crash reporting solution that doesn't suffer from this shortcoming, and it will be released together with a later MonoTouch release (exactly which hasn't been decided yet).
Comment 3 Maxim 2012-03-28 08:57:43 UTC
Please make some things clear.
1. Do I understand correctly that TestFlight overrides the app's default NullReferenceException handling with its own, and on any such exception it forcibly crashes the app?
That's strange, because SOAP methods didn't throw NullReferenceExceptions in the past.

2. Is it possible to disable crash reporting but keep the feedback form and checkpoints API available?
Comment 4 Rolf Bjarne Kvinge [MSFT] 2012-03-28 10:26:25 UTC
1. Yes. It is not strange, because internal processing in the base class libraries to handle SOAP methods can throw NullReferenceExceptions you never see (if they're caught before you see them).

2. I believe so, but you'll have to check TestFlight documentation to verify it.
Comment 5 Maxim 2012-03-28 11:42:30 UTC
OK!

If I understand you correctly:
1. For now it is impossible to use both SOAP services and the TestFlight API wrapper in one app;
2. Your team is working on a solution to that issue.
Comment 6 Rolf Bjarne Kvinge [MSFT] 2012-03-28 19:09:56 UTC
1. Correct.
2. Correct.
Comment 7 Maxim 2012-03-29 05:48:52 UTC
Thanks for your explanation.

Could you give a release date for the solution? Maybe a week, or a month? This is an important task for us.
Comment 8 Rolf Bjarne Kvinge [MSFT] 2012-03-29 12:54:56 UTC
Unfortunately we don't have a release date yet - but it'll likely be something like a couple of months.
Comment 9 Clay Fowler 2012-04-25 12:26:13 UTC
TestFlight says there is currently no way to disable TestFlight's exception handling, so I guess there's no workaround - we will have to wait for the change on the MonoTouch side. Here's what TestFlight support told me:

"APR 25, 2012  |  11:59AM EDT 
Jason replied:
Hi Clay,

At this time, there isn't a way to disable crash handling in TestFlight. Sorry about that. I'll mention this to the team to consider for the future.

Jason Rehmus
TestFlight Support"
Comment 10 Maxim 2012-04-25 16:18:23 UTC
Same here. I received the same response.

Since the timeline for solving this issue on your side is unknown, I added my own alternative to TestFlight to the iPhone, Android phone and now iPad versions of the app:
1. On any button click or similar user interaction, the app writes a message to a log;
2. In the AppDomain.CurrentDomain.UnhandledException handler, the app saves the latest 200 events to its user data folder before it crashes;
3. On the next launch, it checks for a crash log. If one exists, it asks the user whether to send an email with this log as an attachment;
3.1. If yes, the default email draft window appears with the subject, body and attachment filled in beforehand. The user only has to click the "Send" button;
3.2. If no mail account is set up, the app says so.
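
The "latest 200 events" part of the scheme above is a ring buffer. The following is only a structural sketch in C, not the reporter's code (which would be C# in a MonoTouch app). The 64-byte message limit and every function name are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define EVENT_CAPACITY 200   /* "latest 200 events" from the comment above */
#define EVENT_LENGTH   64    /* assumed maximum message length */

static char events[EVENT_CAPACITY][EVENT_LENGTH];
static size_t event_total;   /* number of events logged so far */

/* Record one user-interaction message, overwriting the oldest when full. */
void log_event(const char *message)
{
    char *slot = events[event_total % EVENT_CAPACITY];
    strncpy(slot, message, EVENT_LENGTH - 1);
    slot[EVENT_LENGTH - 1] = '\0';
    event_total++;
}

/* Number of events currently retained (at most EVENT_CAPACITY). */
size_t event_count(void)
{
    return event_total < EVENT_CAPACITY ? event_total : EVENT_CAPACITY;
}

/* Retained event by age: index 0 is the oldest. NULL if out of range. */
const char *event_at(size_t index)
{
    size_t count = event_count();
    if (index >= count)
        return NULL;
    return events[(event_total - count + index) % EVENT_CAPACITY];
}

/* What an unhandled-exception hook would call: write events, oldest first. */
void dump_events(FILE *out)
{
    size_t i;
    for (i = 0; i < event_count(); i++)
        fprintf(out, "%s\n", event_at(i));
}

/* Demo helper (hypothetical): log "event 0" .. "event n-1". */
void demo_log_numbered(size_t n)
{
    char message[EVENT_LENGTH];
    size_t i;
    for (i = 0; i < n; i++) {
        snprintf(message, sizeof message, "event %lu", (unsigned long)i);
        log_event(message);
    }
}
```

After logging more than 200 messages, only the newest 200 remain, and dump_events writes them oldest first, which is the order a reader of the emailed log would expect.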
Comment 11 Dan Abramov 2013-01-11 13:37:32 UTC
Still no progress on this one. :-(
TestFlight SDK docs say[1]:

>If you do use uncaught exception or signal handlers install your handlers before calling takeOff. Our SDK will then call your handler while ours is running.

Why doesn't it help?

I'm also curious if changing anything in PLCrashReporter (used by HockeyApp) can help[2].

[1]: https://testflightapp.com/sdk/doc/
[2]: http://code.google.com/p/plcrashreporter/source/browse/Source/PLCrashSignalHandler.m
Comment 12 Landon Fuller 2013-01-11 14:12:03 UTC
Primary maintainer of PLCrashReporter here. I was directed here from a user report about the same problem.

We'd be happy to coordinate some sort of generic fix with MonoTouch (preferably that would also work for anyone else that requires this sort of signal handler).

Off the top of my head, the issue that we face is that to generate an accurate crash report, we need the original signal information, including ucontext_t. Given that, passing the signal off to an existing signal handler won't work for our purposes, as there's no standardized way for that handler to let us know that the signal was handled, and thus, that we shouldn't write out a report and terminate the process.

It seems like MonoTouch is better suited to handle this, by overriding *our* signal handlers, and calling out to them in the case that the signal can't be handled internally, using the preserved ucontext et al. That would require that our handlers either be installed first (and then MonoTouch overrides them), or using some other mechanism to register our handlers with the MonoTouch runtime.
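
A minimal sketch of the ordering proposed above, assuming plain POSIX sigaction: the reporter installs its handler first, the runtime then overrides it, and the runtime forwards the untouched siginfo and ucontext only when it cannot recover. This is not MonoTouch's or PLCrashReporter's actual code. The can_recover hook and all demo names are hypothetical, and SIGUSR1 stands in for SIGSEGV so the sketch can be run safely.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Whatever was installed before the runtime, e.g. a crash reporter. */
static struct sigaction previous_action;

/* Hypothetical hook: nonzero if the runtime can recover from this fault. */
static int (*runtime_can_recover)(const siginfo_t *info, void *uctx);

static void runtime_handler(int signo, siginfo_t *info, void *uctx)
{
    if (runtime_can_recover != NULL && runtime_can_recover(info, uctx))
        return;   /* recovered: the earlier handler never sees the signal */

    /* Not recoverable: forward the original siginfo and ucontext.
     * (If nothing was installed before, real code would restore SIG_DFL
     * and re-raise; that path is omitted here.) */
    if (previous_action.sa_flags & SA_SIGINFO) {
        if (previous_action.sa_sigaction != NULL)
            previous_action.sa_sigaction(signo, info, uctx);
    } else if (previous_action.sa_handler != SIG_DFL &&
               previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(signo);
    }
}

/* Override the current handler for signo and remember it for forwarding. */
int runtime_install(int signo, int (*can_recover)(const siginfo_t *, void *))
{
    struct sigaction sa;
    runtime_can_recover = can_recover;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = runtime_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(signo, &sa, &previous_action);
}

/* --- demo scaffolding; every name below is hypothetical --- */
static volatile sig_atomic_t reporter_calls;
static volatile sig_atomic_t demo_recoverable;

static void reporter_handler(int signo, siginfo_t *info, void *uctx)
{
    (void)signo; (void)info; (void)uctx;
    reporter_calls++;   /* a real reporter would write a crash report */
}

static int demo_can_recover(const siginfo_t *info, void *uctx)
{
    (void)info; (void)uctx;
    return demo_recoverable;
}

/* Returns how many times the reporter was invoked for one signal. */
int demo_forwarded_calls(int recoverable)
{
    struct sigaction rep;
    memset(&rep, 0, sizeof rep);
    rep.sa_sigaction = reporter_handler;
    sigemptyset(&rep.sa_mask);
    rep.sa_flags = SA_SIGINFO;
    sigaction(SIGUSR1, &rep, NULL);              /* reporter goes first */
    runtime_install(SIGUSR1, demo_can_recover);  /* runtime overrides it */

    reporter_calls = 0;
    demo_recoverable = recoverable;
    raise(SIGUSR1);
    return reporter_calls;
}
```

With a recoverable fault the reporter is never called, and with an unrecoverable one it is called exactly once with the same siginfo and ucontext pointers the kernel delivered, which is the property the reporter needs for an accurate report.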

Either way, I'm open to suggestions on what we can do from our end (PLCrashReporter, and various libraries based on PLCrashReporter), even if it's just adding documentation on the subject.
Comment 13 Landon Fuller 2013-01-11 14:24:01 UTC
Oh, I almost forgot:

This is also going to be a problem with crash reporters that make use of mach exception handling, rather than signal handlers. As far as I'm aware there's no way (from a task-registered exception handler) to determine whether a mach exception will be (or, after the fact, has been) handled by a signal handler.

In that case, you'll see those reporters writing out spurious reports on each NULL dereference (or other handleable errors), but otherwise allowing Mono to function. This isn't really an acceptable solution, as you're paying for a bunch of stack walking and disk I/O every time a non-fatal signal is handled by the runtime. Back in the Mac OS X 10.5 era (IIRC), this was a significant issue when porting the JVM's signal handling code, and I had to disable Apple's crash reporter ports entirely until the OS shipped with a fix (moving the crash reporter handling to occur -after- the signal handler was executed).

The fix here would be the same for signal handlers; Mono could override the crash reporter's exception ports and then call out to the original exception servers in the case that it can't be handled. Unfortunately, as far as I'm aware the Mach exception stuff remains private on iOS, so it may just be that there's no safe way to resolve this side of the coin that doesn't risk rejection, etc. If anyone on the Mono side has any insight into this, it would also be most welcome even outside the question of crash reporting, as we're looking into the iOS Mach exception issue in general.
Comment 14 Rolf Bjarne Kvinge [MSFT] 2013-01-11 15:31:07 UTC
Landon,

What you can do is to make the list of fatal signals configurable (as I mentioned here: https://bugzilla.xamarin.com/show_bug.cgi?id=9129#c1) - and the user can then disable the SIGSEGV handler. Mono will raise a SIGABRT if a SIGSEGV is really a crash, so this should work out just fine.

I do not know enough about Mach exception handling to think of a solution (if there is one) - but the obvious option would be to also make it optional (or at least opt-out).
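
A configurable list of fatal signals, as suggested above, might look roughly like the following C sketch. It assumes a reporter that exposes an install function taking the list. The names reporter_install, reporter_handles and report_crash are hypothetical and do not come from PLCrashReporter or TestFlight.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t last_reported_signal;

/* Stand-in for the reporter: a real one would write a report and exit. */
static void report_crash(int signo)
{
    last_reported_signal = signo;
}

/* Hook only the listed signals; returns how many were hooked. */
int reporter_install(const int *signals, size_t count)
{
    struct sigaction sa;
    size_t i;
    int hooked = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = report_crash;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < count; i++)
        if (sigaction(signals[i], &sa, NULL) == 0)
            hooked++;
    return hooked;
}

/* Nonzero if the reporter's handler is currently installed for signo. */
int reporter_handles(int signo)
{
    struct sigaction cur;
    if (sigaction(signo, NULL, &cur) != 0)
        return 0;
    return !(cur.sa_flags & SA_SIGINFO) && cur.sa_handler == report_crash;
}

/* The user's opt-out: SIGSEGV is left to the managed runtime. */
int demo_install_without_sigsegv(void)
{
    static const int fatal[] = { SIGABRT, SIGBUS, SIGILL, SIGFPE };
    return reporter_install(fatal, sizeof fatal / sizeof fatal[0]);
}
```

Leaving SIGSEGV out of the list means the runtime's own handler stays in place, and the reporter still catches the SIGABRT that the runtime raises for a genuine crash.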
Comment 15 Rolf Bjarne Kvinge [MSFT] 2013-01-11 15:41:18 UTC
*** Bug 9129 has been marked as a duplicate of this bug. ***
Comment 16 Landon Fuller 2013-01-11 16:26:20 UTC
Howdy Rolf, 

Doesn't look like I can access 9129, but I think I get the gist. The issue I see with that approach is that the ucontext et al will be permuted by throwing a SIGABRT, such that we won't be able to write out a report that reflects the original failure condition.
Comment 17 Dan Abramov 2013-01-11 16:28:43 UTC
Landon, I marked 9129 as private when I reported it.
Basically, it covers the same issue, with this reply from Rolf:

===
This is a known issue with all crash reporters.

A null reference exception is actually a SIGSEGV signal at first. Usually the
mono runtime handles this and translates it into a NullReferenceException,
allowing the execution to continue. The problem is that SIGSEGV signals are a
very bad thing in ObjC apps (and when it occurs outside of managed code), so
any crash reporting solution will report it as a crash (and kill the app) -
this happens before MonoTouch gets a chance to handle the SIGSEGV, so there is
nothing MonoTouch can do about this.

One possible solution is to allow mono to handle all SIGSEGV signals
(technically speaking the crash reporting lib should either not handle the
SIGSEGV signal, or it should chain to mono's handler and not do any processing
by itself). If mono determines that the SIGSEGV signal is not from managed code
(i.e. something very bad happened), it will raise a SIGABRT signal (which the
crash reporting lib should already handle and treat as a crash). As you can
understand this is something that has to be done in the crash reporting
library.

Note that TestFlight has the same issue, it's not limited to HockeyApp.
===
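
The escalation described in the quoted reply can be sketched in C as follows, under the assumption that the reporter hooks only SIGABRT and the runtime owns the fault signal. All names are hypothetical, the fault_is_managed flag stands in for the runtime's real check, and SIGUSR1 stands in for SIGSEGV so the sketch can be run safely.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t reported_signal;
static volatile sig_atomic_t fault_is_managed;

/* Reporter side: only ever sees SIGABRT. */
static void reporter_on_abort(int signo)
{
    reported_signal = signo;
}

/* Runtime side: owns the fault signal and decides what it means. */
static void runtime_on_fault(int signo)
{
    (void)signo;
    if (fault_is_managed)
        return;         /* would be turned into a NullReferenceException */
    raise(SIGABRT);     /* real crash: escalate to the reporter */
}

static int install(int signo, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns the signal number the reporter observed, or 0 if none. */
int demo_reported_signal(int managed)
{
    int seen;

    install(SIGABRT, reporter_on_abort);
    install(SIGUSR1, runtime_on_fault);
    reported_signal = 0;
    fault_is_managed = managed;
    raise(SIGUSR1);
    seen = reported_signal;
    install(SIGABRT, SIG_DFL);   /* leave the process in its default state */
    return seen;
}
```

Note what the reporter receives in the unrecoverable case: SIGABRT, delivered from inside the runtime's handler. The original signal number, fault address and register state are not what it sees, which is the limitation raised in comment 16.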
Comment 18 Dan Abramov 2013-01-11 16:32:54 UTC
Landon,

>The issue I see with that approach is that the ucontext et al will be permuted by throwing
>a SIGABRT, such that we won't be able to write out a report that reflects the
>original failure condition.

Do I understand correctly that you mean the case when SIGSEGV was not caused by managed code and actually needs to be reported?
Comment 19 Landon Fuller 2013-01-11 16:42:53 UTC
Dan,

> Do I understand correctly that you mean the case when SIGSEGV was not caused by
> managed code and actually needs to be reported?

That's correct. I'm not really familiar with Mono's AOT compilation and where a crash reporter might be useful there, but if:

- The only possible cause of a crash is a bug in Mono itself, and
- All other code is written in C#, and 
- The crash can be handled internally to Mono's managed runtime without needing any sort of external native crash reporting

... then I'd say that the crash reporter should simply be disabled entirely, and one can skip this whole mess. 

However, if one is mixing native/mono code and wants crash reporting for *that*, then this is something we'd still need to handle with some sort of coordination between crash reporters and mono, or by mono shipping their own compatible reporter.
Comment 20 Dan Abramov 2013-01-11 16:49:03 UTC
Landon,

From my experience you often want to reuse Objective C libraries and Mono provides a mechanism for that (in fact, they're using the same tools to bind Apple APIs including UIKit) [0]. So it's rarely pure C# for real-life MonoTouch projects, I'm sure.

If you consider that many crashes result from native code accessing native objects that have been destroyed because you let their managed wrappers get collected by Mono GC, I'm pretty sure reporting these exceptions is very important.

[0]: http://docs.xamarin.com/ios/Guides/Advanced_Topics/Binding_Objective-C_Libraries
Comment 21 Rolf Bjarne Kvinge [MSFT] 2013-01-11 18:29:46 UTC
Landon, in my experience it's very hard to automatically provide extra diagnostics for the crashes that occur in MonoTouch apps (probably because you don't get any of what would be easy crashes in ObjC, those are handled entirely in managed code, you only get the hard (and worse) crashes). The point I'm trying to make is that I don't think it matters if the ucontext isn't from the original signal.
Comment 22 Landon Fuller 2013-01-12 11:36:49 UTC
> Landon, in my experience it's very hard to automatically provide extra
> diagnostics for the crashes that occur in MonoTouch apps (probably because you
> don't get any of what would be easy crashes in ObjC, those are handled entirely
> in managed code, you only get the hard (and worse) crashes). The point I'm
> trying to make is that I don't think it matters if the ucontext isn't from the
> original signal.

One of the guarantees we're obligated to provide is that the reporting will be an
accurate representation of thread/process state, even if the crashes are of the
'hard' variety. We can't really make judgements as to what is or is not resolvable
by users of the library.

If there were hooks to provide more details from Mono to assist in those hard
cases, that might be an interesting tangent!

That said, it sounds from user reports (including Dan's above) that the concern
is primarily with getting accurate reports from non-mono code (native C/ObjC
libraries), in which case accurate reporting is still imperative. I'm not sure if
there's a place where we can meet in the middle, but it'd be nice to have
PLCrashReporter (and the myriad of paid services based on it) providing accurate
reporting out of the box with Mono.
Comment 23 Dan Abramov 2013-01-12 11:53:32 UTC
Landon,

Can you please tell me what ucontext is and how it is useful in understanding crashes? As a MonoTouch user, all I ever needed from crash reports were stack traces and class names in "unrecognized selector sent to instance" messages. I never needed anything else for diagnostics yet.

I'm sure MonoTouch customers would be more than happy to get their managed exceptions even if some unmanaged crashes had incorrect details. In my experience, about 90% of crashes in our app were from managed exceptions, and about 7% from "unrecognized selector" problems due to objects eaten by GC. The remaining 3% are assertion failures in UIKit and broken bindings.

Please try to understand our pain. When you're rushing towards a release and you realize that crash reporting solution that you pay for actually crashes your app, you don't care about 100% technical correctness in reports regarding the information that is very unlikely to be of use. I understand your concerns but it's hard for me to understand how slightly incorrect reporting for a minor detail is better than silent crashes. I'm sure most MonoTouch customers would agree with me on this one.
Comment 24 Landon Fuller 2013-01-12 12:37:39 UTC
Dan,

I should point out first that I'm just the author of the underlying iOS crash reporting library that most services use, and I don't get paid much for that honor. Actually, I generally don't get paid anything :) I just don't want to misrepresent myself as speaking for HockeyApp or any of the paid services that rely on PLCrashReporter.

For an immediate short-term fix for your problem, it would be very short work to provide code that saves the current SIGSEGV/SIGBUS signal handlers and then restores them after the initialization of the crash reporter. The reports, as noted, will not be accurate, but it sounds like it will cover your primary requirements. If you can handle this internally or via a request to HockeyApp, it wouldn't take long. Otherwise my organization is available for contracting, but feel free to e-mail me offline.
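
The save-and-restore idea in the previous paragraph can be sketched in C with plain POSIX sigaction. This is an illustration under assumptions, not code from any particular reporter or from MonoTouch: init_reporter stands for whatever call starts the crash reporter, and the demo handlers are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Run a crash reporter's initializer, then put back whatever SIGSEGV and
 * SIGBUS handlers were installed before it (for example, the runtime's). */
int init_reporter_keeping_handlers(void (*init_reporter)(void))
{
    struct sigaction segv, bus;

    if (sigaction(SIGSEGV, NULL, &segv) != 0 ||
        sigaction(SIGBUS, NULL, &bus) != 0)
        return -1;

    init_reporter();   /* may overwrite both handlers */

    if (sigaction(SIGSEGV, &segv, NULL) != 0 ||
        sigaction(SIGBUS, &bus, NULL) != 0)
        return -1;
    return 0;
}

/* --- demo scaffolding; every name below is hypothetical --- */
static void runtime_handler(int signo)  { (void)signo; }
static void reporter_handler(int signo) { (void)signo; }

static int install(int signo, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

static void fake_reporter_init(void)
{
    install(SIGSEGV, reporter_handler);
    install(SIGBUS, reporter_handler);
}

static int current_is(int signo, void (*handler)(int))
{
    struct sigaction cur;
    return sigaction(signo, NULL, &cur) == 0 && cur.sa_handler == handler;
}

/* Returns 1 if the runtime's handlers are back in place after init. */
int demo_runtime_handlers_survive(void)
{
    int ok;

    install(SIGSEGV, runtime_handler);
    install(SIGBUS, runtime_handler);
    ok = init_reporter_keeping_handlers(fake_reporter_init) == 0 &&
         current_is(SIGSEGV, runtime_handler) &&
         current_is(SIGBUS, runtime_handler);
    install(SIGSEGV, SIG_DFL);   /* leave the process in its default state */
    install(SIGBUS, SIG_DFL);
    return ok;
}
```

The reporter keeps any other handlers it installed (SIGABRT, for instance), so crashes that the runtime escalates are still reported, only without the original fault context.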

My main aim here is to find a way to coordinate behavior across crash reporters and Mono to produce technically correct reports, preferably in a way that can be used by other crash reporting libraries and managed languages with similar signal handling requirements, without breaking Mono's ability to handle resolvable managed runtime errors.

I've expanded some of the reasoning/requirements behind PLCrashReporter's approach below, both in answer to your questions, as well as to (hopefully) help in moving forward the dialog on where we can meet in the middle with Xamarin.

> Can you please tell me what ucontext is and how it is useful in understanding
> crashes? As a MonoTouch user, all I ever needed from crash reports were stack
> traces and class names in "unrecognized selector sent to instance" messages. I
> never needed anything else for diagnostics yet.

We need a valid ucontext to walk the stack of the crashed thread, as well as to determine the register state at the time of the crash. We also need the original signal and si_addr/si_code values, as these are necessary to determine the type and cause of crash. For example, a SIGSEGV with a si_addr of 0x4 would imply a NULL pointer dereference with a structure member offset of 0x4. There are more complex and necessary use-cases, one of which I'll expand on below.
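
To make the fields named above concrete, here is a small C sketch of a handler installed with SA_SIGINFO that records si_signo, si_code, si_addr and the ucontext pointer, plus a helper for the NULL-plus-offset inference. It is not PLCrashReporter's code. The 4096-byte page size, the struct and all function names are assumptions, and the demo raises SIGUSR1 rather than triggering a real fault.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption; real code would query it */

/* A fault address inside the first page suggests NULL plus a small offset,
 * e.g. si_addr == 0x4 for a member at offset 4 of a NULL struct pointer. */
int looks_like_null_deref(int signo, uintptr_t fault_addr)
{
    return signo == SIGSEGV && fault_addr < ASSUMED_PAGE_SIZE;
}

/* What a reporter wants to keep from the original signal delivery. */
struct fault_info {
    int signo;
    int code;
    uintptr_t addr;
    void *ucontext;   /* register state of the interrupted thread */
};

volatile struct fault_info last_fault;

static void capture_handler(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    last_fault.signo = info->si_signo;
    last_fault.code = info->si_code;
    /* si_addr is only meaningful for faults such as SIGSEGV or SIGBUS. */
    last_fault.addr = (uintptr_t)info->si_addr;
    last_fault.ucontext = ucontext;
}

/* Installs the handler on SIGUSR1, raises it, returns the recorded signo. */
int demo_capture(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = capture_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    last_fault.signo = 0;
    raise(SIGUSR1);
    return last_fault.signo;
}
```

All of these values exist only during the original delivery. A handler that runs later for a different signal gets a different siginfo and ucontext, which is why re-raising loses the information.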

> Please try to understand our pain. When you're rushing towards a release
> and you realize that crash reporting solution that you pay for actually crashes
> your app, you don't care about 100% technical correctness in reports regarding
> the information that is very unlikely to be of use. I understand your concerns
> but it's hard for me to understand how slightly incorrect reporting for a minor
> detail is better than silent crashes. I'm sure most MonoTouch customers would
> agree with me on this one.

To elucidate on why it's a requirement in PLCrashReporter and other crash reporters to provide full/accurate crash state; speaking as an application developer as well as the author of plcrashreporter, we rely on complete/accurate crash reports to decipher more difficult issues in which thread state may be the only reasonable mechanism to determine the cause of a crash.

For example, a crash in objc_msgSend() -- such as caused by objects being eaten by GC as you've described above -- can leave little valuable information in the thread stack. You *may* get an "unrecognized selector" exception if that particular piece of RAM has been re-allocated for a different object, but an outright crash is just as likely. In that case, the only way to decipher the selector that was called is by investigating the register state, and on ARM, correlating the address in r2 to an Obj-C selector (SEL) address within a binary. Additionally, the objc_msgSend trampoline does not update the stack pointer, such that a naive stack walker will skip right over whatever method called objc_msgSend(). The only way to resolve this is with access to register state, such that one can access the stack/register containing the actual return address.

Without register state, the above will often manifest with a fairly useless stack trace, as the method/function that called objc_msgSend() will be entirely lost.
Comment 25 Landon Fuller 2013-01-12 12:43:32 UTC
Oops. s/stack pointer/frame pointer/ in the objc_msgSend paragraph.
Comment 26 Dan Abramov 2013-01-12 12:54:57 UTC
Landon,

I'm sorry if my tone suggested I thought you were supporting PLCrashReporter commercially. I know you don't and I'm very happy you took the time to write this up. What I meant to convey is that the amount of frustration caused by this problem far outweighs any concerns regarding purity of reports, contrary to the impression you might have gotten from my previous comment about native exceptions being important (they are, but something is better than nothing from a technical standpoint).

The community would benefit immensely if you and Xamarin found an option that satisfies everybody.

As for the explanation. I did not know it is possible to extract meaningful information from correlating registers with selector addresses within the binary. I don't know if this is applicable to MonoTouch but maybe one day it'll be helpful. I don't remember ever needing to go down this deep to find a problem yet.

I'll get back to you via email about the short-term fix and contracting options.

Thanks,
Dan
Comment 27 Dan Abramov 2013-01-24 05:45:53 UTC
Landon, Rolf and everybody,

Thanks for all your help and suggestions.
I shared our fix here:

http://stackoverflow.com/a/14499336/458193

Hope you can work it out internally sometime.

Dan
Comment 28 Rolf Bjarne Kvinge [MSFT] 2016-02-15 14:35:59 UTC
The workaround from comment #27 is good enough for the initial issue / reporter, so I'm closing this bug.

The only thing missing is support for accurate thread/process state (see comments #22 and #24), and for that I've opened a new enhancement request: bug #38765.