Bug 19235 - Assertion at hazard-pointer.c:233, condition `small_id < HAZARD_TABLE_OVERFLOW' not met
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: Interop ()
Version: 3.2.x
Hardware: PC Linux
Importance: --- normal
Target Milestone: ---
Assignee: Mark Probst
URL:
Depends on:
Blocks:
 
Reported: 2014-04-23 02:53 UTC by Andrea Canciani
Modified: 2017-05-31 20:07 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Description Andrea Canciani 2014-04-23 02:53:27 UTC
The test "bug-18026.exe" sometimes fails with an assertion in hazard-pointer.c.

This might be a known limitation of the current implementation of the runtime:
        /*
         * If this assert fails we don't have enough overflow slots.
         * We should contemplate adding them dynamically.  If we can
         * make mono_thread_small_id_alloc() lock-free we can just
         * allocate them on-demand.
         */
        g_assert (small_id < HAZARD_TABLE_OVERFLOW);

I've been unable to trigger this assertion locally, but it happens quite frequently (more than 50% of the time) when running the tests on TravisCI.

See https://s3.amazonaws.com/archive.travis-ci.org/jobs/23540160/log.txt and https://s3.amazonaws.com/archive.travis-ci.org/jobs/23540161/log.txt for traces of the failure.
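
The runtime comment quoted above suggests a fixed-size pool of overflow slots, with the assertion firing once every slot is in use. The following is only a minimal sketch of that pattern under stated assumptions, not the Mono implementation: the names `SKETCH_OVERFLOW_SLOTS`, `sketch_claim_slot` and `sketch_release_slot` are hypothetical, the capacity of 4 is arbitrary, and the sketch returns -1 where the runtime asserts.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical capacity; the real HAZARD_TABLE_OVERFLOW value is not stated in this report. */
#define SKETCH_OVERFLOW_SLOTS 4

/* One busy flag per slot; static storage starts zeroed, so all slots begin free. */
static atomic_int sketch_busy[SKETCH_OVERFLOW_SLOTS];

/*
 * Try to claim a free slot with a compare-and-swap on each flag in turn.
 * Returns the slot index, or -1 if every slot was busy during the scan.
 * The -1 case corresponds to the condition the quoted g_assert guards against.
 */
static int sketch_claim_slot(void)
{
    for (int i = 0; i < SKETCH_OVERFLOW_SLOTS; ++i) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&sketch_busy[i], &expected, 1))
            return i;
    }
    return -1;
}

/* Mark a previously claimed slot as free again. */
static void sketch_release_slot(int slot)
{
    assert(slot >= 0 && slot < SKETCH_OVERFLOW_SLOTS);
    atomic_store(&sketch_busy[slot], 0);
}
```

In this sketch, any caller that claims a slot and never releases it permanently shrinks the pool, so a small leak is enough to exhaust it under load.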
Comment 1 Rodrigo Kumpera 2014-04-23 09:43:00 UTC
Please provide a full backtrace of the crash.

You can probably get that by installing gdb in your test machine.
Comment 2 Andrea Canciani 2014-04-23 09:45:16 UTC
As I already said, I cannot reproduce the issue locally, hence it is not very easy to get a full backtrace.
Nonetheless, a backtrace of the thread that triggered the assertion is available here: https://s3.amazonaws.com/archive.travis-ci.org/jobs/23540160/log.txt

(see the last few lines).
Comment 3 Rodrigo Kumpera 2014-04-23 09:54:40 UTC
Such a partial backtrace is, unfortunately, of no use.
Comment 4 Andrea Canciani 2014-04-23 11:59:10 UTC
Is this any better?

https://api.travis-ci.org/jobs/23601201/log.txt?deansi=true
Comment 5 Andrea Canciani 2014-04-23 12:00:24 UTC
Uhm... No, I guess it stopped in the wrong place (and I should ignore SIGPWR, it seems to be used for coordinating the GC).
Comment 6 Rodrigo Kumpera 2014-04-23 12:04:24 UTC
Yes, mono uses signals for a lot of things.
Comment 7 Rodrigo Kumpera 2014-04-23 12:04:56 UTC
Quick question: in what environment did you get this backtrace?

I could not reproduce it on a 64-bit build on a 4-core VM.
Comment 8 Andrea Canciani 2014-04-23 12:27:29 UTC
The TravisCI environment seems to be a 64-bit VM on a 32-core system (I don't know how many cores actually exist in the physical machine).

I'm running another test with gdb set to nostop on SIGPWR and SIGXCPU.
Comment 9 Andrea Canciani 2014-04-23 16:48:21 UTC
It looks like attaching the debugger changes the timings enough to prevent the assertion and hide the bug again: https://s3.amazonaws.com/archive.travis-ci.org/jobs/23618000/log.txt
Comment 10 Rodrigo Kumpera 2014-04-23 19:02:10 UTC
Don't run the tests under gdb.

Just make sure it's installed and on the PATH. When a crash happens, mono will ask it to attach and dump a backtrace.
Comment 11 Andrea Canciani 2014-04-24 05:09:32 UTC
That worked! Thanks.
The trace is available at the end of this log https://s3.amazonaws.com/archive.travis-ci.org/jobs/23656453/log.txt
Comment 12 Rodrigo Kumpera 2014-04-24 18:39:10 UTC
The bug here is that something else is not cleaning up its hazard pointers well enough. We need to investigate it.
Comment 13 Andrea Canciani 2014-05-16 04:59:15 UTC
I cannot reproduce it anymore on current master.
I believe it was fixed by 2285b4c10f3ed1bfabff391b6c5a7324067e51e7 or a9b07ba04ba870e2a681f7ec8ea253eabd1a15b5.
Comment 14 Rodrigo Kumpera 2014-05-16 10:49:22 UTC
I left the test running overnight and it did not crash.

It was crashing in less than 5 minutes before.
Comment 15 coolercal 2017-05-31 19:47:07 UTC
I'm actually seeing this when trying to run an xUnit Android runner application in Jenkins. Same as above: when a debugger is attached, the issue does not occur.
Comment 16 Rodrigo Kumpera 2017-05-31 20:07:53 UTC
Can you provide a test case that shows the issue?