Bug 8254 - SIGSEV from SGen
Summary: SIGSEV from SGen
Status: RESOLVED NORESPONSE
Alias: None
Product: Runtime
Classification: Mono
Component: General ()
Version: unspecified
Hardware: PC Mac OS
Importance: --- normal
Target Milestone: ---
Assignee: Bugzilla
URL:
Depends on:
Blocks:
 
Reported: 2012-11-07 04:02 UTC by Roope Kangas
Modified: 2017-07-07 19:02 UTC

Tags:
Is this bug a regression?: ---
Last known good build:


Attachments
Thread dump from a "stuck" process (79.29 KB, application/octet-stream)
2012-11-07 04:13 UTC, Roope Kangas
Stack trace from SIGSEV (59.62 KB, application/octet-stream)
2012-11-19 10:48 UTC, Roope Kangas



Description Roope Kangas 2012-11-07 04:02:20 UTC
I am running load tests on our game server. When I run the tests with mono-sgen, I hit a SIGSEGV every now and then.

I have run the exe with the following command:
MONO_GC_DEBUG=1 mono-sgen -O=all --llvm --trace=disabled --debug our-game.exe ...

(since I want to try out if those optimizations or using llvm helps in our case)

Stacktrace:


Native stacktrace:

	mono-sgen() [0x498441]
	mono-sgen() [0x4ec2df]
	mono-sgen() [0x41c217]
	/lib64/libpthread.so.0(+0xf500) [0x7f6b56301500]
	mono-sgen(mono_class_is_assignable_from+0x30) [0x4ff070]
	mono-sgen(mono_object_isinst+0x3d) [0x57033d]
	mono-sgen() [0x463c88]
	[0x40a13c3e]
...
Full output as attachment (I did omit some Start nursery collection... messages from that)
Comment 1 Roope Kangas 2012-11-07 04:05:25 UTC
It looks like this has happened when the server has tried to exit due to some unhandled exception in our game logic.

It calls stop on all threads, then signals a ManualResetEvent which the main thread is waiting on.
Comment 2 Roope Kangas 2012-11-07 04:06:16 UTC
Oh, yes and I am running:


Mono JIT compiler version 3.0.0 (tarball Sun Oct 28 17:55:01 UTC 2012)
Copyright (C) 2002-2012 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            Included Boehm (with typed GC and Parallel Mark)
Comment 3 Roope Kangas 2012-11-07 04:06:41 UTC
mono-sgen is

[ec2-user@load-test-turf-1 GC]$ mono-sgen --version
Mono JIT compiler version 3.0.0 (tarball Sun Oct 28 17:55:01 UTC 2012)
Copyright (C) 2002-2012 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
	TLS:           __thread
	SIGSEGV:       altstack
	Notifications: epoll
	Architecture:  amd64
	Disabled:      none
	Misc:          softdebug 
	LLVM:          supported, not enabled.
	GC:            sgen
Comment 4 Roope Kangas 2012-11-07 04:13:00 UTC
Created attachment 2876
Thread dump from a "stuck" process

In the same load test I have sometimes seen a process that is stuck (in some busy loop?). This might be related or another problem =)
Comment 5 Rodrigo Kumpera 2012-11-07 11:46:05 UTC
Please install gdb and make sure debug symbols to mono are available so we can get a proper native stack trace.
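
As a rough sketch of how such a native stack trace could be collected: the helper below only prints a gdb command line for a given process ID so it can be reviewed before running. The function name `gdb_backtrace_cmd` and the PID 12345 are made up for illustration; `-p`, `-batch` and `-ex` are standard gdb options.

```shell
#!/bin/sh
# Sketch: print (not run) a gdb command that dumps native backtraces for
# every thread of a running process. gdb_backtrace_cmd is a hypothetical name.
gdb_backtrace_cmd() {
    pid="$1"
    # -p attaches to the process, -batch exits once the commands have run,
    # -ex executes the given gdb command.
    printf 'gdb -p %s -batch -ex "thread apply all bt"\n' "$pid"
}

# Usage: 12345 is a placeholder PID; substitute the PID of the mono-sgen process.
gdb_backtrace_cmd 12345
```

The printed command can be pasted into a shell on the affected machine. Function names only show up in the output if mono was built with debug symbols and has not been stripped.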


This doesn't seem to be related to sgen itself, but something else. Does your codebase use unsafe code?
Comment 6 Roope Kangas 2012-11-07 13:17:49 UTC
No unsafe code in our codebase, not sure if one of the dlls can contain something...

I have gdb on the machine, I have compiled mono. Should I recompile with some other options than default?

Is this documentation what I should follow? http://www.mono-project.com/Debugging#For_gdb_7.0
Comment 7 Rodrigo Kumpera 2012-11-07 13:24:17 UTC
Are you stripping your mono after you build it? Does your environment disable debug information by default? You should make sure that you compiled with -g.
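
One possible way to check the stripping question, as a sketch: the `file` utility reports "stripped" or "not stripped" for ELF binaries, and the helper below classifies that text. The function name `symbol_status` is hypothetical, and the check assumes mono-sgen is on the PATH. Note that "not stripped" only shows the symbol table is still present; it does not by itself prove the build used -g.

```shell
#!/bin/sh
# Sketch: classify one line of `file` output for an ELF binary.
# symbol_status is a hypothetical helper; it only inspects the text it is given.
symbol_status() {
    case "$1" in
        *"not stripped"*) echo "not stripped" ;;
        *stripped*)       echo "stripped" ;;
        *)                echo "unknown" ;;
    esac
}

# Usage: runs only if both mono-sgen and file are installed.
if command -v mono-sgen >/dev/null 2>&1 && command -v file >/dev/null 2>&1; then
    # -L follows symlinks so the real binary is inspected.
    symbol_status "$(file -L "$(command -v mono-sgen)")"
fi
```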
Comment 8 Roope Kangas 2012-11-15 19:11:55 UTC
Running the server on 3.0.1 (updated mono) I did again see those threads locked at mystery address 0xffffff but now I got some more debug info.

... Here I stop the server with Ctrl-C and ...

[9000@10.0.1.24] 2012-11-16 01:42:23,796 Local Default DEBUG - END GameThreadWorker
[9000@10.0.1.24] 2012-11-16 01:42:28,482 Local Default DEBUG - END 
* Assertion at threads.c:436, condition `small_id_table [id] != NULL' not met

shell/turf: line 40: 78558 Abort trap: 6           mono-sgen -O=all --llvm --trace=disabled --debug $BINARYNAME --debug --path=`pwd` $*
Comment 9 Rodrigo Kumpera 2012-11-16 11:18:15 UTC
Do you have gdb installed?

Does it happen if you don't use -O=all or llvm?
Comment 10 Roope Kangas 2012-11-19 10:47:17 UTC
I have gdb. It also happens without -O=all and llvm.
Comment 11 Roope Kangas 2012-11-19 10:48:11 UTC
Created attachment 2968
Stack trace from SIGSEV
Comment 12 Rodrigo Kumpera 2012-11-19 11:32:24 UTC
OK, this gives me some idea of what's happening: the heap is being trashed.
Now I need a test case as the backtrace is not enough.

You can share it privately if the code cannot be published in the open.
Comment 13 Roope Kangas 2012-11-19 11:50:54 UTC
Btw. Can you say if I should expect to see this same behaviour without SGen?

Yeah, it's not open source. It actually requires quite a setup: it depends on mongodb, redis and apache.
And this happened while load testing the application. (We had 3 servers running test clients and connecting to the one that failed.)

We have never seen it locally or in our development environment. We'll try to figure out how we are trashing it. Any pointers?

I know we do a lot of caching of data, which can consist of small structs (voxel 3D data). The same goes for bitmaps created from that data for pathfinding.

Also, there has been a case before where we had fucked up code generating the voxel structure, and one equals call was boxing all structs to classes (which was a problem).
Comment 14 Roope Kangas 2012-11-19 12:03:28 UTC
Sorry for posting so many separate questions / comments in one message. Just trying to get some pointers from you =)
Comment 15 Ludovic Henry 2017-07-07 19:02:30 UTC
If you can still reproduce with latest mono version, please feel free to reopen the bug. Thank you.