Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.

Please create a new report for Bug 59875 on GitHub or Developer Community if you have new information to add and do not yet see a matching new report.
Created attachment 25045 [details]
I can pinpoint some (possibly all) races that happen in 3741d16503a973a99c724bdab9a255a5f07a3637 in combination with `add_stage_entry ()` and `process_stage_entries ()`. I will discuss the races here, please find the corresponding reports and stack traces attached.
To get these results, I stopped the GC from collecting the entries (I returned immediately from `sgen_finalize_if ()` without executing anything) and lowered the number of entries to 100. As a result, the draining process was triggered by `sgen_object_register_for_finalization ()` in a very predictable manner, and this is the root of all the reported data races: while `process_stage_entries ()` is wrapped in `LOCK_GC ... UNLOCK_GC`, `add_stage_entry ()` knows nothing about that lock (by design).
One thing that can occur is that two threads pick the same `next_entry`. One of them (call it "T1") pauses here while the other stores its data successfully. More work happens, the entries get drained due to an overflow, and "T1" finally continues. At this point there is no synchronisation between "T1" (checking `entries [index].state != STAGE_ENTRY_FREE`) and the thread that drains the entries (setting `entries [i].state = STAGE_ENTRY_FREE`), which results in #3 (and, vice versa, in a delayed way, in #5). This should not be a problem since `state` is a `gint32`, and these values are read and written in a single instruction on common platforms.
Of course, #4 occurs when `process_stage_entries ()` sets `*next_entry = 0`. This happens without synchronisation between `process_stage_entries ()` and `add_stage_entry ()` (which reads this value quite often). This is a race (by design), but it is a `gint32` again and can therefore be considered harmless.
The other reports (#01, #02, #06, #07, #08, #09) look like false positives to me, as TSan seems to have trouble understanding the data flow between `process_stage_entries ()` and `add_stage_entry ()`. For #06 - #09, the message "As if synchronized via sleep" seems to be spot on. #01 and #02 seem to be connected to the unsynchronised read of `state` in `process_stage_entries ()` (technically, that is another `gint32` data race): interlocking it (`state = InterlockedRead (&entries [i].state)`) mutes these reports.
As far as I can tell, all memory barriers are in place, and I am unable to find a way in which any of these reports could be connected to a harmful execution order (besides having the described `gint32` races).