Bug 30480 - GC regression in 4.0
Summary: GC regression in 4.0
Status: RESOLVED FIXED
Alias: None
Product: Runtime
Classification: Mono
Component: GC ()
Version: 4.0.0
Hardware: PC Mac OS
: --- normal
Target Milestone: 4.2 (C6)
Assignee: Rodrigo Kumpera
URL:
Depends on:
Blocks:
 
Reported: 2015-05-26 18:27 UTC by Rodrigo Kumpera
Modified: 2015-05-27 16:38 UTC (History)
2 users (show)

Tags:
Is this bug a regression?: ---
Last known good build:

Notice (2018-05-24): bugzilla.xamarin.com is now in read-only mode.

Please join us on Visual Studio Developer Community and in the Xamarin and Mono organizations on GitHub to continue tracking issues. Bugzilla will remain available for reference in read-only mode. We will continue to work on open Bugzilla bugs, copy them to the new locations as needed for follow-up, and add the new items under Related Links.

Our sincere thanks to everyone who has contributed on this bug tracker over the years. Thanks also for your understanding as we make these adjustments and improvements for the future.


Please create a new report on GitHub or Developer Community with your current version information, steps to reproduce, and relevant error messages or log files if you are hitting an issue that looks similar to this resolved bug and you do not yet see a matching new report.

Related Links:
Status:
RESOLVED FIXED

Description Rodrigo Kumpera 2015-05-26 18:27:23 UTC
From the repro:

.* Assertion at sgen-alloc.c:460, condition `*p == NULL' not met

https://github.com/yuizumi/repromonoissue
Comment 1 Mark Probst 2015-05-26 20:48:36 UTC
Seems to work on master.
Comment 2 Mark Probst 2015-05-27 13:38:31 UTC
Nice summary: https://gist.github.com/schani/059d7aa74ee8e4fe86b6

Nasty summary:

We're generating code that calls the managed allocator with an object
size of `0`.  The managed allocator doesn't check for that case and
therefore the object allocated after that overwrites that zero-sized
allocation.

Here's a debugger session:

    Breakpoint 1, major_alloc_heap (nursery_size=4194304, nursery_align=4194304, the_nursery_bits=22) at sgen-marksweep.c:233
    233		if (nursery_align)
    (gdb) finish
    Run till exit from #0  major_alloc_heap (nursery_size=4194304, nursery_align=4194304, the_nursery_bits=22) at sgen-marksweep.c:233
    Current language:  auto; currently minimal
    0x000000010035a65f in alloc_nursery () at sgen-gc.c:1421
    1421		data = major_collector.alloc_heap (alloc_size, alloc_size, DEFAULT_NURSERY_BITS);
    Value returned is $1 = (void *) 0x102000000

We've allocated the nursery at `0x102000000`.

    (gdb) watch *(long*)0x1022de6a0
    Hardware watchpoint 2: *(long *) 4331529888

This is a magic number that learned from previous debugging sessions.
It's where in many cases the zero-sized object will be allocated.

    (gdb) cont
    Continuing.
    Reading symbols for shared libraries warning: UUID mismatch detected between:
    	/Users/schani/Work/mono/installation/lib/mono/4.5/mscorlib.dll.dylib
    	/Users/schani/Work/mono/installation/lib/mono/4.5/mscorlib.dll.dylib.dSYM/Contents/Resources/DWARF/mscorlib.dll.dylib...
    warning: Could not find object file "/var/folders/1h/7ghts1ys53xdk8y1czrjzdmw0000gn/T/mono_aot_LHzZKh.o" - no debug information available for "mscorlib.dll".
    
    . done
    Reading symbols for shared libraries warning: UUID mismatch detected between:
    	/Users/schani/Work/mono/installation/lib/mono/4.5/mscorlib.dll.dylib
    	/Users/schani/Work/mono/installation/lib/mono/4.5/mscorlib.dll.dylib.dSYM/Contents/Resources/DWARF/mscorlib.dll.dylib...
    warning: Could not find object file "/var/folders/1h/7ghts1ys53xdk8y1czrjzdmw0000gn/T/mono_aot_LHzZKh.o" - no debug information available for "mscorlib.dll".
    
    . done
    NUnit version 2.4.8
    Copyright (C) 2002-2007 Charlie Poole.
    Copyright (C) 2002-2004 James W. Newkirk, Michael C. Two, Alexei A. Vorontsov.
    Copyright (C) 2000-2002 Philip Craig.
    All Rights Reserved.
    
    Runtime Environment - 
       OS Version: Unix 13.4.0.0
      CLR Version: 4.0.30319.17020 ( 4.0.1 (mono-4.0.0-branch/d64e196 Tue May 26 18:09:16 PDT 2015) )
    
    .[Switching to process 27999 thread 0x400b]
    Hardware watchpoint 2: *(long *) 4331529888
    
    Old value = 0
    New value = 4372641952
    0x0000000105f6b007 in ?? ()
    (gdb) p mono_pmip(0x0000000105f6b007)
    $2 = 0x105c000b0 " (wrapper alloc) object:AllocSmall (intptr,intptr) + 0x67 (0x105f6afa0 0x105f6b039) [0x101144750 - domain-Test.dll]"

The watchpoint triggered in the managed allocator, so the new value at
that address will be the vtable of the zero-sized object.

    (gdb) p *((MonoVTable*)4372641952)->klass
    $3 = {
      element_class = 0x1049c5888, 
      cast_class = 0x1049c5888, 
      supertypes = 0x1049c5998, 
      idepth = 4, 
      rank = 0 '\0', 
      instance_size = 80, 
      inited = 1, 
      init_pending = 0, 
      size_inited = 1, 
      valuetype = 0, 
      enumtype = 0, 
      blittable = 0, 
      unicode = 0, 
      wastypebuilder = 0, 
      min_align = 8 '\b', 
      packing_size = 0, 
      ghcimpl = 0, 
      has_finalize = 1, 
      marshalbyref = 0, 
      contextbound = 0, 
      delegate = 0, 
      gc_descr_inited = 1, 
      has_cctor = 0, 
      has_references = 1, 
      has_static_refs = 0, 
      no_special_static_fields = 0, 
      is_com_object = 0, 
      nested_classes_inited = 0, 
      interfaces_inited = 1, 
      simd_type = 0, 
      is_generic = 0, 
      is_inflated = 1, 
      has_finalize_inited = 1, 
      fields_inited = 1, 
      setup_fields_called = 1, 
      exception_type = 0 '\0', 
      parent = 0x104987aa8, 
      nested_in = 0x0, 
      image = 0x104997000, 
      name = 0x104819140 "ICrasher_Crash", 
      name_space = 0x104819150 "Castle.Proxies.Invocations", 
      type_token = 33554434, 
      vtable_size = 23, 
      interface_count = 0, 
      interface_id = 0, 
      max_interface_id = 1150, 
      interface_offsets_count = 2, 
      interfaces_packed = 0x1049c59c8, 
      interface_offsets_packed = 0x1049c59d8, 
      interface_bitmap = 0x1049c59e0 " ", 
      interfaces = 0x1049c59c8, 
      sizes = {
        class_size = 0, 
        element_size = 0, 
        generic_param_token = 0
      }, 
      flags = 8193, 
      field = {
        first = 0, 
        count = 0
      }, 
      method = {
        first = 0, 
        count = 0
      }, 
      ref_info_handle = 0, 
      marshal_info = 0x0, 
      fields = 0x1049c5b28, 
      methods = 0x0, 
      this_arg = {
        data = {
          klass = 0x1049c5830, 
          type = 0x1049c5830, 
          array = 0x1049c5830, 
          method = 0x1049c5830, 
          generic_param = 0x1049c5830, 
          generic_class = 0x1049c5830
        }, 
        attrs = 0, 
        type = MONO_TYPE_GENERICINST, 
        num_mods = 0, 
        byref = 1, 
        pinned = 0, 
        modifiers = 0x1049c594c
      }, 
      byval_arg = {
        data = {
          klass = 0x1049c5830, 
          type = 0x1049c5830, 
          array = 0x1049c5830, 
          method = 0x1049c5830, 
          generic_param = 0x1049c5830, 
          generic_class = 0x1049c5830
        }, 
        attrs = 0, 
        type = MONO_TYPE_GENERICINST, 
        num_mods = 0, 
        byref = 0, 
        pinned = 0, 
        modifiers = 0x1049c595c
      }, 
      generic_class = 0x1049c5830, 
      generic_container = 0x0, 
      gc_descr = 0x5fa, 
      runtime_info = 0x1048e9620, 
      next_class_cache = 0x0, 
      vtable = 0x1049c5a70, 
      ext = 0x0
    }

It's always this class, `Castle.Proxies.Invocations.ICrasher_Crash`.
Its `instance_size` is 80.

    (gdb) p mono_thread_info_current ()
    $4 = (MonoThreadInfo *) 0x1017949e0
    (gdb) p *(SgenThreadInfo*)$4
    $5 = {
      info = {
        node = {
          next = 0x101002060, 
          key = 4422897664
        }, 
        small_id = 69, 
        native_handle = 16395, 
        thread_state = 1, 
        runtime_thread = 1, 
        tools_thread = 0, 
        suspend_semaphore = 18179, 
        suspend_count = 0, 
        finish_resume_semaphore = 18691, 
        resume_semaphore = 18435, 
        suspend_state = {
          ctx = {
            rax = 0, 
            rbx = 0, 
            rcx = 0, 
            rdx = 0, 
            rbp = 0, 
            rsp = 0, 
            rsi = 0, 
            rdi = 0, 
            r8 = 0, 
            r9 = 0, 
            r10 = 0, 
            r11 = 0, 
            r12 = 0, 
            r13 = 0, 
            r14 = 0, 
            r15 = 0, 
            rip = 0
          }, 
          unwind_data = {0x0, 0x0, 0x0}, 
          valid = 0
        }, 
        async_target = 0, 
        user_data = 0x0, 
        inside_critical_region = 0, 
        is_async_context = 0, 
        create_suspended = 0, 
        create_suspended_sem = 18947, 
        tls = {0x0, 0x1011adfe0, 0x101144750, 0x0, 0x0, 0x0, 0x0, 0x0, 0x107a01910}, 
        handle = 0x117, 
        service_requests = 0
      }, 
      skip = 0, 
      in_critical_region = 0, 
      gc_disabled = 0, 
      stack_end = 0x107a00e74, 
      stack_start = 0x0, 
      stack_start_limit = 0x107801000, 
      tlab_next_addr = 0x101794c28, 
      tlab_start_addr = 0x101794c20, 
      tlab_temp_end_addr = 0x101794c30, 
      tlab_real_end_addr = 0x101794c38, 
      runtime_data = 0x0, 
      stopped_ip = 0x0, 
      stopped_domain = 0x101144750, 
      ctx = {
        rax = 0, 
        rbx = 0, 
        rcx = 0, 
        rdx = 0, 
        rbp = 0, 
        rsp = 0, 
        rsi = 0, 
        rdi = 0, 
        r8 = 0, 
        r9 = 0, 
        r10 = 0, 
        r11 = 0, 
        r12 = 0, 
        r13 = 0, 
        r14 = 0, 
        r15 = 0, 
        rip = 0
      }, 
      tlab_start = 0x1022dd8c0 "0\002\f\005\001", 
      tlab_next = 0x1022de6a0 "?8?\004\001", 
      tlab_temp_end = 0x1022de8c0 "", 
      tlab_real_end = 0x1022de8c0 ""
    }

At this point in the object allocation, `tlab_next` should have
already been advanced, but it wasn't.  It's still at the point at
where this current object is allocated.

    (gdb) disassemble 0x105f6afa0 0x105f6b039
    Dump of assembler code from 0x105f6afa0 to 0x105f6b039:
    0x0000000105f6afa0:	sub    $0x28,%rsp
    0x0000000105f6afa4:	mov    %rbx,(%rsp)
    0x0000000105f6afa8:	mov    %r12,0x8(%rsp)
    0x0000000105f6afad:	mov    %r14,0x10(%rsp)
    0x0000000105f6afb2:	mov    %r15,0x18(%rsp)
    0x0000000105f6afb7:	mov    %rdi,%r14
    0x0000000105f6afba:	mov    %rsi,%r15
    0x0000000105f6afbd:	mov    %r15,0x20(%rsp)
    0x0000000105f6afc2:	mov    %gs:0x860,%rax
    0x0000000105f6afcb:	mov    0x180(%rax),%r12
    0x0000000105f6afd2:	mov    %r12,%rax
    0x0000000105f6afd5:	mov    (%rax),%rbx
    0x0000000105f6afd8:	mov    %rbx,%rax
    0x0000000105f6afdb:	mov    %r15,%rcx
    0x0000000105f6afde:	mov    %rax,%r15
    0x0000000105f6afe1:	add    %rcx,%r15
    0x0000000105f6afe4:	mov    %r15,%rax
    0x0000000105f6afe7:	mov    %gs:0x860,%rcx
    0x0000000105f6aff0:	mov    0x250(%rcx),%rcx
    0x0000000105f6aff7:	cmp    %rcx,%rax
    0x0000000105f6affa:	jae    0x105f6b022
    0x0000000105f6b000:	mov    %r15,(%r12)
    0x0000000105f6b004:	mov    %r14,(%rbx)
    0x0000000105f6b007:	mov    %rbx,%rax
    0x0000000105f6b00a:	mov    (%rsp),%rbx
    0x0000000105f6b00e:	mov    0x8(%rsp),%r12
    0x0000000105f6b013:	mov    0x10(%rsp),%r14
    0x0000000105f6b018:	mov    0x18(%rsp),%r15
    0x0000000105f6b01d:	add    $0x28,%rsp
    0x0000000105f6b021:	retq   
    0x0000000105f6b022:	mov    %r14,%rdi
    0x0000000105f6b025:	mov    0x20(%rsp),%rsi
    0x0000000105f6b02a:	mov    $0x1036b5890,%r11
    0x0000000105f6b034:	callq  *%r11
    0x0000000105f6b037:	jmp    0x105f6b00a
    End of assembler dump.

This is the managed allocator.  Note that the object size is passed as
the second argument, in `%rsi`.

    (gdb) p/x $rsi
    $6 = 0x0

This is the problem.

    (gdb) display/i $pc
    1: x/i $pc  0x105f6b007:	mov    %rbx,%rax
    (gdb) si
    0x0000000105f6b00a in ?? ()
    1: x/i $pc  0x105f6b00a:	mov    (%rsp),%rbx
    (gdb) 
    0x0000000105f6b00e in ?? ()
    1: x/i $pc  0x105f6b00e:	mov    0x8(%rsp),%r12
    (gdb) 
    0x0000000105f6b013 in ?? ()
    1: x/i $pc  0x105f6b013:	mov    0x10(%rsp),%r14
    (gdb) 
    0x0000000105f6b018 in ?? ()
    1: x/i $pc  0x105f6b018:	mov    0x18(%rsp),%r15
    (gdb) 
    0x0000000105f6b01d in ?? ()
    1: x/i $pc  0x105f6b01d:	add    $0x28,%rsp
    (gdb) 
    0x0000000105f6b021 in ?? ()
    1: x/i $pc  0x105f6b021:	retq   
    (gdb) 
    0x0000000106fbf726 in ?? ()
    1: x/i $pc  0x106fbf726:	mov    0x70(%rsp),%r9
    (gdb) p mono_pmip(0x106fbf726)
    $7 = 0x101189fa0 " Castle.Proxies.ICrasherProxy:Crash<T> () + 0xf6 (0x106fbf630 0x106fbf7f6) [0x101144750 - domain-Test.dll]"

This is the method that called the managed allocator.

    (gdb) disassemble 0x106fbf630 0x106fbf7f6
    Dump of assembler code from 0x106fbf630 to 0x106fbf7f6:
    0x0000000106fbf630:	sub    $0x78,%rsp
    0x0000000106fbf634:	mov    %rbx,(%rsp)
    0x0000000106fbf638:	mov    %r13,0x8(%rsp)
    0x0000000106fbf63d:	mov    %r14,0x10(%rsp)
    0x0000000106fbf642:	mov    %r15,0x18(%rsp)
    0x0000000106fbf647:	mov    %r10,0x40(%rsp)
    0x0000000106fbf64c:	mov    %rdi,%r15
    0x0000000106fbf64f:	movq   $0x0,0x50(%rsp)
    0x0000000106fbf658:	mov    %r15,0x58(%rsp)
    0x0000000106fbf65d:	mov    0x10(%r15),%r15
    0x0000000106fbf661:	mov    0x40(%rsp),%rdi
    0x0000000106fbf666:	mov    $0x100679fb0,%r11
    0x0000000106fbf670:	callq  *%r11
    0x0000000106fbf673:	mov    %rax,0x38(%rsp)
    0x0000000106fbf678:	mov    0x38(%rsp),%rax
    0x0000000106fbf67d:	mov    %rax,0x28(%rsp)
    0x0000000106fbf682:	mov    $0x1017803d8,%rax
    0x0000000106fbf68c:	mov    %rax,0x30(%rsp)
    0x0000000106fbf691:	mov    0x30(%rsp),%rax
    0x0000000106fbf696:	mov    %rax,0x20(%rsp)
    0x0000000106fbf69b:	mov    0x20(%rsp),%rsi
    0x0000000106fbf6a0:	mov    0x28(%rsp),%rdi
    0x0000000106fbf6a5:	mov    $0x106fbf850,%r11
    0x0000000106fbf6af:	callq  *%r11
    0x0000000106fbf6b2:	mov    %rax,%rbx
    0x0000000106fbf6b5:	test   %rbx,%rbx
    0x0000000106fbf6b8:	je     0x106fbf6df
    0x0000000106fbf6be:	mov    (%rbx),%rax
    0x0000000106fbf6c1:	mov    (%rax),%rax
    0x0000000106fbf6c4:	mov    0x10(%rax),%rax
    0x0000000106fbf6c8:	mov    0x18(%rax),%rax
    0x0000000106fbf6cc:	mov    $0x100829340,%rcx
    0x0000000106fbf6d6:	cmp    %rcx,%rax
    0x0000000106fbf6d9:	jne    0x106fbf7df
    0x0000000106fbf6df:	mov    $0x1050979e0,%rdi
    0x0000000106fbf6e9:	xor    %esi,%esi
    0x0000000106fbf6eb:	mov    $0x105f6b480,%r11
    0x0000000106fbf6f5:	callq  *%r11
    0x0000000106fbf6f8:	mov    %rax,0x48(%rsp)
    0x0000000106fbf6fd:	mov    %rax,0x70(%rsp)
    0x0000000106fbf702:	mov    0x40(%rsp),%rdi
    0x0000000106fbf707:	mov    $0x100679f70,%r11
    0x0000000106fbf711:	callq  *%r11
    0x0000000106fbf714:	mov    %rax,%rdi
    0x0000000106fbf717:	xor    %esi,%esi
    0x0000000106fbf719:	mov    $0x105f6afa0,%r11
    0x0000000106fbf723:	callq  *%r11

This is the call in question.  Note that `%esi` is zeroed here.

    0x0000000106fbf726:	mov    0x70(%rsp),%r9
    0x0000000106fbf72b:	mov    %rax,0x68(%rsp)
    0x0000000106fbf730:	mov    %rax,%rdi
    0x0000000106fbf733:	mov    0x50(%rsp),%rsi
    0x0000000106fbf738:	mov    0x58(%rsp),%rdx
    0x0000000106fbf73d:	mov    %r15,%rcx
    0x0000000106fbf740:	mov    %rbx,%r8
    0x0000000106fbf743:	mov    $0x106fbf812,%r11
    0x0000000106fbf74d:	callq  *%r11
    0x0000000106fbf750:	mov    0x68(%rsp),%rax
    0x0000000106fbf755:	mov    %rax,%r14
    0x0000000106fbf758:	mov    $0x1050c68f0,%rdi
    0x0000000106fbf762:	mov    $0x1,%esi
    0x0000000106fbf767:	mov    $0x105f6b454,%r11
    0x0000000106fbf771:	callq  *%r11
    0x0000000106fbf774:	mov    %rax,%r13
    0x0000000106fbf777:	mov    %rax,0x60(%rsp)
    0x0000000106fbf77c:	mov    0x40(%rsp),%rdi
    0x0000000106fbf781:	mov    $0x100679f30,%r11
    0x0000000106fbf78b:	callq  *%r11
    0x0000000106fbf78e:	mov    %rax,%rdx
    0x0000000106fbf791:	mov    0x60(%rsp),%rax
    0x0000000106fbf796:	mov    %rax,%rdi
    0x0000000106fbf799:	xor    %esi,%esi
    0x0000000106fbf79b:	mov    (%rax),%rax
    0x0000000106fbf79e:	callq  *0xf8(%rax)
    0x0000000106fbf7a4:	mov    %r14,%rdi
    0x0000000106fbf7a7:	mov    %r13,%rsi
    0x0000000106fbf7aa:	mov    $0x106fbf804,%r11
    0x0000000106fbf7b4:	callq  *%r11
    0x0000000106fbf7b7:	mov    %r14,%rdi
    0x0000000106fbf7ba:	mov    $0x106fbf7f6,%r11
    0x0000000106fbf7c4:	callq  *%r11
    0x0000000106fbf7c7:	mov    (%rsp),%rbx
    0x0000000106fbf7cb:	mov    0x8(%rsp),%r13
    0x0000000106fbf7d0:	mov    0x10(%rsp),%r14
    0x0000000106fbf7d5:	mov    0x18(%rsp),%r15
    0x0000000106fbf7da:	add    $0x78,%rsp
    0x0000000106fbf7de:	retq   
    0x0000000106fbf7df:	mov    $0x11d,%esi
    0x0000000106fbf7e4:	mov    $0x86b,%edi
    0x0000000106fbf7e9:	mov    $0x100679b60,%r11
    0x0000000106fbf7f3:	callq  *%r11
    End of assembler dump.
Comment 3 Rodrigo Kumpera 2015-05-27 16:38:06 UTC
Yeah, it works on master, probably due to some of the SRE fixes we did.

In any case, I added some asserts in mini to trap this earlier.