Bug 17152 - ARM v5 cannot build runtime 3.2.7 master due to mcs error
Alias: None
Product: Runtime
Classification: Mono
Component: JIT ()
Version: 3.2.x
Hardware: Other Linux
Importance: --- normal
Target Milestone: ---
Assignee: Alex Rønne Petersen
Depends on:
Reported: 2014-01-09 15:40 UTC by Brandon White
Modified: 2014-03-10 10:39 UTC

Description Brandon White 2014-01-09 15:40:10 UTC
Hi, I'm attempting to build the latest Mono on an embedded Linux board.  It is a Technologic Systems TS-4712 (http://wiki.embeddedarm.com/wiki/TS-4712).  The processor is a Marvell PXA166 ARM9 (ARM v5).  The OS is Debian Wheezy.

The native code builds without issue.  This problem arises during the managed code building.

Making all in runtime
make[2]: Entering directory `/root/mono/runtime'
if test -w /root/mono/mcs; then :; else chmod -R +w /root/mono/mcs; fi
cd /root/mono/mcs && make --no-print-directory -s NO_DIR_CHECK=1
PROFILES='net_2_0 net_3_5 net_4_0 net_4_5  ' CC='gcc' all-profiles
Bootstrap compiler: Mono C# compiler version
MCS     [build] mscorlib.dll
System.Collections/Queue.cs(28,0): error CS1525: Unexpected symbol `E'
System.Collections/Queue.cs(42,15): error CS1040: Preprocessor directives must
appear as the first non-whitespace character on a line
System.Collections/Queue.cs(44,7): error CS1040: Preprocessor directives must
appear as the first non-whitespace character on a line
System.Collections/Queue.cs(46,9): error CS1040: Preprocessor directives must
appear as the first non-whitespace character on a line
System.Collections/Queue.cs(150,53): warning CS0078: The `l' suffix is easily
confused with the digit `1' (use `L' for clarity)
System.Collections/Queue.cs(386,1): error CS1035: End-of-file found, '*/'
Compilation failed: 5 error(s), 1 warnings
make[8]: *** [../../class/lib/build/tmp/mscorlib.dll] Error 1

root@ts4700:~/mono# uname -a
Linux ts4700 2.6.34-ts471x #88 PREEMPT Thu Aug 15 10:29:09 MST 2013 armv5tejl


I've tried this using both mono-mcs and `make get-monolite-latest`, with the same result.
On inspecting Queue.cs, I see none of the problems the compiler reports in the source code.  In fact, the first few errors clearly point into the boilerplate header.
Comment 1 Alex Rønne Petersen 2014-01-09 19:30:04 UTC
Is this an armel or armhf system?

Any chance you could get me SSH access to this board so I could take a look?

Note: Very, very few people still use ARM v5 today. Even the Raspberry Pi (which uses ancient silicon) is ARM v6, and Android devices with ARM v5 are virtually impossible to get these days. So we don't really get a chance to test Mono much on ARM v5.
Comment 2 Brandon White 2014-01-09 22:47:52 UTC
This is an armel system.  It would be my pleasure to have you SSH to the board.  I will confirm my firewall settings and e-mail you the credentials.

I hear you about the ARM v5 -- I would love for this project to be on, say, v7, but this is an industrial automation system.  Heck, this board just came out in 2013!  I'm pushing the envelope a bit by trying to get C# .NET 4.5 capabilities on it.  I can always fall back to the older .NET 4.0 Mono that's in Debian Wheezy, but where's the fun in that?  No async/await?  Meh :)
Comment 3 Paolo Molaro 2014-01-31 09:48:53 UTC
Could you confirm that alignment fixup is enabled on the system (cat /proc/cpu/alignment)? This could be related; see also bug#17495.
Comment 4 Brandon White 2014-01-31 09:53:49 UTC
uname -a
Linux ts4700 2.6.34-ts471x #88 PREEMPT Thu Aug 15 10:29:09 MST 2013 armv5tejl GNU/Linux

cat /proc/cpu/alignment
User:           0
System:         9482
Skipped:        0
Half:           9481
Word:           0
DWord:          1
Multi:          0
User faults:    0 (ignored)
Comment 5 Brandon White 2014-01-31 09:56:17 UTC
Aha!  Indeed, I think this is related.  I read bug#17495 and I have encountered that same issue on this system when attempting to use the prebuilt Mono from Debian Sid.
Comment 6 Brandon White 2014-01-31 10:01:27 UTC
Correction:  Not Sid. I meant to say Experimental, which is mono 3.2.3.
Comment 7 Brandon White 2014-02-03 09:55:16 UTC
I ran `echo 2 > /proc/cpu/alignment` as suggested by Paolo in #17495, and now I'm able to build mscorlib.dll, which was previously failing as reported.  I'm also able to run `make check`, which was previously failing in gc-memfuncs (Alex Petersen brought this to my attention); it now passes.
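
For reference, a small sketch of what the value written to /proc/cpu/alignment selects. The mode values follow the Linux ARM alignment handler (0 ignored, 1 warn, 2 fixup, 3 fixup+warn, 4 signal, 5 signal+warn); the enum and function names below are made up for illustration and are not kernel or Mono identifiers.

```c
#include <stddef.h>

/* Illustrative names for the user-mode alignment bits (assumed naming). */
enum {
    ALIGN_WARN   = 1,  /* log a message for each unaligned user access */
    ALIGN_FIXUP  = 2,  /* emulate the access in the kernel (slow)      */
    ALIGN_SIGNAL = 4   /* deliver SIGBUS to the offending process      */
};

/* Maps a mode value (0..5) to the label shown in /proc/cpu/alignment. */
static const char *alignment_mode_name(int mode)
{
    static const char *const names[] = {
        "ignored", "warn", "fixup", "fixup+warn", "signal", "signal+warn"
    };
    if (mode < 0 || mode > 5)
        return "unknown";
    return names[mode];
}

/* Nonzero when the kernel will emulate unaligned user-space accesses. */
static int alignment_mode_fixes_up(int mode)
{
    return (mode & ALIGN_FIXUP) != 0;
}
```

With mode 0 ("ignored", as in the output in comment 4) the faulting access is not emulated, which is consistent with the silent corruption seen here; mode 2 makes the kernel emulate each unaligned access.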

I think you can close this issue.  The alignment fix has clearly resolved my originally reported problem.

I would appreciate some guidance on moving forward with an alignment fix for this board, as I am not very familiar with this phenomenon.  While I certainly understand unaligned memory accesses, I am not familiar with /proc/cpu/alignment and alignment fixup (though I will be doing some Googling on the subject).

- What is the best practice for addressing this?
- Is this something I need to report to the board manufacturer for them to address in their kernel and/or OS image?
Comment 8 Alex Rønne Petersen 2014-02-03 10:48:13 UTC
I would think this has to do with how your kernel is compiled. That could either be a distro issue or an issue with whoever else provided the kernel used on your particular board.

I'll close this as NOTABUG.
Comment 9 Paolo Molaro 2014-02-04 09:37:37 UTC
Alex, this is more complicated, reopening the bug.
What happens is this: we copy data around using word-sized reads and stores even when the pointers are not word-aligned. This is bad not only on old ARM processors (where it is horrible, because data can be silently corrupted), but also on other architectures where it would cause segfaults (ppc, sparc, likely others). Where it doesn't segfault, it is a performance issue (the kernel tweak above makes the code run, but it will be very slow).
Digging into this, it seems to have been introduced by the sweeping change of memmove()/memcpy() calls to mono_gc_memmove(), which is a limited interface that doesn't work correctly in the general case.
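
To make the distinction concrete, here is a minimal sketch of an alignment check and of a 32-bit load/store that is safe at any address. The helper names are hypothetical and this is not code from the Mono tree; it only illustrates the difference between a direct word access and a copy the compiler is free to lower to byte accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero when p is a multiple of `align` bytes. */
static int is_aligned_to(const void *p, size_t align)
{
    return (uintptr_t)p % align == 0;
}

/* Reads a 32-bit value from any address. Going through memcpy lets the
 * compiler use byte loads on targets without unaligned word loads. */
static uint32_t load_u32_any(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes a 32-bit value to any address, with the same reasoning. */
static void store_u32_any(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

A direct `*(uint32_t *)p` is only valid when `is_aligned_to(p, sizeof(uint32_t))` holds; on ARM v5 with fixup disabled, violating that is what can corrupt data silently.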
Comment 10 Brandon White 2014-02-04 10:00:00 UTC
Thanks for the explanation Paolo.  I read up on /proc/cpu/alignment and got the clear indication that the `echo 2` setting is suboptimal and it is far better to change your code to avoid such situations.  It seems your advice is also along these lines.

So as I understand it, mono_gc_memmove() is a hand-coded replacement for the CRT memmove for use with managed blocks of memory. It takes advantage of the assumption that the managed objects are always word-aligned.  It is not, however, a general purpose replacement for memmove since it will improperly handle misaligned blocks -- such as when moving around arbitrary byte blocks in a buffer.

So to fix this issue in Mono, all calls to mono_gc_memmove() need to be reviewed to determine whether each is appropriate (i.e., the block is a managed block) and changed back to memmove (or whatever) if the block is an arbitrary block of bytes.
Comment 11 Alex Rønne Petersen 2014-02-04 10:15:19 UTC
Paolo, for the non-aligned case, is it always okay to copy byte-wise? IOW, are we guaranteed that if we're copying managed references, they'll always sit on an aligned address?
Comment 12 Paolo Molaro 2014-02-04 12:02:56 UTC
There are two ways to fix this:
1) review all the places, make sure the constraints are always obeyed (including future code)
2) fix mono_gc_memmove() to behave correctly in all cases

I'm leaning towards 2, while at the same time putting back calls to memmove() where it is safe to do, mainly from a maintenance point of view.

Alex: managed references must be aligned, but the answer to your question is a bit tricky. What if someone uses mono_gc_memmove() to also copy an int16 that sits in memory before a reference? The src address is unaligned, but we must not use byte copies for the whole 6 bytes (on 32-bit systems); otherwise the managed reference gets mangled if the copy is stopped by the GC in between.
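
The case described above can be pictured with a hypothetical layout (the struct and helper names are invented for illustration): a 16-bit field sits directly before a reference, so a copy that starts at the 16-bit field begins at an unaligned address even though the reference itself is aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not a real Mono structure. */
struct item {
    int16_t unrelated;  /* occupies the first two bytes             */
    int16_t count;      /* the copy starts here, at offset 2        */
    void   *ref;        /* stands in for a managed reference        */
};

/* Number of bytes from `count` to the end of `ref`:
 * 6 on a typical 32-bit ABI, larger on 64-bit because of padding. */
static size_t copy_span(void)
{
    return sizeof(struct item) - offsetof(struct item, count);
}

/* Nonzero: the copy begins off a word boundary. */
static int copy_starts_unaligned(void)
{
    return offsetof(struct item, count) % sizeof(void *) != 0;
}

/* Nonzero: the reference slot itself is word-aligned. */
static int ref_is_aligned(void)
{
    return offsetof(struct item, ref) % sizeof(void *) == 0;
}
```

So the start address alone cannot decide between byte and word copies: the leading int16 may be copied bytewise, but the reference slot still needs a single word-sized store.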
Comment 13 Alex Rønne Petersen 2014-02-04 12:41:34 UTC
So we would handle all data with a length < sizeof(void *) as byte-wise copies. For data with a length > sizeof(void *), copy byte-wise until we reach an aligned address and the remaining length > sizeof(void *). Then copy word-wise until we reach a point where the remaining length is < sizeof(void *), and then copy the rest byte-wise.

Does that make sense?
Comment 14 Paolo Molaro 2014-02-04 13:52:31 UTC
Alex, you need to consider the alignment both of the source and destination pointers.
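
A rough sketch of the scheme from comment 13, adjusted for the point above about both pointers. It is not the actual mono_gc_memmove() and not necessarily how the bug was eventually fixed; the function name and the `word_t` typedef are assumptions for illustration. It copies forward only, so the regions must not overlap (or dest must lie below src), and it assumes the word-aligned parts of the memory may legitimately be accessed as words.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;  /* stands in for a pointer-sized slot */

/* Hypothetical helper: byte head, word body, byte tail. */
static void copy_words_when_possible(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    const size_t w = sizeof(word_t);

    /* Word copies are usable only when both pointers have the same
     * offset within a word; otherwise they never align together. */
    if ((uintptr_t)d % w == (uintptr_t)s % w) {
        /* Head: single bytes up to the first word boundary. */
        while (size > 0 && (uintptr_t)d % w != 0) {
            *d++ = *s++;
            size--;
        }
        /* Body: whole aligned words, one load and one store each. */
        while (size >= w) {
            *(word_t *)d = *(const word_t *)s;
            d += w;
            s += w;
            size -= w;
        }
    }
    /* Tail, or the entire block when the two alignments differ. */
    while (size > 0) {
        *d++ = *s++;
        size--;
    }
}
```

When the two offsets differ, the block cannot contain a reference that is aligned in both source and destination, so the bytewise fallback never splits one. Whether the compiler really emits one store per word in the body is outside what C guarantees, which is part of why this is only a sketch.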
Comment 15 Brandon White 2014-03-09 22:08:52 UTC
Update:  I pulled the latest master on 2014-03-07, and this problem is no longer occurring for me. 

https://github.com/mono/mono/commit/a2c41c347fb2d873da23800a677180ad3be4205b caught my attention and was what prompted me to give it another go with the latest code.  It seems that this commit has fixed the bug on my ARMv5 platform.

I'm now able to fully build both native and managed code, and execute some apps of mine that were previously crashing.
Comment 16 Rodrigo Kumpera 2014-03-10 10:39:26 UTC
Marking as fixed.