Created attachment 781
While investigating a performance issue in floating-point-heavy C# code, I discovered that the Mono JIT in 2.10.6 x86-64 up-converts floats to doubles when evaluating single-precision floating-point expressions. While this is not wrong, it does lead to a severe performance issue. Comparing the x86 and x86-64 JIT compilers in 2.10.6, I am seeing approximately a 2x slowdown in FP-heavy code, and I believe this conversion is to blame.
Attached is a simple test case. If I compile/disassemble with:
$ gmcs /optimize+ /target:library float-test.cs /out:float-test.dll
$ mono --aot -O=all float-test.dll
$ objdump -d float-test.dll.so
then I get the following inefficient floating-point code:
107c: f3 0f 10 08 movss (%rax),%xmm1
1080: f3 0f 5a c9 cvtss2sd %xmm1,%xmm1
1084: f2 0f 59 c1 mulsd %xmm1,%xmm0
1088: 49 63 c5 movslq %r13d,%rax
108b: 41 39 46 18 cmp %eax,0x18(%r14)
108f: 0f 86 2a 00 00 00 jbe 10bf <Foo_Compute_single___int+0x9f>
1095: 49 8d 44 86 20 lea 0x20(%r14,%rax,4),%rax
109a: f2 44 0f 5a f8 cvtsd2ss %xmm0,%xmm15
109f: f3 44 0f 11 38 movss %xmm15,(%rax)
Note the sequence: up-cast to double, double-precision multiply, then down-cast to float.
For reference, the Microsoft .NET JIT compiler emits single-precision SSE instructions (e.g. mulss) for the same code.
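To make the two code shapes concrete, here is a minimal sketch in C (chosen because its float and double arithmetic map closely onto the SSE instructions above). The function names and the scaling loop are hypothetical; this is not the attached test case, and the instruction names in the comments only indicate the rough correspondence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a float-heavy loop; not the attached test case. */

/* Shape of the Mono 2.10.6 x86-64 listing: widen, multiply as double, narrow. */
void scale_widened(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    double wide_factor = (double)factor;           /* roughly cvtss2sd */
    for (size_t i = 0; i < n; i++) {
        double wide = (double)data[i];             /* roughly movss + cvtss2sd */
        data[i] = (float)(wide * wide_factor);     /* roughly mulsd + cvtsd2ss + movss */
    }
}

/* Shape of a single-precision lowering: no conversions at all. */
void scale_single(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        data[i] = data[i] * factor;                /* roughly movss + mulss + movss */
}
```

For a single multiply per element, both functions should store the same float values, since the product of two floats is exact in double and is rounded to float only once in either version.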
If this is by design, can a run-time option be created to allow single-precision instructions?
I believe that currently Mono performs all of the computations on doubles in order to comply with section III.1.1.1 of the ECMA-335 specification in the most straightforward way.
In some cases (such as the one you're pointing out) the result of performing the operations on single-precision floats would be the same and avoiding the conversions could improve performance.
AFAICT, whenever just one floating-point operation is performed on two f32 arguments and its result is only used in a downcast to f32, the operation could be performed directly in f32 without any of the casts. An optimization pass could detect this pattern easily (along with some other safe patterns, like multiplication and division by powers of two).
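As a rough illustration of what detecting that pattern could look like, here is a sketch in C over a toy expression tree. The IR, the node kinds, and the function name are all invented for this example and have nothing to do with Mono's actual intermediate representation. It handles only multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression-tree IR, invented for this sketch (not Mono's IR). */
typedef enum {
    LOAD_F32,   /* leaf: an f32 value */
    WIDEN,      /* f32 -> f64 conversion of child a */
    NARROW,     /* f64 -> f32 conversion of child a */
    MUL_F64,    /* f64 multiply of children a and b */
    MUL_F32     /* f32 multiply of children a and b */
} Kind;

typedef struct Node {
    Kind kind;
    struct Node *a;
    struct Node *b;
} Node;

/* Rewrites NARROW(MUL_F64(WIDEN(x), WIDEN(y))) into MUL_F32(x, y) in place.
   In a tree every node has exactly one user, so the f64 product is known to
   feed only the NARROW. Returns 1 if the node was rewritten, 0 otherwise. */
int narrow_single_mul(Node *n)
{
    assert(n != NULL);
    if (n->kind != NARROW || n->a == NULL || n->a->kind != MUL_F64)
        return 0;
    Node *mul = n->a;
    if (mul->a == NULL || mul->b == NULL ||
        mul->a->kind != WIDEN || mul->b->kind != WIDEN)
        return 0;
    Node *x = mul->a->a;   /* f32 operand under the first WIDEN */
    Node *y = mul->b->a;   /* f32 operand under the second WIDEN */
    n->kind = MUL_F32;
    n->a = x;
    n->b = y;
    return 1;
}
```

A tree whose multiply takes another f64 operation as an operand does not match and is left untouched, which is the conservative behavior one would want for chained expressions.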
There are many patterns that could be optimized in a similar way, but in general casting to f64 might be needed in order to comply with the spec.
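To see why chained operations are not safe to narrow blindly, here is a small C sketch with a hand-picked input where a*b + c gives different float results depending on whether the intermediate product is kept in double or rounded to float first. The function names are hypothetical, and the example assumes IEEE 754 arithmetic with round-to-nearest-even. It only shows that the two strategies can disagree; it does not say which one the spec requires.

```c
#include <assert.h>

/* a*b + c with every intermediate kept in double, rounded to float once. */
float mul_add_widened(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* a*b + c with the product rounded to float before the add. */
float mul_add_single(float a, float b, float c)
{
    /* volatile forces the product to be rounded to float and keeps the
       compiler from fusing the multiply and add. */
    volatile float product = a * b;
    return product + c;
}

/* Returns 1 if the two strategies disagree on the hand-picked input. */
int counterexample_disagrees(void)
{
    const float a = 1.0f + 0x1p-12f;
    /* The exact product 1 + 2^-11 + 2^-24 sits halfway between two floats,
       so rounding it to float drops the 2^-24 term. */
    assert((double)a * (double)a == 1.0 + 0x1p-11 + 0x1p-24);
    return mul_add_widened(a, a, -1.0f) != mul_add_single(a, a, -1.0f);
}
```

With this input, the widened version keeps the 2^-24 term through the subtraction, while the single-precision version loses it when the product is rounded, so any narrowing pass that wants to preserve the existing results has to restrict itself to patterns like the single-operation case.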
Mono now supports R4 math.