Bug 1728 - JIT compiler emits double-precision instructions for single-precision values in x86-64
Alias: None
Product: Runtime
Classification: Mono
Component: JIT ()
Version: unspecified
Hardware: PC All
Importance: --- normal
Target Milestone: ---
Assignee: Zoltan Varga
Depends on:
Reported: 2011-10-27 11:40 UTC by Justin Holewinski
Modified: 2017-07-11 21:15 UTC

Tags: performance
Is this bug a regression?: ---
Last known good build:

Test case (147 bytes, text/x-csharp)
2011-10-27 11:40 UTC, Justin Holewinski

Description Justin Holewinski 2011-10-27 11:40:19 UTC
Created attachment 781
Test case

While investigating a performance issue in floating-point-heavy C# code, I discovered that the Mono 2.10.6 JIT on x86-64 up-converts floats to doubles when evaluating single-precision floating-point expressions.  While this is not wrong, it does lead to a severe performance issue.  Comparing the x86 and x86-64 JIT compilers in 2.10.6, I am seeing approximately a 2x slowdown in FP-heavy code on x86-64, and I believe this conversion is to blame.

Attached is a simple test case.  If I compile/disassemble with:

$ gmcs /optimize+ /target:library float-test.cs /out:float-test.dll
$ mono --aot -O=all float-test.dll
$ objdump -d float-test.dll.so

then I get the following inefficient floating-point code:

    107c:	f3 0f 10 08          	movss  (%rax),%xmm1
    1080:	f3 0f 5a c9          	cvtss2sd %xmm1,%xmm1
    1084:	f2 0f 59 c1          	mulsd  %xmm1,%xmm0
    1088:	49 63 c5             	movslq %r13d,%rax
    108b:	41 39 46 18          	cmp    %eax,0x18(%r14)
    108f:	0f 86 2a 00 00 00    	jbe    10bf <Foo_Compute_single___int+0x9f>
    1095:	49 8d 44 86 20       	lea    0x20(%r14,%rax,4),%rax
    109a:	f2 44 0f 5a f8       	cvtsd2ss %xmm0,%xmm15
    109f:	f3 44 0f 11 38       	movss  %xmm15,(%rax)

Note the sequence: an up-cast to double (cvtss2sd), a double-precision multiply (mulsd), and a down-cast back to float (cvtsd2ss).

For reference, the Microsoft .NET JIT compiler emits single-precision SSE instructions (e.g. mulss) for the same code.

If this is by design, can a run-time option be created to allow single-precision instructions?
Comment 1 Andrea Canciani 2014-05-21 06:04:22 UTC
I believe that currently Mono performs all of the computations on doubles in order to comply with section III.1.1.1 of the ECMA-335 specification in the most straightforward way.
In some cases (such as the one you're pointing out) the result of performing the operations on single-precision floats would be the same and avoiding the conversions could improve performance.

AFAICT whenever just one floating point operation is performed on two f32 arguments and its result is only used in a downcast to f32, it would be possible to perform the operation directly in f32 instead of performing all of the casts. An optimization pass could detect this pattern easily (and some other safe patterns, like multiplication and division by powers of two).

There are many patterns that could be optimized in a similar way, but in general casting to f64 might be needed in order to comply with the spec.
Comment 2 Andrea Canciani 2014-05-21 08:17:09 UTC
See https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/ for a similar optimization opportunity (and an analysis of when it is legitimate) in JavaScript.
Comment 3 Rodrigo Kumpera 2017-07-11 21:15:08 UTC
Mono now supports R4 math.