Beware of fast-math

189 points by simonbyrne, over 3 years ago

15 comments

pavpanchekha, over 3 years ago
Fun fact: when working on Herbie (http://herbie.uwplse.org), our automated tool for reducing floating-point error by rearranging your mathematical expressions, we found that fast-math often undid Herbie's improvements. In a sense, Herbie and fast-math are opposites: one makes code more accurate (sometimes slower, sometimes faster), while the other makes code faster (sometimes less accurate, sometimes more).
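[A minimal C++ illustration, not Herbie output, of the kind of correction term at stake: under strict IEEE semantics the expression below recovers the rounding error of a sum, while -ffast-math is allowed to fold it to zero.]

    #include <cstdio>

    int main() {
        // A Herbie-style accuracy trick: recover the rounding error of a + b.
        // Under strict IEEE semantics err is exactly the low-order part that
        // the sum rounded away; -ffast-math may simplify (s - a) - b to 0,
        // deleting the correction.
        double a = 1e16, b = 1.0;
        double s = a + b;            // rounds to 1e16; b is lost
        double err = (s - a) - b;    // -1.0 under strict evaluation
        printf("s = %.17g, err = %.17g\n", s, err);
        return 0;
    }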
headPoet, over 3 years ago
-funsafe-math-optimizations always makes me laugh. Of course I want fun and safe math optimisations
SeanLuke, over 3 years ago
The other examples he gave trade off significant math deficiencies for small speed gains. But flushing subnormals to zero can produce a MASSIVE speed gain. Like 1000x. And including subnormals isn't necessarily good floating-point practice -- they were rather controversial during the development of IEEE 754, as I understand it. The tradeoff here is markedly different than in the other cases.
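[For concreteness, a sketch of turning on flush-to-zero by hand on x86; these are the per-thread MXCSR bits that -ffast-math's startup code sets, and the intrinsics are SSE-specific.]

    #include <cstdio>
    #include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE (SSE)
    #include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE (SSE3)

    int main() {
        // FTZ: subnormal results become 0. DAZ: subnormal inputs read as 0.
        // This is what buys the big speedup on hardware that handles
        // subnormals via slow microcode assists.
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

        volatile float tiny = 1e-38f;          // near FLT_MIN, still normal
        volatile float r = tiny * 1e-7f;       // exact result is subnormal
        printf("%g\n", (double)r);             // prints 0 with FTZ enabled
        return 0;
    }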
dmitrygr, over 3 years ago
Contrarian viewpoint: beware of not-fast-math. Making things like atan2f and sqrtf set errno takes you down a very slow path, costing you significant perf in cases where you likely do not want it. And most math will work fine with fast-math, if you are careful how you write it. (Free online numerical methods classes are available, e.g. [1].) Without fast-math most compilers cannot even use FMA instructions (costing you up to 2x in cases where they could be used otherwise), since they cannot prove it will produce the same result. FMA will actually likely produce a more accurate result, but your compiler is handicapped by the lack of fast-math to offer it to you.

[1] https://ocw.mit.edu/courses/mathematics/18-335j-introduction-to-numerical-methods-spring-2019/
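[A small C++ sketch of the FMA point, with values chosen so the answer is checkable by hand: the fused form rounds once, the unfused form rounds the product first.]

    #include <cmath>
    #include <cstdio>

    int main() {
        // (1+e)(1-e) = 1 - e^2 with e = 2^-27, so the true value of
        // a*b + c is exactly -2^-54.
        const double e = std::ldexp(1.0, -27);
        double a = 1.0 + e, b = 1.0 - e, c = -1.0;

        double separate = a * b + c;         // product rounds to 1.0 first,
                                             // giving 0 (unless the compiler
                                             // contracts; see -ffp-contract)
        double fused = std::fma(a, b, c);    // single rounding: -2^-54
        printf("separate = %.17g\nfused    = %.17g\n", separate, fused);
        return 0;
    }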
vchuravy, over 3 years ago
Especially the fact that loading a library compiled with GCC with fast math on can modify the global state of the program... It's one of the most baffling decisions made in the name of performance.

I would really like for someone to take fast math seriously and to provide well-scoped and granular options to programmers. The Julia @fastmath macro gets close, but it is too broad. I want to control the flags individually.

Also the question of how that interacts with IPO/inlining...
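[The load-time state change described here comes from GCC linking crtfastmath.o, which sets the FTZ/DAZ bits at startup. GCC does offer somewhat narrower scoping than the global flag; a sketch of per-function scoping, assuming GCC -- the optimize attribute is documented mainly for debugging, so treat this as illustrative.]

    #include <cstdio>

    // Confine relaxed semantics to one function instead of the whole
    // translation unit. (Clang has related per-block pragmas such as
    // `#pragma clang fp reassociate(on)`.)
    __attribute__((optimize("fast-math")))
    double fast_dot(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i] * b[i];   // may be reassociated and vectorized
        return s;
    }

    int main() {
        double a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
        printf("%g\n", fast_dot(a, b, 4));
        return 0;
    }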
smitop, over 3 years ago
The LLVM IR is more expressive than Clang is for expressing fast-math: it supports making an operation use fast-math optimization on a per-operation basis (https://llvm.org/docs/LangRef.html#fastmath).
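[Clang does expose a slice of this from C/C++ via per-block pragmas, which lower to those per-instruction IR flags; a sketch assuming a recent Clang.]

    #include <cstdio>

    float reassoc_sum(float a, float b, float c) {
        // Lowered to IR with per-instruction flags (e.g. `fadd reassoc`),
        // so only the operations in this block are relaxed.
        #pragma clang fp reassociate(on)
        return (a + b) + c;
    }

    float strict_sum(float a, float b, float c) {
        return (a + b) + c;   // plain `fadd float`, no fast-math flags
    }

    int main() {
        printf("%g %g\n", reassoc_sum(1e8f, 1.0f, -1e8f),
                          strict_sum(1e8f, 1.0f, -1e8f));
        return 0;
    }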
dlsa, over 3 years ago
Never considered fast-math. I get the sense it's useful but can create awkward and/or unexpected surprises. If I were to use it, I'd have to have a verification test harness as part of some pipeline to confirm no weirdness -- literally a bunch of example canary calculations to determine whether fast-math will kill or harm some real use case.

Is this a sensible approach? What are others' experiences around this? I've never bothered with this kind of optimisation and I now vaguely feel like I'm missing out.

I tend to use calculations for deterministic purposes rather than pure accuracy. 1+1=2.1 where the answer is stable and approximate is still better and more useful than 1+1=2.0 where the answer is unstable, e.g. because one of those is 0.9999999 and the precision triggers some edge case.
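[One hedged sketch of such a canary harness, with names and tolerance invented for illustration: keep the kernel in a translation unit built with -ffast-math, a strictly-built reference next to it, and fail the pipeline if canary inputs diverge.]

    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    // Stand-in for the function under test: imagine this definition lives
    // in a translation unit compiled with -ffast-math.
    double kernel(double x) { return std::log1p(std::exp(-x)); }

    // The same formula built strictly, as the reference.
    double reference(double x) { return std::log1p(std::exp(-x)); }

    int main() {
        // Canary inputs chosen to stress edges: zero, moderate values, a
        // result that underflows toward zero, and a large exp() argument.
        const double canaries[] = {0.0, 1.0, 40.0, 700.0, -40.0};
        for (double x : canaries) {
            double got = kernel(x), want = reference(x);
            double rel = std::fabs(got - want) /
                         std::fmax(std::fabs(want), 1e-300);
            if (!(rel <= 1e-12)) {   // NaNs from the fast build fail too
                fprintf(stderr, "canary failed at x=%g: %g vs %g\n",
                        x, got, want);
                return EXIT_FAILURE;
            }
        }
        puts("all canaries passed");
        return EXIT_SUCCESS;
    }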
willis936, over 3 years ago
I use single-precision floating point to save memory and computation in applications where it makes sense. I had a case where I didn't care about the vertical precision of a signal very much. It had a sample rate in the tens of thousands of samples per second. I was generating a sinusoid and transmitting it. On the receiver the signal would become garbled after about a minute. I slapped my head and immediately realized I had run out of precision by using a single-precision time value feeding the sin function when t became too large relative to the small increment.

    sin(single(t)) == bad
    single(sin(t)) == good
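[The snippet above reads as MATLAB; a C++ rendering of the same failure mode, with invented parameters.]

    #include <cmath>
    #include <cstdio>

    int main() {
        const double PI = 3.14159265358979323846;
        const double dt = 1.0 / 48000.0;   // invented: 48 kHz sample clock
        double t_d = 60.0;                 // one minute in
        float  t_f = 60.0f;                // float ulp at 60 is ~3.8e-6,
                                           // the same order as dt itself

        for (int i = 0; i < 1000; ++i) {   // advance 1000 samples each way
            t_d += dt;
            t_f += (float)dt;
        }

        // The float accumulator has absorbed per-step rounding on the
        // order of half an ulp, so by now the phase of a 1 kHz tone is
        // off by whole cycles.
        printf("sin with double t: % .6f\n", sin(2 * PI * 1000.0 * t_d));
        printf("sin with float  t: % .6f\n",
               sin(2 * PI * 1000.0 * (double)t_f));
        return 0;
    }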
zoomablemind, over 3 years ago
On the subject of floating-point math in general, I wonder what's the practical way to treat values of extreme order (close to zero, ~1E-200, or to infinity, ~1E200, but not zero or inf)? This can take place in some iterative methods, expansion series, or around some singularities.

How reliable is it to keep the extreme orders in the expectation that the respective quantities will cancel the orders properly, yielding a meaningful value (rounding-wise)?

For example, calculating some resulting value function, expressed as

    v(x) = f(x) / g(x),

where both f(x) and g(x) are oscillating with a number of roots in a given interval of x.
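[One standard treatment, sketched here as an assumption-laden illustration rather than the only option: carry magnitudes as logarithms and signs separately, so exponents near ±200 cancel by subtraction instead of underflowing or overflowing. Near an actual root of g(x) no representation saves you -- the ratio genuinely diverges.]

    #include <cmath>
    #include <cstdio>

    // Carry w = sign * exp(lg): magnitudes like 1e-200 or 1e+200 (or far
    // beyond double range) live safely in the exponent value `lg`.
    struct LogNum { double lg; int sign; };

    LogNum make_log(double x) {
        return { std::log(std::fabs(x)), (x < 0.0) ? -1 : 1 };
    }

    // v = f/g computed as exp(log|f| - log|g|), signs handled separately,
    // so the extreme exponents cancel by subtraction instead of dividing
    // two near-underflow quantities.
    double log_ratio(LogNum f, LogNum g) {
        return f.sign * g.sign * std::exp(f.lg - g.lg);
    }

    int main() {
        LogNum f = make_log(1e-200);
        LogNum g = { -201.0 * std::log(10.0), -1 };  // |g| ~ 1e-201, negative
        printf("f/g = %g\n", log_ratio(f, g));       // ~ -10, well scaled
        return 0;
    }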
kzrdude, over 3 years ago
It looks like -fassociative-math is "safe" in the sense that it cannot be used to get UB in working code? That's a good property to make it easier to use in the right context.
gnufx, over 3 years ago
You will generally want at least -funsafe-math-optimizations for performance-critical loops. Otherwise you won't get vectorization at all with ARM Neon, for instance. You also won't get some simple loops vectorized (like products) or, generally(?), loop nest optimizations. You just may not be able to afford the possible order-of-magnitude cost if your code is bottlenecked on such things (although HPC code actually may well not be).

In my experience much scientific Fortran code, at least, is OK with something like -ffast-math, at least because it's likely to have been used with ifort at some stage, and even with non-754-compliant hardware if it's old enough. Obviously you should check, though, and perhaps confine such optimizations to where they're needed.

BLIS turned on -funsafe-math-optimizations (if I recall correctly) to provide extra vectorization, and still passed its extensive test suite. (The GEMM implementation is possibly the ultimate loop nest restructuring.)
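[A minimal example of the kind of loop at stake; the flags shown are GCC's, and -fopt-info-vec merely reports what vectorized.]

    #include <cstdio>

    // A plain floating-point reduction. Strict IEEE semantics pin the
    // additions to program order, so the vectorizer must keep a single
    // scalar accumulator; with reassociation allowed (-fassociative-math,
    // -funsafe-math-optimizations, or -ffast-math) it can keep several
    // partial sums in a vector register and combine them at the end.
    double sum(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += x[i];
        return s;
    }

    int main() {
        double xs[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%g\n", sum(xs, 8));
        return 0;
    }

    // Compare: g++ -O3 -fopt-info-vec             (reduction not vectorized)
    //          g++ -O3 -ffast-math -fopt-info-vec (reduction vectorized)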
pfdietz超过 3 年前
The link to Kahan summation was interesting.

https://en.wikipedia.org/wiki/Kahan_summation_algorithm
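[For reference, a compact sketch of the algorithm, with the caveat that ties back to the article: fast-math's reassociation may simplify the compensation term away.]

    #include <cstdio>

    // Kahan (compensated) summation: `c` carries the low-order bits that
    // each addition rounds off. Under -ffast-math the compiler may
    // reassociate (t - s) - y to 0 and silently turn this back into a
    // naive sum.
    double kahan_sum(const double *x, int n) {
        double s = 0.0, c = 0.0;
        for (int i = 0; i < n; ++i) {
            double y = x[i] - c;
            double t = s + y;
            c = (t - s) - y;   // the rounding error of s + y
            s = t;
        }
        return s;
    }

    int main() {
        // 1.0 followed by many values too small to register individually:
        // a naive left-to-right sum returns exactly 1.0.
        double xs[1001];
        xs[0] = 1.0;
        for (int i = 1; i <= 1000; ++i) xs[i] = 1e-16;
        printf("kahan: %.17g\n", kahan_sum(xs, 1001));  // ~1.0000000000001
        return 0;
    }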
optimalsolver, over 3 years ago
"-fno-math-errno" and "-fno-signed-zeros" can be turned on without any problems.

I got a four-times speedup on <cmath> functions with no loss in accuracy.
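[The errno cost is easy to see on a hot sqrt loop; a sketch for comparison -- the 4x figure is the commenter's measurement, not reproduced here.]

    #include <cmath>
    #include <cstdio>

    // With errno semantics GCC emits sqrtsd plus a guarded branch to the
    // libm wrapper for negative inputs (which must set errno = EDOM);
    // with -fno-math-errno it is just the bare instruction.
    double sum_sqrt(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += std::sqrt(x[i]);
        return s;
    }

    int main() {
        double xs[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%g\n", sum_sqrt(xs, 8));
        return 0;
    }

    // Compare the generated code:
    //   g++ -O2 -S                    (guarded call for errno)
    //   g++ -O2 -fno-math-errno -S    (inline sqrtsd only)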
jjgreen, over 3 years ago
One trick that I happened upon was speeding up complex multiplication (like a factor of 5) under GCC with the -fcx-fortran-rules switch.
markhahn, over 3 years ago
NaNs should trap, but compilers should not worry about accurate debugging.
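[For completeness, a sketch of opting into exactly that on glibc; feenableexcept is a GNU extension, and other platforms spell this differently.]

    #include <fenv.h>
    #include <cstdio>

    // glibc extension: deliver SIGFPE when an operation produces a NaN
    // (invalid operation), divides by zero, or overflows. Note that
    // -ffast-math assumes NaNs never occur, so the two don't mix well.
    int main() {
        feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
        volatile double zero = 0.0;
        puts("computing 0/0...");
        volatile double r = zero / zero;   // traps with SIGFPE here
        printf("never reached: %g\n", r);
        return 0;
    }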