Someone’s Been Messing with My Subnormals

442 points by jpegqs, over 2 years ago

26 comments

cesarb, over 2 years ago

At a previous company I worked at, we had an issue with our software (Windows-based, written in a proprietary language) randomly crashing. After some debugging, we found that this happened whenever the user performed certain actions, but only if, in that session, the user had previously printed something or opened a file picker. The culprit was either a printer driver or a shell extension which, when loaded, changed the floating point control word to trap. That happened whenever the culprit DLL had been compiled by a specific compiler, which had the offending code in the startup routine it linked into every DLL it produced.

Our solution was the inverse of the one presented in this article: instead of wrapping our routines to temporarily set the floating point control word to sane values, we wrapped the calls to either printing or the file picker, and reset the floating point control word to its previous (and sane) value after these calls.

ChrisRackauckas, over 2 years ago

The Julia package ecosystem has a lot of safeguards against silent incorrect behavior like this. For example, if you try to add a package binary build which would use fast math flags, it will throw an error and tell you to repent:

https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/186b1350fa925ff9722d320b4c8549ec6a5735db/src/Runner.jl#L289

In user code you can do `@fastmath`, but it's at the semantic level, so it will change `sin` to `sin_fast` but not recurse down into other people's functions, because at that point you're just asking for trouble. There are also calls to rename it `@unsafemath` in Julia, just to make it explicit.

In summary, "Fastmath" is overused and many times people actually want other optimizations (automatic FMA), and people really need to stop throwing global changes around willy-nilly, and programming languages need to force people to avoid such global issues both semantically and within their package ecosystems' norms.

jcranmer, over 2 years ago

The problem here is that enabling FTZ/DAZ flags involves modifying global (technically thread-local) state that is relatively expensive to do. Ideally, you'd want to twiddle these flags only for code that wants to work in this mode, but given the relative expense of this operation, it's not entirely practicable to auto-add twiddling to every function call, and doing it manually is somewhat challenging because compilers tend to support accessing the floating-point status rather poorly. Also, FTZ/DAZ aren't IEEE 754, so there's no portable function for twiddling these bits as there is for other rounding mode or exception controls. I will note that icc's -fp-model=fast and MSVC's /fp:fast correctly do not link code with crtfastmath.

As a side note, this kind of thing is why I think a good title for a fast-math would be "Fast math, or how I learned to start worrying and hate floating point."

TazeTSchnitzel, over 2 years ago

Global state is the root of so many evils! FPU rounding mode, FPU flush-to-zero mode, C locale, errno, and probably some other things should all be eliminated. The functionality should still exist but not as global flags.

magicalhippo, over 2 years ago

Denormalized numbers are one reason why you really want to think carefully if you try to optimize code by rewriting expressions involving multiplication and division.

For example, if you have "x = (a / b) * (c / d)" one might think that rewriting it as "x = (a * c) / (b * d)" will save you a division and gain you speed. It will and it might, respectively.

However, it will also potentially break an otherwise safe operation. If the numbers are very small, but still normal, then the product (b * d) might result in a denormalized number, and dividing by it will result in +/- infinity.

However, the code might guarantee that the ratios (a / b) and (c / d) are not too small or too large, so that multiplying them is guaranteed to lead to a useful result.
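
A minimal Python sketch of the rewrite described in this comment. The function names and input values are made up for illustration. It assumes default IEEE 754 double arithmetic, where the failure needs b * d to underflow all the way to zero; with FTZ/DAZ enabled the same thing would happen as soon as b * d is merely subnormal. Python raises ZeroDivisionError on float division by zero, so the second helper imitates the +/- infinity (or NaN) that the hardware would produce.

```python
import math


def product_of_ratios(a, b, c, d):
    """Original form: two divisions, each ratio stays in a safe range."""
    return (a / b) * (c / d)


def ratio_of_products(a, b, c, d):
    """'Optimized' form: one division, but the products can underflow."""
    num = a * c
    den = b * d
    if den == 0.0:
        # Python would raise ZeroDivisionError here; mimic IEEE 754 instead:
        # 0/0 is NaN, nonzero/0 is a signed infinity.
        if num == 0.0:
            return math.nan
        return math.copysign(math.inf, num) * math.copysign(1.0, den)
    return num / den


# All four inputs are tiny but normal doubles (hypothetical values).
a = 3.0 * 2.0 ** -500
b = 2.0 ** -540
c = 5.0 * 2.0 ** -540
d = 2.0 ** -540

print(product_of_ratios(a, b, c, d))  # 15 * 2**40, a perfectly ordinary number
print(b * d)                          # underflows to 0.0
print(ratio_of_products(a, b, c, d))  # inf
```

Here a / b and c / d are both moderate, so the original expression is safe, while the rewritten one divides by a product that no longer exists as a nonzero double.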

black_knight, over 2 years ago

I ran Gentoo back in the good old days. The biggest draw was that after about a week of compiling, my system ran a lot faster because of all the compiler optimisations one could enable, since it only had to work on your CPU.

I might be misremembering, but I think fastmath was one of the flags explicitly warned against in the Gentoo manual.

compiler-guy, over 2 years ago

-funsafe-math is neither fun nor safe.

stabbles, over 2 years ago

See also https://simonbyrne.github.io/notes/fastmath/ for a similar story in Julia, where -ffast-math is now banned for C/C++/Fortran dependencies.

olliej, over 2 years ago

Wow, I am surprised that -ffast-math triggers a mode switch in the FPU, partly because of the author's library problem, but also because the documentation for clang, at least [1], does not say it impacts the behaviour of denormals and in fact has a separate mode switch for that, which is not explicitly called out as being implied by -ffast-math.

[1] https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math

leni536, over 2 years ago

Does this only affect PyPI, or should I now worry about shared libraries shipped with my distro as well? Debian is not crazy enough to ship shared libs compiled with -ffast-math, right? RIGHT?

garaetjjte, over 2 years ago

> it turns out that when you use -Ofast, -fno-fast-math does not, in fact, disable fast math. lol. lmao.

What about -fno-unsafe-math-optimizations?

jupiterelastica, over 2 years ago

For anyone (like me) not knowing what subnormal floats are, this Stack Overflow question and answer explain it quite nicely: https://stackoverflow.com/questions/15140847/denormalized-numbers-ieee-754-floating-point#

Though they use the term denormalized, which AFAICT is a synonym for subnormal.

Edit: Thanks for this great blogpost :)
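
For a quick feel of what subnormals are, here is a small sketch in Python, assuming 64-bit IEEE 754 doubles in the default mode (no FTZ/DAZ set on the current thread). The variable names are illustrative only.

```python
import sys

# Smallest positive *normal* double, 2**-1022 (about 2.2e-308).
smallest_normal = sys.float_info.min

# Smallest positive *subnormal* double, 2**-1074.
smallest_subnormal = 5e-324

# Halving the smallest normal number does not jump to zero: it lands in
# the subnormal range, trading precision for range (gradual underflow).
half = smallest_normal / 2
print(0.0 < half < smallest_normal)

# Only below the smallest subnormal does the result round to zero.
print(smallest_subnormal / 2 == 0.0)

# One consequence of gradual underflow: two distinct doubles always have
# a nonzero difference, even when that difference is subnormal.
x = 1.5 * smallest_normal
y = smallest_normal
print(x != y and x - y != 0.0)
```

In a process where flush-to-zero has been switched on, the first and last checks would be expected to come out differently, because the subnormal results would be replaced by zero.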

dahart, over 2 years ago

> Finally, I am legally obligated to inform you that I used GNU Parallel to scan all the wheels

FWIW, this is tangential to this awesome article, but if the author is here or anyone else who cares: that statement isn’t true, you are not legally obligated to mention GNU Parallel. It is nice to do though! The link even says this explicitly, and also separately mentions the citation notice is only asking for scientific paper citations, not blog citations. I love parallel, but boy has that citation notice thing caused a lot of confusion over the years.

Tyr42, over 2 years ago

Oh man, great job digging through all that. This is exactly the kind of content I want to see.

Don't you love your fun safe math?

raymondh, over 2 years ago

This is a rockstar-quality post. It is astonishing how much detective work was involved.

benreesman, over 2 years ago

That’s…terrifying. This is a fantastic find: big, big respect to @moyix, this is going to save people’s ass.

jesse__, over 2 years ago

10/10 yak shave. Would certainly read again.

mananaysiempre, over 2 years ago

Following the article’s links, I fail to find an actual example of anything failing to converge in flush-subnormals mode. I mean, I’m sure one could be squeezed out, but the justification given amounts to “Sterbenz’s lemma [the one that rephrases “catastrophic cancellation” as “exact differences”] fails, maybe something somewhere also will”. And my (shallow but not nonexistent) experience with numerical analysis is that proofs lump subnormals with underflow, and most of them don’t survive even intermediate underflows.

(AFAIU the original Intel justification for pushing subnormals into 754 was gradual underflow, i.e. to give people at least something to look at for debugging when they’ve run out of precision.)

So, yes, it’s not exactly polite to fiddle with floating-point flag bits that are not yours, and it’s better that this not happen for reproducibility if nothing else, but I doubt it actually breaks any interesting numerics.

nsajko, over 2 years ago

-Ofast isn't a good name for the option, but in GCC's defense the manual is pretty clear about all this, and there's no excuse for blindly turning on compiler options - they literally change the semantics of your code.

superbatfish, over 2 years ago

If you are willing to accept the various caveats that come with -ffast-math for your own library, then it looks like it's okay to use -ffast-math during compilation, but not during linking (because it links in crtfastmath.o, as the author points out).

Your own compiled library functions will use the optimizations, but won't force the weird FPU register modes on the rest of the process.

Const-me, over 2 years ago

That thread-local MXCSR register is particularly entertaining in a thread pool environment, such as OpenMP. OSes carefully preserve that piece of thread state across context switches.

I tend to avoid touching that value, even when it means extra instructions like roundpd for a specific rounding mode, or shuffles to avoid division by 0 in the unused lanes.

mhh__, over 2 years ago

In D you can opt into specific float algorithms locally rather than via a compiler flag.

Use of fast math can really really really bite you sometimes, so just being able to opt into using fma and nothing else is awesome.

bee_rider, over 2 years ago

A decorator is a nice idea for this.

I was going to suggest another package that just resets the MXCSR when imported, but I guess... hypothetically... some function might actually want the FTZ behavior.
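
As a rough illustration of the decorator idea, here is a hedged Python sketch. It is not the decorator from the article: the standard library offers no call for writing MXCSR, so this version only detects a change in subnormal handling around a call and warns about it, leaving any actual reset to native code. The names `subnormals_flushed` and `warn_if_flush_mode_changes` are hypothetical.

```python
import functools
import sys
import warnings


def subnormals_flushed() -> bool:
    """Probe the current thread: does halving the smallest normal give 0?"""
    # The division runs at call time, so it reflects the FPU state in
    # effect when the probe is invoked.
    return sys.float_info.min / 2 == 0.0


def warn_if_flush_mode_changes(func):
    """Warn if calling func flips subnormal handling on this thread."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        before = subnormals_flushed()
        try:
            return func(*args, **kwargs)
        finally:
            if subnormals_flushed() != before:
                warnings.warn(
                    f"{func.__name__} changed subnormal handling "
                    "on this thread",
                    RuntimeWarning,
                    stacklevel=2,
                )

    return wrapper


@warn_if_flush_mode_changes
def add(x, y):
    # Stand-in for a call that might load a misbehaving shared library.
    return x + y


print(add(1, 2))
```

Because the flags are per-thread state, a probe like this only says something about the thread it runs on, and it leaves alone any function that deliberately wants FTZ behavior.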

jandrese, over 2 years ago

> built with an appealing-sounding but dangerous compiler option

It's -ffast-math, isn't it?

...

Yep. That option is a candy-coated foot gun.

mrtesthah, over 2 years ago

I thought the purpose of Python was to make development simple and predictable. Needing to track down the compilation and linker flags of every single shared library reveals the fallacy of this abstraction.

puffoflogic, over 2 years ago

Dynamic linking is the root of all kinds of evil, enough said.
