Supporting half-precision floats is annoying

61 points · by Athas · almost 4 years ago

14 comments

StefanKarpinski · almost 4 years ago
The big problem in this post seems to be compiling via C (and the awkward GPU extensions thereof), which has lousy Float16 support. LLVM has adequate support for at least representing Float16s. Because of this (and a lot of work by a lot of people), Julia has pretty good support for Float16. If you're running on hardware with native Float16 support like a GPU, it works and is fast; if you're running on hardware without native Float16 support, operations are implemented by converting to Float32 and back, which is slow but gives the same results. So you can run the program either way and get the same results, and it's fast if your hardware has native support. Same deal with BFloat16 [1], which is the native 16-bit floating-point type on Google's TPUs.

[1] https://github.com/JuliaMath/BFloat16s.jl
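The promote-and-demote fallback described above is easy to sketch. A minimal illustration in C, assuming a compiler that provides the _Float16 type (recent Clang and GCC do on several targets); half_add is a hypothetical name, not Julia's implementation:

    /* Software fallback for half-precision arithmetic: promote both
       operands to binary32, compute there, and round back to binary16.
       For a single add this gives the same result as a native
       half-precision add, just slower on hardware without FP16 units. */
    static _Float16 half_add(_Float16 a, _Float16 b) {
        return (_Float16)((float)a + (float)b);
    }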
pm215 · almost 4 years ago
For what it's worth, the Arm "alternative half-precision" format is not very different from IEEE binary16. The difference is that instead of using the maximum exponent value (0x1f) to represent infinities and NaNs, the AHP format uses it the same way as any other non-zero exponent, to represent normalized fp values. The tradeoff is that you lose NaNs and infinities, but you double the range of numbers you can represent (the max value goes from 65504 to 131008).

You have to select AHP by setting an FP config register bit, so (unlike bfloat16 vs binary16) it's a "for this whole chunk of code I am going to use this format" choice.
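To make the difference concrete, here is a small decoder for both interpretations; it is an illustrative sketch, not Arm's specification text:

    #include <math.h>
    #include <stdint.h>

    /* Decode a 16-bit pattern as IEEE binary16 or as Arm AHP.
       The only divergence is exponent == 0x1f: IEEE reserves it
       for Inf/NaN, AHP treats it as one more normal exponent. */
    static double decode16(uint16_t bits, int ahp) {
        int sign = (bits >> 15) & 1;
        int exp  = (bits >> 10) & 0x1f;
        int frac = bits & 0x3ff;
        double s = sign ? -1.0 : 1.0;

        if (exp == 0)                    /* zero and subnormals */
            return s * ldexp((double)frac, -24);
        if (exp == 0x1f && !ahp)         /* IEEE: Inf or NaN */
            return frac ? NAN : s * INFINITY;
        /* normals (plus, under AHP, one extra top binade) */
        return s * ldexp(1024.0 + (double)frac, exp - 25);
    }

decode16(0x7bff, 0) yields 65504, the IEEE binary16 maximum; decode16(0x7fff, 1) yields 131008, the AHP maximum.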
silverpath · almost 4 years ago
The article makes a lot of good points. But there are some cases where f16 is very useful. In the context of deep learning it's frequently useful to move from f32 -> f16. This can allow you to double the size of your models in memory (system or GPU/TPU). Since network size is often a determinant of performance, doubling the number of parameters/activations in your model can make a big difference.
pjbk · almost 4 years ago
Some years ago I used them to transfer real-time waveforms from a medical device via USB and BLE links, since they provided more than enough precision for the clinical application. Halving the payload without resorting to compression (and its computational overhead) allowed us to meet the project specifications just by changing the type of the data array and recompiling.
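That kind of change can be sketched in a few lines, assuming a toolchain with the _Float16 type (the function and buffer names are hypothetical, not from the commenter's project):

    #include <stddef.h>

    /* Pack binary32 samples into binary16 for the link: the compiler
       emits the f32 -> f16 rounding, and the transmitted payload is
       half the size with no explicit compression step. */
    void pack_samples(const float *in, _Float16 *out, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = (_Float16)in[i];
    }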
h2odragon · almost 4 years ago
Great rant.

"Just don't use f16" seems like the course of wisdom here.

I had a 3x 5-bit value, packed structure in something where memory pressure was *severe*, and it was *such* a bitch on a SPARC to deal with 16-bit quantities that actually running the data as two passes using more memory wound up being an immensely better approach. The Alphas would diddle yer bits any way you liked at speed, but that was clearly an aberrant ability.
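For concreteness, a 3x 5-bit packed structure looks roughly like this; a generic sketch, not the commenter's actual layout:

    #include <stdint.h>

    /* Pack three 5-bit values (0..31) into the low 15 bits of a
       16-bit word; one bit is left unused. */
    static uint16_t pack3x5(unsigned a, unsigned b, unsigned c) {
        return (uint16_t)((a & 0x1f) | ((b & 0x1f) << 5) | ((c & 0x1f) << 10));
    }

    static void unpack3x5(uint16_t w, unsigned *a, unsigned *b, unsigned *c) {
        *a = w & 0x1f;
        *b = (w >> 5) & 0x1f;
        *c = (w >> 10) & 0x1f;
    }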
37ef_ced3 · almost 4 years ago
16-bit floating point (as a weight storage format, not used for math) is essential for fast Winograd and Fourier convolution on AVX-512 CPUs. See https://NN-512.com
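The storage-only pattern on AVX-512 looks roughly like this: weights stay in memory as f16 and are widened to f32 right before the arithmetic. A sketch; _mm512_cvtph_ps is the real widening intrinsic, but the kernel shape and names are invented:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Multiply-accumulate with f16 weight storage: the math itself
       stays in binary32, only the memory format is 16-bit. */
    void fma_f16_weights(const uint16_t *w, const float *x,
                         float *acc, size_t n) {
        for (size_t i = 0; i + 16 <= n; i += 16) {
            __m256i wh = _mm256_loadu_si256((const __m256i *)(w + i));
            __m512  wf = _mm512_cvtph_ps(wh);   /* 16 x f16 -> 16 x f32 */
            __m512  xv = _mm512_loadu_ps(x + i);
            __m512  av = _mm512_loadu_ps(acc + i);
            _mm512_storeu_ps(acc + i, _mm512_fmadd_ps(wf, xv, av));
        }
    }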
bullen · almost 4 years ago
If the CPU only has 32-bit ALUs (afaik all CPUs today have 32+ bit ALUs), there is no reason to "support" half-floats, other than converting floats to half-floats for the GPU, which doesn't need to be fast or pretty since you do it beforehand and send the result directly to the GPU from the model file format.

On the GPU, on the other hand, 16-bit floats are becoming the standard (the M1 GPU for instance has more 16-bit ALUs than 32-bit). They have enough precision for plausible resolutions/worlds, and you save 2x the memory, which makes it a no-brainer really.
owlbite · almost 4 years ago
The assertion that C doesn't support fp16 is just plain wrong. _Float16 is defined in standards committee work. There's also the __fp16 storage-only type widely supported by compilers.

The main issue is that many compilers have issues on x86 platforms due to Intel's bizarre slowness in defining how to pass fp16 parameters in their official ABI.
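The practical difference between the two types, sketched for a compiler that provides both (_Float16 comes from ISO/IEC TS 18661-3; __fp16 is an Arm C language extension):

    /* _Float16 is an arithmetic type: the add below is done in
       half precision (natively or via a soft-float fallback). */
    _Float16 add_half(_Float16 a, _Float16 b) {
        return a + b;
    }

    /* __fp16 is storage-only: both operands promote to float,
       so this add is performed in binary32. */
    float add_fp16_storage(__fp16 a, __fp16 b) {
        return a + b;
    }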
dunham · almost 4 years ago
This reminds me that at one point Lucene was using an 8-bit floating-point format for its normalization factors (I don't know if they still do):

https://lucene.apache.org/core/3_0_3/fileformats.html#N107EF
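An 8-bit minifloat decoder is only a few lines. The layout below (5 exponent bits, 3 mantissa bits, no sign, since norms are non-negative) is one plausible scheme chosen for illustration; see the linked file-format docs for Lucene's actual encoding:

    #include <math.h>
    #include <stdint.h>

    /* Decode an unsigned 8-bit minifloat: implicit leading 1,
       with a bias chosen so small norm values are representable. */
    static float byte_to_norm(uint8_t b) {
        if (b == 0) return 0.0f;
        int exp  = (b >> 3) & 0x1f;
        int mant = b & 0x07;
        return ldexpf(8.0f + (float)mant, exp - 18);
    }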
boulos · almost 4 years ago
Athas, now that you’ve added fp16, why not add bf16 as well? (A100s support bf16 natively, as do upcoming server CPUs).
ComputerGuru · almost 4 years ago
.NET recently landed (storage-only) support for f16 as well. Their article is nowhere near as interesting as TFA, but for reference: https://devblogs.microsoft.com/dotnet/introducing-the-half-type/
1wd · almost 4 years ago
What are the reasons for changing the allocation of bits in bf16 vs. f16? Why are there no (few?) similar alternative allocation schemes for f32 and f64? Was IEEE's choice perfect for f32 / f64? How did they know? Why not for f16?

Does any hardware offer "configurable" bit allocation like f16[e=4,m=11]?
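For reference, the two 16-bit layouts split the bits as 1/5/10 (binary16: sign/exponent/mantissa, bias 15) versus 1/8/7 (bfloat16, bias 127). Because bfloat16 keeps binary32's 8-bit exponent field, converting f32 to bf16 can be sketched as truncation (production code would usually round to nearest-even instead):

    #include <stdint.h>
    #include <string.h>

    /* bf16 is the top half of a binary32: same sign and exponent,
       mantissa cut from 23 bits to 7. */
    static uint16_t f32_to_bf16_truncate(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return (uint16_t)(bits >> 16);
    }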
tvirosi · almost 4 years ago
Huh? Half-precision floats speed up machine learning training and inference enormously. It's weird to me to argue that we should sacrifice hours of sliced-off waiting time in favor of preserving the ability to write a new language over a weekend.
lmilcin · almost 4 years ago
Sometimes you need floats packed in memory efficiently.

This might be useful in graphics, for example.

In general, I believe it is good to give people options. You never know what people will find useful.

But if you don't want to support it and you feel it will make your language better, just don't support it.

But please, quit bitchin' about it.

Java does not support unsigned integer types and everybody is fine.

And Java devs don't write rants about how unsigned integers are useless, supporting them is annoying, and everybody should just forget about them.