The big problem in this post seems to be compiling via C (and the awkward GPU extensions thereof), which has lousy Float16 support. LLVM, by contrast, can at least represent Float16s. Because of this (and a lot of work by a lot of people), Julia has pretty good Float16 support: on hardware with native Float16, like a GPU, operations are fast; on hardware without it, each operation is implemented by converting to Float32, computing, and converting back, which is slower but gives the same results. So you can run the same program either way, get the same answers, and it's fast wherever the hardware has native support. The same goes for BFloat16 [1], the native 16-bit floating-point type on Google's TPUs.

[1] https://github.com/JuliaMath/BFloat16s.jl
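A minimal sketch of what that looks like in practice (the second half assumes the BFloat16s.jl package linked in [1] is installed):

    # On a machine without native Float16, each operation here converts to
    # Float32, computes, and converts back -- slower, but the same result
    # you'd get from native Float16 hardware.
    x = Float16(1.5) + Float16(2.25)    # exact: Float16(3.75)
    typeof(x)                           # Float16

    # BFloat16 works the same way via the BFloat16s.jl package [1]:
    using BFloat16s
    y = BFloat16(1.5) * BFloat16(2.0)   # 3.0, stored as a BFloat16

The point is that the code doesn't change between the CPU and GPU paths; only the speed does.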