NVIDIA was confident enough to ship cards with FP4 support for training, so 16-bit floats are clearly not the best that can be done.

Since most hardware speedups over the years have come from lowering precision, I'm sure NVIDIA is trying everything to make FP4 work (and then to drop FP8 multiplies too, if it can get away with it).
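To make the precision gap concrete, here is a minimal sketch (my own illustration, not NVIDIA's code) that enumerates every value an FP4 number in the E2M1 layout can hold: 1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, as described in the OCP Microscaling spec that FP4 tensor formats are generally based on. Sixteen code points total, which is why FP4 training schemes lean so heavily on per-block scale factors:

```python
# Sketch of FP4 E2M1 decoding (illustrative, assuming the OCP MX definition).
def fp4_e2m1_value(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 1
    if exp == 0:
        # Exponent 00 is subnormal: 0.M * 2^(1 - bias) = 0.M
        return sign * man * 0.5
    # Normal numbers: 1.M * 2^(exp - bias)
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)

values = sorted({fp4_e2m1_value(b) for b in range(16)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Fifteen distinct values (the two zeros collapse), so everything hinges on picking good scale factors per block of weights or activations.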