It can be fun to explore the interactions of unorm and float bit representations even when you have float instructions. E.g. if you bit-or a unorm8 x into 0x47000000 (32768.0f) and then subtract 32768.0f, you get x/256.0f: the unorm lands in the low 8 mantissa bits, each of which is worth 2^-8 at that exponent. That is very close to right, just a float multiply by (256/255.0f) away. Reordering the math so that the subtraction and multiply can become a single FMA is a fun homework exercise.<p><pre><code> union {
int bits;
float f;
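} pun = {x}, scale = {0x47000000};  /* x: unorm8 in [0,255]; scale.f == 32768.0f */
pun.bits |= scale.bits;             /* pun.f == 32768.0f + x/256.0f */
pun.f -= scale.f;                   /* exact: pun.f == x/256.0f */
pun.f *= (256/255.0f);              /* approximately x/255.0f */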
</code></pre>
This basically amounts to a software implementation of the int->float conversion instructions; sadly, I have never found a spot where it's actually worth doing when you already have those instructions available, even with the subtract and multiply fused into a single FMA.<p>It's also worth considering whether your application can handle approximate conversion. If you have a [0,255] unorm in x, then x + (x>>7), or equivalently x + (x>0x7f), will round it to a [0,256] fixed-point value with 8 fractional bits. Crucially, this rounding handles the endpoints correctly: 0x00 maps to 0 and 0xff maps to 256, i.e. exactly 0.0 and 1.0. Once in fixed-point with a nice power-of-two divisor, you can play all sorts of tricks, whether by again making use of the bit representation of floats, by using ARM fixed-point instructions, etc. If you've ever looked longingly at the pmulhrsw family of instructions, this is a ripe area to explore.
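<p>As a rough sketch of the approximate route, assuming plain scalar C and hypothetical helper names (unorm8_to_fix8 and scale_by_unorm8 are made up for illustration), the rounding step and one thing you can do with the power-of-two divisor might look like this:

```c
#include <stdint.h>

/* Map a [0,255] unorm8 to a [0,256] fixed-point value (8 fractional bits). */
static uint32_t unorm8_to_fix8(uint8_t x) {
    return (uint32_t)x + (x >> 7);   /* adds 1 when x >= 0x80 */
}

/* Illustrative use: scale v by an 8-bit alpha/coverage value using a shift
 * instead of a divide by 255. Assumes v is small enough that v * 256 + 128
 * does not overflow 32 bits. */
static uint32_t scale_by_unorm8(uint32_t v, uint8_t a) {
    return (v * unorm8_to_fix8(a) + 128) >> 8;   /* round-to-nearest divide by 256 */
}
```

With a == 0 the result is 0 and with a == 0xff the result is v itself; values in between are only approximately v*a/255, which is the trade being made.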