It would have been better to have an algorithm description somewhere, but it is not hard to follow. So crzy64 is essentially base64 plus two optimizations:<p>- Output bytes [A-Za-z0-9./] are shuffled so that it can be efficiently translated from and to the 0..63 range. The exact remapping is as follows:<p><pre><code> cDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz./0123456789AaBbC
</code></pre>
- Bits are reordered and <i>XORed with each other</i> to make the unpacking from three 8-bit bytes to four 6-bit units extremely simple: specifically, `x ^ (x >> 6)`. The input bits `qrstuvwx ijklmnop abcdefgh` thus should be converted to the following four 6-bit units, XORed together (here represented with two-bit padding as in the actual implementation):<p><pre><code> 00abcdef 00ijklgh 00qrmnop 00stuvwx
0000ijkl 00000000 0000gh00 00mnop00
000000qr 00000000 00000000 00gh0000
</code></pre>
Since both optimizations can equally work well for SIMD and for SWAR (SIMD within a register), I guess they may even be useful for small inputs.