> <i>Various assembler optimisations to a number of different algorithms (e.g. AES-GCM, ChaCha20, SM3, SM4, SM4-GCM) across multiple processor architectures</i><p>With modern compilers, how often (or in what circumstances) is it worth "hand-rolling" assembly code rather than just letting the compiler do it? Does one write the assembly 'from scratch', or let the compiler generate it and then have a human review the output for places where it can be improved?