What seems to be missing are the hardware-optimized, accelerated memcpy/memset, for both short and big sizes.<p>On x86_64, on modern micro-archs, those are "rep stos[bwdq]" and "rep movs[bwdq]". I bet that, in modern binaries, memcpy/memset call sites are actually placeholders for such instructions, patched in before the memory segment goes back to read/executable. The call arguments arrive in rdi, rsi, rdx, but the rep instructions take their count in rcx, so rcx would either be pushed on the stack or the code would be generated to account for rcx being used at the call site.<p>Also, expect x86_64 -> RISC-V port bugs, because the operand size names shift between the two (x86 name -> RISC-V name):
byte->byte
word->halfword
doubleword->word
quadword->doubleword
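
To make the naming trap concrete, here is a minimal C sketch of the mapping above as a lookup table. The names `width_names`, `riscv_name_for_x86` and `bits_for_x86` are hypothetical helpers made up for illustration, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One operand width and what each ISA's documentation calls it. */
struct width_name {
    unsigned bits;
    const char *x86;   /* x86 terminology */
    const char *riscv; /* RISC-V terminology */
};

/* Hypothetical table, mirroring the list above. */
const struct width_name width_names[] = {
    {  8, "byte",       "byte"       },
    { 16, "word",       "halfword"   },
    { 32, "doubleword", "word"       },
    { 64, "quadword",   "doubleword" },
};

const size_t width_names_len = sizeof width_names / sizeof width_names[0];

/* Return the RISC-V name for the width that x86 calls x86_name,
 * or NULL if the name is not in the table. */
const char *riscv_name_for_x86(const char *x86_name)
{
    for (size_t i = 0; i < width_names_len; i++) {
        if (strcmp(width_names[i].x86, x86_name) == 0)
            return width_names[i].riscv;
    }
    return NULL;
}

/* Return the width in bits that x86 means by x86_name, or 0 if unknown. */
unsigned bits_for_x86(const char *x86_name)
{
    for (size_t i = 0; i < width_names_len; i++) {
        if (strcmp(width_names[i].x86, x86_name) == 0)
            return width_names[i].bits;
    }
    return 0;
}
```

The point: "word" and "doubleword" exist on both sides but mean different widths, so a comment or macro name carried over verbatim can silently halve the size. Naming the width in bits (uint16_t, uint32_t, uint64_t) sidesteps the ambiguity.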