The x86 and x64 versions have interesting hacks in the asm they generate. For example, thread switching is done not by a conditional jump, but by self-modifying code (there are some nops in the main loop, and they get overwritten with a jmp when it's time to preempt). What cool architecture-specific hacks does the ARM version use?
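
To make the nop-patching idea concrete, here is a rough sketch of what I mean. It only simulates the patch on a byte buffer and never executes it. The layout is my assumption, not taken from the actual generated code: five single-byte nops (`0x90`) at the patch site, replaced by a `jmp rel32` (`0xE9` plus a little-endian 32-bit displacement). The names `patch_jmp` and `unpatch` and the offsets are made up for illustration.

```python
import struct

NOP = 0x90        # single-byte x86 nop
JMP_REL32 = 0xE9  # x86 near jmp with a 32-bit relative displacement
PATCH_LEN = 5     # one opcode byte + four displacement bytes


def patch_jmp(code: bytearray, site: int, target: int) -> None:
    """Overwrite the nops at offset `site` with a jmp to offset `target`."""
    window = code[site:site + PATCH_LEN]
    if len(window) != PATCH_LEN or any(b != NOP for b in window):
        raise ValueError("patch site must hold exactly five nops")
    # rel32 is measured from the end of the jmp instruction, not its start.
    rel = target - (site + PATCH_LEN)
    code[site:site + PATCH_LEN] = bytes([JMP_REL32]) + struct.pack("<i", rel)


def unpatch(code: bytearray, site: int) -> None:
    """Restore the nops so the loop falls straight through again."""
    code[site:site + PATCH_LEN] = bytes([NOP]) * PATCH_LEN


# Hypothetical layout: patch site at offset 4, preemption handler at offset 32.
code = bytearray([NOP] * 64)
patch_jmp(code, 4, 32)
print(code[4:9].hex())  # e917000000
```

A real implementation would also have to write into executable memory and make the overwrite safe with respect to the thread that is currently running the loop; the sketch skips both.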