I'm rarely keen on posting negatives on articles that clearly took a lot of time to make, but I think this requires a bit of correction.<p>I think this article is very, very simplistic. All of it relates to an 8-bit CPU that is 40+ years old.<p>I switched to HLLs as soon as I could get my hands on a compiler, namely UCSD Pascal at the time! Then Pascal, then C, and then a myriad of other languages. I covered 6502, Z80, 68k (all of them, up to the 68040), PowerPC (all of them, from 601 prototypes to G5s), ARMs (more than I can count) and x86s (same).<p>Truth be told, the assembly language I started with /helped a LOT/ with me becoming an efficient developer; a developer who understands what 'code' is being generated when he writes an expression, a statement, a loop, and one who understands what the runtime implications are for most of the 'sugar coating' an HLL gives.<p>However, starting (a bit) with the 68k, then even more so with the PowerPC, it became pretty much impossible to write /from scratch/ an assembly equivalent that was QUICKER than the compiler-generated code. That was 20+ years ago. DRAM latency happened, pipelining happened and SIMD happened.<p>Today, hand-writing assembly is pretty much stupid on modern CPUs. Given the register files, timings, shadow registers, bus latencies etc etc the compiler will ALWAYS be better because there are <i>so many</i> criteria to think about when generating code...<p>I'm not saying that having the knowledge is not useful; the <i>best</i> use of assembly is to write some code in an HLL, one that is supposed to be super-mega-critical-quick, then disassemble it and see how it looks. More often than not, you can't make it better than it is <i>in situ</i> -- most of what you will gain comes from preparing your data better, aligning it better etc etc -- basically, 'hinting' the compiler to do a better job. 
You can do serious code butchery like that, without a hint of assembler [0].<p>But really, I haven't written any assembly for /performance reasons/ in 15 years, and that was Altivec on PowerPC.<p>For 8-bit CPUs, it's all smooth as butter, but the article also doesn't take into account the massive progress in compilers; I'm the author of SimAVR [1] and I've seen my share of generated code for that CPU, and the GCC toolchain is /very hard to beat/ by hand these days.<p>[0]: critical audio loop in one of my old PCI card drivers, converting float<->int, applying gain etc while using the register file to the max, and making the most of the pipelining of the G4 (at the time) <a href="https://gist.github.com/buserror/0a3a69cca927b8da6c9c7ee1605007fc" rel="nofollow">https://gist.github.com/buserror/0a3a69cca927b8da6c9c7ee1605...</a> -- note, the inner loop was generated by a script that was doing the cycle calculations (!)<p>[1]: <a href="https://github.com/buserror/simavr" rel="nofollow">https://github.com/buserror/simavr</a>
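<p>To illustrate the 'hinting' idea above, here is a minimal sketch in plain C. It is not the code from the gist in [0]: the function name gain_to_s16 is hypothetical, and it assumes GCC or Clang (for __builtin_assume_aligned) and buffers that really are 16-byte aligned. The restrict qualifiers and the alignment promise only give the compiler more freedom to vectorise and schedule; whether they help on a given target is something to check in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): apply a gain to float samples
 * and convert them to saturated 16-bit integers.
 * Assumption: both buffers are 16-byte aligned and do not overlap. */
void gain_to_s16(int16_t *restrict dst, const float *restrict src,
                 size_t n, float gain)
{
    /* GCC/Clang builtin: promise the optimiser that the pointers are
     * 16-byte aligned, so it may use aligned vector loads and stores. */
    const float *s = __builtin_assume_aligned(src, 16);
    int16_t *d = __builtin_assume_aligned(dst, 16);

    for (size_t i = 0; i < n; i++) {
        float v = s[i] * gain * 32767.0f;
        /* Saturate instead of wrapping around on overflow. */
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        d[i] = (int16_t)v; /* the cast truncates toward zero */
    }
}
```

<p>Compiling this with and without the restrict qualifiers and alignment promises (for example with gcc -O3 -S) and comparing the generated assembly is a quick way to see whether the hints changed anything on your target.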