This "debunking" is itself mostly plausible-sounding bunk.<p>It gets a lot of details simply wrong. For example, the 68030 wasn't "around 100000 transistors", it was 273000 [1]. The 80386 was very similar at 275000 [2]. By comparison, the ARM1 was around 25000 transistors[3], and yet delivered comparable or better performance. That's a factor of 10! So RISC wasn't just a slight re-allocation of available resources, it was a massive leap.<p>Furthermore, the problem with the complex addressing modes in CISC machines wasn't just a matter of a tradeoff vs. other things this machinery could be used for, the problem was that compilers weren't using these addressing modes at all. And since the vast majority of software was written in high-level language and thus via compilers, the chip area and instruction space dedicated to those complex instructions was simply wasted. And one of the reasons that compilers used sequences of simple instructions instead of one complex instruction was that even on CISCs, the sequence of simple instructions was often faster than the single complex instruction.<p>Calling the seminal book by Turing award winners Patterson and Hennessy "horrible" without any discernible justification is ... well it's an opinion, and everybody is entitled to their opinion, I guess. However, when claiming that "Everything you know about RISC is wrong", you might want to actually provide some evidence for your opinions...<p>Or this one: "These 32-bit Unix systems from the early 1980s still lagged behind DEC's VAX in performance. " What "early 1980s" 32-bit Unix systems were these? The Mac came out in 1984, and it had the 16 bit 68000 CPU. The 68020 was only launched in 1984, I doubt many 32 bit designs based on it made it out the door "early 1980s". The first 32 bit Sun, the 68020-based Sun-3 was launched in September of 1985, so second half of the 1980s, don't think that qualifies as "early". And of course the Sun-3 was faster than the VAX 11. The VAX 8600 and later were introduced around the same time as the Sun-3.<p>Or "it's the thing that nobody talks about: horizontal microcode". Hmm...actually everybody talked about the RISC CPUs <i>not having microcode</i>, at least at the time. So I guess it's technically true that "nobody" talked about horizontal microcode...<p>He seems to completely miss one of the major simplifying benefits of a load/store architecture: simplified page fault handling. When you have a complex instruction with possibly multiple references to memory, each of those references can cause a fault, so you need complex logic to back out of and restart those instructions at different stages. With a load/store architecture, the instruction that faults is a load. Or a store. And that's all it does.<p>It also isn't true that it was the Pentium and OoO that beat the competing RISCs. Intel was already doing that earlier, with the 386 and 486. What allowed Intel to beat superior architectures was that Intel was always at least one fab generation ahead. And being one fab generation ahead meant that they had more transistors to play with (Moore's Law) and those transistors were faster/used less power (Dennard scaling). Their money generated an advantage that sustained the money that sustained the advantage.<p>As stated above, the 386 had 10x the transistors of the ARM1. It also ran at significant faster clock speed (16Mhz-25Hmz vs. 8Mhz). With comparable performance. 
It also isn't true that it was the Pentium and OoO that beat the competing RISCs. Intel was already doing that earlier, with the 386 and 486. What allowed Intel to beat superior architectures was that Intel was always at least one fab generation ahead. And being one fab generation ahead meant they had more transistors to play with (Moore's Law) and those transistors were faster and used less power (Dennard scaling). Their money generated an advantage that sustained the money that sustained the advantage.

As stated above, the 386 had 10x the transistors of the ARM1. It also ran at significantly faster clock speeds (16-25 MHz vs. 8 MHz), with comparable performance. But comparable performance was more than good enough when you had the entire software ecosystem behind you, efficiency be damned. Advantage: Wintel.

Now that Dennard scaling has been dead and buried for a while, Moore's Law is slowing, and Intel is no longer one fab generation ahead, x86 is behind ARM, and not by a little either. Superior architecture can finally show its superiority in general-purpose computing, not just in extremely power-sensitive applications. (Well, part of the reason is that power consumption has a way of dominating even general-purpose computing.)

That doesn't mean everything he writes is wrong. It certainly is true that a complex OoO Pentium and a complex OoO PowerPC were very similar, and only a small percentage of the overall logic was decode.

But I don't think his overall conclusion is warranted, and with so much of what he writes being simply wrong, the rest, which is more hand-wavy, doesn't convince. Just because instruction decode is not a big part of the chip doesn't mean it can't matter for performance. For example, it is claimed that one of the reasons the M1 is comparatively faster than x86 designs is that it has more instruction decode units. And the reason for that is not so much that the units take less space, but that they can operate independently, whereas with a variable-length instruction stream you need all sorts of interconnects between the decode units, and those interconnects add significant complexity and latency (see the sketch after the references below).

Right now RISC, in the form of ARM in general and Apple's M-series CPUs in particular, is eating x86's lunch, and no, it's not a coincidence.

I just returned my Intel MacBook to my former employer, and good riddance. My M1 is sooooo much better in just about every respect that it's not even funny.

[1] https://en.wikipedia.org/wiki/Motorola_68030

[2] https://en.wikipedia.org/wiki/I386

[3] https://www.righto.com/2015/12/reverse-engineering-arm1-ancestor-of.html
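P.S. Here's a toy sketch of that decode dependency. It is not real x86 or ARM decoding; the length rule is invented purely for illustration. The point is only the shape of the problem: with fixed-length instructions, every decode slot can compute its own start address, while with variable-length instructions, slot N can't know where its instruction begins until the lengths of instructions 0..N-1 are known (or speculated and later verified).

    #include <stdint.h>
    #include <stdio.h>

    enum { WIDTH = 4 };   /* number of parallel decode slots */

    /* Fixed-length ISA (e.g. 4-byte instructions): slot N computes its
     * own fetch address, so all decoders can start in parallel. */
    static size_t fixed_start(size_t pc, size_t slot) {
        return pc + 4 * slot;
    }

    /* Invented stand-in for real length decoding: 1..8 bytes. */
    static size_t toy_length(const uint8_t *code, size_t pos) {
        return (code[pos] & 0x7u) + 1;
    }

    /* Variable-length ISA: slot N needs the lengths of instructions
     * 0..N-1 before it even knows where to start -- a serial chain
     * between the decoders. */
    static size_t variable_start(const uint8_t *code, size_t pc, size_t slot) {
        size_t p = pc;
        for (size_t i = 0; i < slot; i++)
            p += toy_length(code, p);
        return p;
    }

    int main(void) {
        uint8_t code[64] = {0x02, 0x05, 0x00, 0x07, 0x01};
        for (size_t s = 0; s < WIDTH; s++)
            printf("slot %zu: fixed start %zu, variable start %zu\n",
                   s, fixed_start(0, s), variable_start(code, 0, s));
        return 0;
    }

Real decoders mitigate this with predecoded length bits, speculation, and so on, but that extra machinery is exactly the kind of complexity and latency being talked about above.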