The legend of “x86 CPUs decode instructions into RISC form internally” (2020)

187 points by segfaultbuserr almost 2 years ago

24 comments

CalChris almost 2 years ago

I've never liked this idea that x86 CPUs decode instructions into RISC form internally. Before there was RISC, before there was even x86, there were microcoded instruction sets [1]. They were first implemented in Wilkes' 1958 EDSAC 2. Indeed, the Patterson and Ditzel paper even comments on this:

> Microprogrammed control allows the implementation of complex architectures more cost-effectively than hardwired control. [2]

These horizontally microprogrammed instructions interpreted the architectural instruction set. The VAX 11/750 microcode control program had an interpreter loop. There could be more than 100 bits in these horizontal instructions, with 30+ fields. Horizontally microprogrammed instructions were not in any way reduced. Indeed, reduction would mean paying the decode tax twice.

There was another form, vertical microprogramming, which was closer to RISC. But there was no translation from complex to vertical.

[1] https://www.cs.princeton.edu/courses/archive/fall10/cos375/BestWay.pdf

[2] https://inst.eecs.berkeley.edu/~n252/paper/RISC-patterson.pdf
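A toy Python sketch of the horizontal vs vertical distinction. Everything in it is hypothetical (the field names, widths, and opcodes are invented for illustration and are not the VAX 11/750 layout): a horizontal microword is a bundle of independent fields, each extracted with a fixed shift and mask and wired straight to a control point, while a vertical microword is a compact opcode that needs its own decode step (modeled here as a lookup table) before any control signals exist.

```python
# Hypothetical horizontal microword layout: (field name, width in bits).
# Real machines used 100+ bits and 30+ fields; this is a tiny illustration.
HORIZONTAL_FIELDS = [
    ("alu_op", 4),      # ALU function select
    ("src_a", 5),       # register read port A
    ("src_b", 5),       # register read port B
    ("dest", 5),        # register write port
    ("mem_read", 1),    # assert memory read
    ("mem_write", 1),   # assert memory write
    ("next_addr", 10),  # address of the next microinstruction
]


def pack_horizontal(values: dict) -> int:
    """Pack field values into one wide microword (first field in the low bits)."""
    word, shift = 0, 0
    for name, width in HORIZONTAL_FIELDS:
        v = values.get(name, 0)
        if v >= (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    return word


def unpack_horizontal(word: int) -> dict:
    """Each field comes out with a fixed shift and mask: no opcode decode step."""
    out, shift = {}, 0
    for name, width in HORIZONTAL_FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out


# Hypothetical vertical encoding: a small opcode that must first be decoded
# (here via a lookup table) into the control signals it implies.
VERTICAL_DECODE = {
    0x1: {"alu_op": 2, "mem_read": 0, "mem_write": 0},  # e.g. an ADD
    0x2: {"alu_op": 0, "mem_read": 1, "mem_write": 0},  # e.g. a LOAD
    0x3: {"alu_op": 0, "mem_read": 0, "mem_write": 1},  # e.g. a STORE
}


def decode_vertical(opcode: int) -> dict:
    """Vertical microcode pays an extra decode step to recover control signals."""
    return VERTICAL_DECODE[opcode]
```

The horizontal word in this sketch is only 31 bits wide; the point of spending 100+ bits on real machines was that no second decode stage sat between the microword and the datapath.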

0xr0kk3r almost 2 years ago

It is fascinating that the semantic confusion over RISC vs CISC has persisted since I was in college in the '80s. It is largely meaningless.

The naive idea behind RISC is essentially to reduce the ISA to near-register-level operations: load, store, add, subtract, compare, branch. This is great for two things: being the first person to invent an ISA, and teaching computer engineering.

Look at the evolution of RISC-V. The intent was to build an open source ISA from a 100% clean slate, using the world's academic computer engineering brains (and corporations that wanted to be free of Arm licensing) ... and a lot of the subtext was initially around ISA purity.

Look at the ISA today, specifically the RISC-V extensions that have been ratified. It has a soup of wacky opcodes to optimize corner cases, and obscure vendor-specific extensions that are absolutely CISC-y (examine T-Head's additions if you don't believe me!).

Ultimately, the combination of ISA, implementation (the CPU), and compiler struggles to provide optimal solutions for the majority of applications. This inevitably leads to a complex instruction set computer. Put enough engineers on the optimization problem and that's what happens. It is not a good or bad thing, it just IS.

kens almost 2 years ago

What I find most interesting is the "social history" of RISC vs CISC: how did a computer architecture issue from the 1980s turn into something that people vigorously debate 40 years later? I have several theories:

1. x86 refused to die as expected, so the RISC vs CISC debate doesn't have a clear winner. There are reasonable arguments that RISC won, that CISC won, or that it no longer matters.
2. RISC vs CISC has clear teams: now Apple vs Intel, previously DEC, Sun, etc. vs Intel. So you can tie the debate into your "personal identity" more than with most topics. The debate also has an underdog-vs-entrenched-monopoly vibe that makes it more interesting.
3. RISC vs CISC is a simple enough topic for everyone to have an opinion (unlike, say, caching policies). But on the other hand, it's vague enough that nobody can agree on anything.
4. RISC exists on three levels. First, a design philosophy / ideology. Second, a style of instruction set architecture that results from this philosophy. Finally, a hardware implementation style (deep pipelines, etc.) that results from it. With three levels for discussion, there's lots of room for debate.
5. RISC vs CISC has a large real-world impact, not just for computer architects but for developers and users. E.g. Apple switching to ARM affects the user, but changing the internal bus architecture does not.

(I've been trying to write a blog post on this subject, but it keeps spiraling off in random directions.)

compressedgas almost 2 years ago

No mention of AMD's RISC86, the patented scheme for internally decoding x86 instructions into a RISC instruction set: https://patents.google.com/patent/US5926642A/en (1996)

adrianmonk almost 2 years ago

If someone says x86 decodes to RISC internally, they might be getting at one of two different ideas:

(1) RISC really is the fastest/best way for CPUs to operate internally.
(2) x86 performance isn't held back (much) by its outmoded instruction set.

For a while, x86 implementations were effectively translating into RISC, but they stopped doing it. Internally they are now less RISC-like. This suggests #1 is false and #2 is true.

They could do it if they wanted to (because they have), but they don't want to anymore, presumably because it's not the best way to do it. Or perhaps it would be slightly better, but not worth the cost of translating.

rany_ almost 2 years ago

I'm not sure how true this is, or whether it's a legend, but I remember reading that this originated with Intel marketing in response to the rising popularity of RISC in the 1990s.

In essence, it was meant to give the impression that there was no need for a RISC architecture because x86 was already RISC behind the scenes, so you got the best of both worlds.

ajross almost 2 years ago

"RISC" architectures are doing something effectively identical to uop fusion, though. The real myth is the idea of a CISC/RISC dichotomy in the first place, when frankly that notion only ever applied to the ISA specifications and not (except for the very earliest cores) to CPU designs.

In point of fact, beyond the instruction decode stage all modern cores look more or less identical.
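As a rough illustration of what "fusion" means here, the following Python sketch is a greedy peephole pass that merges an adjacent compare/test and conditional jump into a single internal op, in the spirit of x86 macro-fusion. The instruction tuples, the mnemonic sets, and the fused name format (for example `cmp+jl`) are hypothetical; real decoders apply much more detailed pairing rules that vary by microarchitecture.

```python
# Each instruction is a hypothetical (mnemonic, operands) tuple.
Instr = tuple[str, tuple]

# Hypothetical pairing rules: which ops may start or end a fused pair.
FUSIBLE_FIRST = {"cmp", "test"}
FUSIBLE_SECOND = {"je", "jne", "jl", "jge"}


def fuse(stream: list[Instr]) -> list[Instr]:
    """Merge cmp/test followed by a conditional jump into one fused op."""
    out: list[Instr] = []
    i = 0
    while i < len(stream):
        op, args = stream[i]
        if (i + 1 < len(stream)
                and op in FUSIBLE_FIRST
                and stream[i + 1][0] in FUSIBLE_SECOND):
            jop, jargs = stream[i + 1]
            # One internal op now carries both the compare operands and the target.
            out.append((f"{op}+{jop}", args + jargs))
            i += 2
        else:
            out.append((op, args))
            i += 1
    return out
```

For example, `[("add", ("eax", "1")), ("cmp", ("eax", "10")), ("jl", ("loop",)), ("ret", ())]` becomes three ops, with the middle one being `("cmp+jl", ("eax", "10", "loop"))`. The same kind of pass could just as well pair instructions from a RISC ISA, which is the comment's point.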

mikequinlan almost 2 years ago

> Final verdict
>
> There is some truth to the story that x86 processors decode instructions into RISC-like form internally. This was, in fact, pretty much how P6 worked, later improvements however made the correspondence tortuous at best. Some microarchitecture families, on the other hand, never did anything of the sort, meaning it was never anywhere near a true statement for them.

kens almost 2 years ago

A question that maybe HN can help me answer: are there any new instruction set architectures since, say, 1985 that are CISC? (Excluding, of course, ISAs that are extensions of previous CISC ISAs.)

rollcat almost 2 years ago

Somewhat related: http://danluu.com/new-cpu-features/ (discussion: https://news.ycombinator.com/item?id=31093430)

jylam almost 2 years ago

> (the code is 32-bit just to allow us to discuss very old x86 processors)

fsck, that hurts.

fooblaster almost 2 years ago

I'm not sure how you could write something like this without considering the micro-op cache, which is present in all modern x86 and some Arm processors. On x86, the micro-op cache is effectively the only way a processor can reach full IPC, and that's because it holds pre-decoded instructions. We don't know the formats, but we can be certain that they are fixed-length instructions with branch instructions annotated. Yeah, sure, these instructions have more complicated semantics than true RISC instructions, but they have the most important property: fixed length. This makes it possible for 8-10 of them to be dispatched to the backend per cycle. In my mind, this definitely is the "legend" manifested.
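The point about fixed length can be illustrated with a toy Python sketch. It assumes a hypothetical variable-length encoding in which the first byte of each instruction holds that instruction's total length: to know where instruction N+1 starts you must first look at instruction N, so finding boundaries is a serial chain. With fixed-length instructions, every boundary is just a multiple of the width and can be computed independently (in hardware, in parallel). Real x86 length decoding is far more involved (prefixes, ModRM, SIB, displacements, immediates), and real cores use various tricks, such as storing predecoded boundary marks alongside cached instructions, to soften the problem.

```python
def boundaries_variable(code: bytes) -> list[int]:
    """Hypothetical encoding: code[pc] holds the length of the instruction at pc.

    Each start offset depends on the previous one, so this loop is inherently serial.
    """
    starts, pc = [], 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError(f"invalid zero-length instruction at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


def boundaries_fixed(code: bytes, width: int = 4) -> list[int]:
    """Fixed-length encoding: start k is simply k * width, independent of the others."""
    return list(range(0, len(code), width))
```

For `bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])`, the variable-length scan has to visit offsets 0, 2, 5, and 6 one after another, whereas a 12-byte fixed-width buffer yields starts 0, 4, and 8 with no dependency between them.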

phendrenad2 almost 2 years ago

No mention of RISC86 [1] and the hype [2] surrounding it.

[1] https://patents.google.com/patent/US6336178B1/en

[2] https://halfhill.com/byte/1996-1_amd-k6.html

stncls almost 2 years ago

Needs (2020). It explains why, for example, Zen 2 & 3 are not discussed.

cachvico almost 2 years ago

Saved this great article from a couple of years ago: https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020-7b4d42c9a9de

wscott almost 2 years ago

The splitting of "store address" and "store data" is an intentional performance feature, not a "quirk" of the implementation. If there were a single store uop, the memory system couldn't start looking up the address until the data to be stored was available, and the data is usually the long pole. With the address in a separate uop, that data dependency is broken, and the cache access (including allocating the line in the cache) can start much sooner.
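A toy timing model of this point, written as a Python sketch under heavy simplifying assumptions: a single store, a fixed cache-miss latency, and no port contention, store queues, or memory-ordering rules. The function and parameter names are hypothetical. In the fused case the lookup cannot begin until both address and data are ready; in the split case the store-address uop begins the lookup as soon as the address is known.

```python
def store_done_cycle(addr_ready: int, data_ready: int,
                     miss_latency: int, split: bool) -> int:
    """Cycle at which the cache line is present and the store data is available.

    split=False: one store uop waits for both address and data before the
                 cache lookup (and miss handling) can start.
    split=True:  a store-address uop starts the lookup as soon as the address
                 is known; the store-data uop supplies the data when it arrives.
    """
    lookup_start = addr_ready if split else max(addr_ready, data_ready)
    line_present = lookup_start + miss_latency
    return max(line_present, data_ready)
```

With the address ready at cycle 1, the data at cycle 20, and a 15-cycle miss, the fused store finishes at cycle 35 while the split version finishes at cycle 20: the miss is hidden behind the slow data, which is the "long pole" the comment describes.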

moomin almost 2 years ago

One woman’s RISC is another man’s CISC. The “perform operation and branch on flags” operation described here might not be part of RISC-V, but it 100% was part of ARM 1 when ARM was at the forefront of the movement.

SinePost almost 2 years ago

The "Final Verdict" is very plain and is hardly enhanced by reading the body of the article. It would make more sense at the start of the article, where it would serve as a complete abstract.

peter_d_sherman almost 2 years ago

Related:

https://news.ycombinator.com/item?id=27334855

https://www.google.com/search?q=%22christopher+domas%22+x86+%22god+mode%22

https://en.wikipedia.org/wiki/Alternate_Instruction_Set

> "In 2018 Christopher Domas discovered that some Samuel 2 processors came with the Alternate Instruction Set enabled by default and that by executing AIS instructions from user space, it was possible to gain privilege escalation from Ring 3 to Ring 0.[5] Domas had partially reverse engineered the AIS instruction set using automated fuzzing against a cluster of seven thin clients.[12] Domas used the terms "deeply embedded core" (DEC) plus "deeply embedded instruction set" (DEIS) for the RISC instruction set, "launch instruction" for JMPAI, "bridge instruction" for the x86 prefix wrapper, "global configuration register" for the Feature Control Register (FCR), and documented the privilege escalation with the name "Rosenbridge".[5]"

Also, I should point out that the debate over whether x86 (CISC) CPUs contain RISC cores is largely academic.

Both RISC and CISC CPUs contain ALUs, so our only debate, really, if we have one, is how exactly the data that the ALU is going to process winds up at the ALU...

It is well known in the x86 community that x86 instructions are an abstraction, a level of abstraction that runs on top of a lower level of abstraction, the x86 microcode layer...

Historically, intentionally or unintentionally, most x86 vendors have done everything they can to hide, obfuscate, and obscure this layer... To the best of my knowledge, at this point in time there is no official documentation of this layer, how it works, etc., from any major x86 vendor. x86 microcode update blobs are encrypted binary "black boxes".

Most of our (limited) knowledge in this area comes from various others who have attempted to understand the internal workings of x86 microcode:

https://www.google.com/search?q=%22reverse+engineering+x86+processor+microcode%22

https://github.com/RUB-SysSec/Microcode

https://twitter.com/_markel___/status/1262697756805795841

https://www.youtube.com/watch?v=lY5kucyhKFc

It should be pointed out that even if a complete understanding of x86 microcode were had for one generation of CPU, there would always be successive generations where that implementation might change, leaving anyone who wished to fully understand it back at square one...

To (mis)quote Douglas Adams:

"There is a theory which states that if ever anyone discovers exactly what the x86 microcode layer is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable.

There is another theory which states that this has already happened." :-) <g>

cptskippy almost 2 years ago

I grew up in the '80s and '90s, and what I gathered from listening to the greybeards talk was that RISC-based designs were more elegant, easier to understand, and more efficient. When I first started hearing about then-modern CISC CPUs decoding to RISC, it was pushed as a justification that RISC was fundamentally superior.

This was around the time IBM was pushing Power and everyone thought it was poised to dominate the industry.

sobkas almost 2 years ago

There are some similarities with Transmeta.

Farfignoggen almost 2 years ago

The main issue with Intel's CISC/"RISC"-like execution engine is that it takes more transistors to implement, and the instruction decoders are huge relative to those of an ARM/RISC-like ISA, where the majority of assembly language instructions translate directly into the micro-op format on a one-to-one basis. On x86, most assembly language instructions decode into multiple micro-ops on a one-to-many basis, which is a more energy-intensive process. The extra transistors needed to implement all that on x86 designs also use more power and leak more power.

A full x86 instruction decoder takes many times the die area of an ARM 64-bit-only instruction decoder, which handles fewer total instructions. On the Apple A14/Firestorm core, Apple could easily fit 8 ARM 64-bit instruction decoders, and its later performance cores keep that width. So the A14 and later cores decode 8 instructions per cycle, and all of that feeds into a ridiculously large reorder buffer to extract more instruction-level parallelism and dispatch it to a very wide array of execution ports.

An x86 core has to be much larger, and most current x86 designs are 6 or fewer decoders wide: AMD is only 4 decoders wide with Zen 4 and earlier, and Intel is 6 wide with Golden Cove, which has only one complex x86 decoder and 5 "simple" x86 decoders. The tech press has never done a deep dive on the difference between the complex and simple decoders in that Golden Cove design.

The x86 cores are also usually clocked around 2 GHz higher than Apple's A14 and later cores in the M-series SoCs, and they are nowhere near as power efficient as those RISC cores, because the x86 cores are narrower and have lower IPC. Apple's extra-wide, high-IPC cores can be clocked well inside their performance/watt sweet spot in laptops and deliver the best battery life figures on the consumer market. The x86 cores, by comparison, have to be clocked higher, outside their performance/watt sweet spots, so they must be down-clocked on battery power, whereas M-series Apple laptops run at the same clocks on mains or battery power.

Talk all you want about CISC and RISC, but the simpler instruction sets of RISC designs leave room for wider ranks of instruction decoders on custom ARM cores. Those feed wider execution dispatch to ALUs and other execution ports, which are now mostly 64 bits (Neon and AVX aside). Apple's custom ARM designs get the same work done, only wider, with IPC high enough that they can be clocked well inside their performance/watt sweet spots, unlike the narrow x86 cores, which have to be clocked well outside theirs to reach similar single-core performance.

And Apple's A14 and later core designs can be used in smartphones/tablets as well as laptops/PCs, unlike the x86 designs, which never really made inroads into the smartphone market. So what really matters is the ability of RISC ISA processors to be made wider superscalar, getting more done per clock cycle while staying well inside their performance/watt sweet spots on whatever process node is used!

bjourne almost 2 years ago

RISC just means that the instruction set is reduced (compared to what was the norm in the early 1980s). It does not say whether the architecture is register-memory or load-store (though most RISC ISAs are load-store). As long as an x86 CPU does not decode into more than, say, two dozen microcode types, it uses RISC "internally".

userbinator almost 2 years ago

Some actually do use something resembling an actual RISC core. The VIA Alternate Instruction Set (https://en.wikipedia.org/wiki/Alternate_Instruction_Set) basically exposes the uop format, and if you look at the documentation, you'll find that it's as if they took a MIPS core, stripped out irrelevant instructions, and added others more useful for x86; even the opcode map and encoding are identical for the instructions that remained.

IMHO the RISC vs CISC debate was never about implementation, only ISA. Even the 8086 uses a combination of microcoded and non-microcoded instructions (https://news.ycombinator.com/item?id=35939168).

Also, calling it a "legend" in the title is rather clickbaity.
