Argh, why do authors write stuff like this -- <i>"It is, not to put too fine a point on it, a creaking old bit of wheezing ironmongery that, had the gods of microprocessor architecture been more generous, would have been smote into oblivion long ago."</i><p>Just because a technology is "old" doesn't mean it is useless or needs to be replaced. I'm all in favor of fixing problems, and of refactoring to improve flow and remove inefficiencies. I am <i>not</i> a fan of re-inventing the wheel because gee, we've had this particular wheel for 50 years and it's doing fine, but hey, let's reimagine it anyway.<p>That said, the kink in the x86 architecture was put there by "IBM PC compatibility" and a Windows/Intel monopoly that went on way too long. But knowing <i>why</i> the thing has these weird artifacts just means the engineers were working under constraints you don't understand; it doesn't give you license to dismiss what they've done as needing to be "wiped away."<p>We are in a period where enthusiasts can design, build, and operate a completely bespoke ISA and micro-architecture with dense, low-cost FPGAs. Maybe they don't run at multi-GHz speeds, but if you want to contribute positively to the question of computer architecture, there has never been a better time. You don't even have to build the whole thing! You can just add your idea into an existing architecture and compare how you do against it.<p>Want to do flow-control-colored register allocation for speculative instruction retirement? You can build the entire execution unit in an FPGA, throw instructions at it to your heart's content, and provide analysis of the results.<p>Okay, enough ranting. I want AArch64 to win so we can reset the problem set back to a smaller number of workarounds, but the creativity of people trying to advance the x86 architecture under its constraints is not something to be belittled; it is to be admired.
Somewhat off topic from the main thread of the article, but I have always wondered about the multiple privilege levels. What's the expected/intended use for them? The only thing I can think of is separating out hardware drivers (assuming ring 1 can still directly read/write I/O ports or memory addresses mapped to hardware) so they can't crash the kernel should the drivers or hardware turn out to be faulty. But I don't think I've ever heard of such a design being used in practice. It seems everyone throws their driver code into ring 0 with the rest of the kernel, and if the driver or hardware faults and takes the kernel with it, too bad, so sad. Ground R̅E̅S̅E̅T̅ and start over.<p>What I find myself wondering is <i>why</i>? It seems like a good idea on paper, at least. Is it just a hangover from other CPU architectures that only had privileged/unprivileged modes, and programmers just ended up sticking with what they were already familiar and comfortable with? Was there some painful gotcha about multiple privilege levels that made them impractical to use, like the overhead of switching privilege levels making it impossible to meet some hardware deadline? Silicon-level bugs? Something else?
Given the multi-core, NUMA, and Spectre/Meltdown reality we're living in, and the clear benefits of the io_uring approach, why not just have one or more dedicated cores to handle "interrupts" that are nothing more than entries in a shared-memory table? Something like the sketch below.
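<p>To make that concrete, here is a minimal sketch of the idea, assuming a made-up shared ring (none of these names are a real kernel API): one core does nothing but poll a shared-memory table and consume "interrupt" entries posted by devices or other cores, io_uring-style.

    #include <stdatomic.h>
    #include <stdint.h>

    #define RING_SIZE 256

    struct irq_entry {
        uint32_t source;    /* which device/core posted this entry */
        uint32_t payload;   /* event-specific data */
    };

    struct irq_ring {
        _Atomic uint32_t head;   /* advanced by producers */
        uint32_t tail;           /* only touched by the polling core */
        struct irq_entry entries[RING_SIZE];
    };

    /* Runs forever on the dedicated core: no IDT, no privilege transition,
     * just a poll loop over shared memory. */
    void irq_poll_core(struct irq_ring *ring)
    {
        for (;;) {
            uint32_t head = atomic_load_explicit(&ring->head,
                                                 memory_order_acquire);
            while (ring->tail != head) {
                struct irq_entry *e = &ring->entries[ring->tail % RING_SIZE];
                (void)e;    /* dispatch to the appropriate handler here */
                ring->tail++;
            }
            /* a real design might use MONITOR/MWAIT or a pause hint here
             * instead of pure busy-waiting */
        }
    }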
Since cloud servers are a bigger market than users who want to run an old copy of VisiCalc, why doesn't either Intel or AMD produce a processor line that has none of the old 16- and 32-bit architecture (and long-forgotten vector extensions) implemented in silicon? Why not just make a clean (or as clean as possible) 64-bit x86 processor?
The author states:<p>>'The processor nominally maintained four separate stacks (one for each privilege level), plus a possible “shadow stack” for the operating system or hypervisor.'<p>Can someone elaborate on what the "shadow stack" is and what it's for exactly? This is the first time I've heard this nomenclature.
> interrupt handling would be faster, simpler, more complete, and less prone to corner-case bugs.<p>If it's simpler, I can see why it will be faster and less prone to corner-case bugs (at least, the hardware will have fewer corner cases; the software is a different question).<p>But how is simplification supposed to make FRED more "complete"?
> Rather than use the IDT to locate the entry point of each handler, processor hardware will simply calculate an offset from a fixed base address<p>So, wasn't the 8086 like this? Or at least some microprocessors jump to $BASE + OFFSET, to a slot where more or less one JMP fits.
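<p>For what it's worth, here's a hypothetical sketch (plain C, made-up base and stride values, not actual CPU or FRED code) contrasting the two dispatch styles the question touches on: a table of pointers that the hardware reads through, like the 8086 IVT or the protected-mode IDT, versus a computed base-plus-offset jump where each slot typically holds little more than a JMP to the real handler, as on some microcontrollers.

    #include <stdint.h>

    typedef void (*handler_t)(void);

    /* Style 1: table of pointers (8086 IVT / IDT). The hardware loads the
     * handler address out of the table and jumps through it. */
    static handler_t vector_table[256];

    void dispatch_via_table(uint8_t vector)
    {
        vector_table[vector]();   /* indirect call through the table entry */
    }

    /* Style 2: fixed-stride dispatch: the hardware jumps straight to
     * base + vector * stride, no table lookup at all. */
    #define VECTOR_BASE   0x00001000UL  /* illustrative base address */
    #define VECTOR_STRIDE 64UL          /* illustrative per-vector slot size */

    uintptr_t dispatch_address(uint8_t vector)
    {
        return VECTOR_BASE + (uintptr_t)vector * VECTOR_STRIDE;
    }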