> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. [...]<p>This strikes me as a flavor of the VLIW+compilers-could-statically-do-more-of-the-work argument, though TFA does not mention VLIW architectures.<p>C or not, making compilers do more of the work is not trivial, it is not even simple, not even hard -- it's insanely difficult, at least for VLIW architectures, and it's insanely difficult whether we're using C or, say, Haskell. The only concession to make is that a Haskell compiler would have a lot more freedom than a C compiler, and a much more integrated view of the code to generate, but still, it'd be insanely hard to do all of the scheduling in the compiler. Moreover, the moment you share a CPU and its caches is the moment that static scheduling no longer works, and there is a lot of economic pressure to share resources.<p>There are reasons that this make-the-compilers-insanely-smart approach has failed.<p>It might be more likely to be successful <i>now</i> than 15 years ago, and it might be more successful if applied to Rust or Haskell or some such than C, but, honestly?, I just don't believe this will work anytime soon, and it's all academic anyways as long as the CPU architects keep churning out CPUs with hidden caches and speculative execution.<p>If you want this to be feasible, the first step is to make a CPU where you can turn off speculative execution and where there is no sharing between hardware threads. This could be an extension of existing CPUs.<p>A much more interesting approach might be to build asynchrony right into the CPUs and their ISAs. Suppose LOADs and STOREs were asynchronous, with an AWAIT-type instruction by which to implement micro event loops... then compilers could effectively do CPS conversion and automatically make your code locally async. This <i>is</i> feasible because CPS conversion is well-understood, but this is a far cry from the VLIW approach. Indeed, this is a lot simpler than the VLIW approach.<p>TFA mentions CMT and ULtraSPARC, and that's certainly a design direction, but note that it's one that makes C less of a problem anyways -- so maybe C isn't the problem...<p>Still, IMO TFA is right that C is a large part of the problem. Evented programs and libraries written in languages that insist on immutable data structures would help a great deal. Sharing even less across HW/SW threads (not even immutable data) would still be needed in order to eliminate the need for cache coherency, but just having immutable data would help reduce cache snooping overhead in actual programs. But the CPUs will continue to be von Neuman designs at heart.