The widening gap between memory and core speeds suggests to me that traditional RISC philosophy is not the way forward for performance and efficiency. Fixed-length instructions, load-store restrictions, and delay slots may have made implementation easier and faster at a time when memory could keep up with the CPU and instruction decoding was the bottleneck. Now that memory is often the bottleneck, it makes sense to have a more complex, denser instruction encoding, along with the other features that RISCs usually leave out but that improve code density.<p>Variable-length instructions are especially beneficial to code density, since often-used instructions can be encoded in fewer bytes, leaving the longer encodings to rarer ones. They also allow for easy extension. Letting instructions other than loads and stores access memory can reduce code size by eliminating many instructions whose sole purpose is to move data between memory and registers. This also means fewer explicitly named registers are needed (an instruction that reads memory implicitly names one or more internal temporary registers the CPU can use), which reduces the number of bits needed to specify registers.<p>Other considerations, like the number of operands and how many of them can be memory references, also contribute to code density: 0- and 1-operand ISAs require far more instructions for data movement, while 3-operand ISAs may waste encoding space if, much of the time, one source operand does not need to be preserved. 2 operands is a natural compromise, and this is what e.g. ARM Thumb does.<p>This is why I find the description of "compressed RISC-V" linked in the article ( <a href="http://www.eecs.berkeley.edu/~waterman/papers/ms-thesis.pdf" rel="nofollow">http://www.eecs.berkeley.edu/~waterman/papers/ms-thesis.pdf</a> ) interesting: benchmark analysis shows that 8 registers are used 60% of the time, and 2-operand instructions are encountered 36% of the time statically and 31% dynamically.
These characteristics are not so far from those of an ISA that has remained one of the most performant for over two decades: x86. It's denser than regular RISCs and requires more complex decoding, but not as complex as e.g. VAX. I think the decision to have 8 architectural registers and a 2-operand/1-memory format put x86 in an interesting middle ground, where it wasn't too CISC to implement efficiently but also wasn't RISC enough to suffer RISC's drawbacks. I'd certainly like to see how an open-source x86 implementation would perform in comparison.
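<p>To put a rough number on the variable-length argument, here's a toy calculation. The instruction mix and byte sizes below are made up purely for illustration (they are not taken from the thesis or from any real ISA), and the names are mine; it just shows how giving short encodings to the common cases pulls the average instruction size down.

```python
# Toy model of code density. All figures here are hypothetical and chosen
# only to illustrate the arithmetic; they do not describe a real ISA.

# kind of instruction -> (percent of all instructions, bytes when encoded
# with a variable-length scheme that favours the common cases)
MIX = {
    "reg-reg ALU": (40, 2),
    "load/store": (30, 2),
    "branch": (20, 2),
    "rare op or long immediate": (10, 6),
}

FIXED_BYTES = 4  # assumed size of every instruction in a fixed-length ISA


def average_bytes(mix):
    """Expected bytes per instruction under the variable-length scheme."""
    total_percent = sum(percent for percent, _ in mix.values())
    weighted = sum(percent * size for percent, size in mix.values())
    return weighted / total_percent


def density_gain(mix, fixed_bytes=FIXED_BYTES):
    """How many times smaller the code is than with fixed-length encoding."""
    return fixed_bytes / average_bytes(mix)


if __name__ == "__main__":
    print(f"average bytes per instruction: {average_bytes(MIX):.2f}")
    print(f"density gain over fixed {FIXED_BYTES}-byte: {density_gain(MIX):.2f}x")
```

With these made-up numbers the rare 6-byte instructions are longer than anything in the fixed-length ISA, yet the average still comes out at 2.4 bytes instead of 4. Whether a real workload behaves like this depends entirely on the actual instruction mix.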