Not every workload is memory-bandwidth bound like his "make -j16" compile. Some workloads need low memory latency or fast inter-core (and inter-socket) communication (e.g. RDBMS OLTP), some need raw CPU throughput (e.g. HPC), and some need the best possible single-thread CPU performance (e.g. some games).<p>As he wrote, CPUs are most efficient (compute per watt) at a specific frequency, and if his CPU mostly waits for RAM, the work can be done at low power.<p>It's probably possible to build x86-64 CPUs with narrower backends (fewer execution units) that emulate the 128- and 256-bit registers/operations in microcode (and maybe even emulate the FPU), and get a cheaper and faster build server, if it were economical to fab such narrow-use-case chips (those would be good for redis/memcached too, I imagine).
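<p>To make the "emulate wide ops on a narrow backend" idea concrete, here is a toy Python model, not how any real microcode works. It assumes a core that only has a 64-bit adder and must split a packed 256-bit add (in the spirit of AVX2's VPADDQ, which adds four 64-bit lanes independently) into one narrow micro-op per lane. All names (`split_lanes`, `join_lanes`, `emulated_packed_add64`) are hypothetical, and the micro-op count is just the number of lanes, not a cycle-accurate cost.

```python
# Toy model (hypothetical): a 256-bit register is a Python int, and the
# "narrow backend" only has a 64-bit adder, so packed adds are split per lane.
LANE_BITS = 64
LANE_MASK = (1 << LANE_BITS) - 1
LANES = 256 // LANE_BITS  # four 64-bit lanes in a 256-bit register


def split_lanes(reg: int) -> list[int]:
    """Split a 256-bit value into four 64-bit lanes, lowest lane first."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def join_lanes(lanes: list[int]) -> int:
    """Reassemble four 64-bit lanes (lowest first) into one 256-bit value."""
    reg = 0
    for i, lane in enumerate(lanes):
        reg |= (lane & LANE_MASK) << (i * LANE_BITS)
    return reg


def emulated_packed_add64(a: int, b: int) -> tuple[int, int]:
    """Packed 64-bit add over a 256-bit value, similar in spirit to VPADDQ.

    Each lane wraps modulo 2**64 on its own; no carry crosses lanes.
    Returns (result, micro_ops), where micro_ops counts the narrow 64-bit
    adds issued: one per lane instead of a single wide operation.
    """
    result_lanes = []
    micro_ops = 0
    for x, y in zip(split_lanes(a), split_lanes(b)):
        result_lanes.append((x + y) & LANE_MASK)  # one 64-bit ALU op
        micro_ops += 1
    return join_lanes(result_lanes), micro_ops


if __name__ == "__main__":
    a = join_lanes([LANE_MASK, 1, 2, 3])
    b = join_lanes([1, 1, 1, 1])
    result, ops = emulated_packed_add64(a, b)
    print(split_lanes(result), ops)
```

Running it prints `[0, 2, 3, 4] 4`: the lowest lane wraps to 0 without carrying into its neighbour, and the narrow core spends four ops where a full-width unit would spend one. For a build that mostly waits on RAM and rarely touches wide vectors, that 4x slowdown on the rare vector instruction might be an acceptable trade for a smaller, cheaper core.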