I've been ranting about the inadequacies of mainstream processors for almost twenty years. I remember seeing processors back in the late 90s that were three-quarters cache memory, with barely any transistors devoted to logic. It's surely worse than that now, with the vast majority of logic gates on a chip just sitting around idle. To put it in perspective, a typical chip today has close to a billion transistors (the Intel Core i7 has 731 million):

https://en.wikipedia.org/wiki/Transistor_count

A bare-minimum CPU that can do at least one operation per clock cycle probably has somewhere between 100,000 (the original SPARC) and 1 million (the PowerPC 602) transistors and runs at about 1 watt. So chips today have 1,000 to 10,000 times that number of transistors, but do they run anywhere near that much faster? Of course not.

And we can take that a step further, because even those chips suffered from the same inefficiencies that hinder processors today. A full adder takes 28 (yes, twenty-eight) transistors. Could we build an ALU that did one simple operation per clock cycle with 1,000 transistors? 10,000? How many of those could we fit on a billion-transistor chip? Even at 10,000 transistors apiece, that's 100,000 ALUs on a single die.

Modern CPUs are so many orders of magnitude slower than they could be with a parallel architecture that I'm amazed data centers even use them. GPUs are sort of going the FPGA route with 512 or more cores, but they are still a couple of orders of magnitude less powerful than they could be. And their proprietary, closed nature will someday relegate them to history, even with OpenCL/CUDA, because it frankly sucks to do any real programming when all you have at your disposal is DSP concepts.

I really want an open-source, billion-transistor FPGA running at 1 GHz that doesn't hold my hand with a bunch of proprietary middleware, so that I can program it in a parallel language like Go or MATLAB (Octave). There would be some difficulties with things like interconnect, but that's what approaches like MapReduce are for: do the computation in place rather than transferring data needlessly. And with diffs or other hash-based schemes, only the portions of the data that actually changed would need to be sent (there's a toy sketch of this at the bottom of this comment). It's also time to let go of VHDL/Verilog, because they are one level too low. We really need a language above them that lets us wire up basic logic without fear of the chip burning up.

And don't forget the most important part of all: since the chip is reprogrammable, cores can be multi-purpose, storing their configuration as code instead of as hardwired gates. A few hundred gates can reconfigure themselves on the fly to be ALUs, FPUs, anything really (toy sketch of that at the end as well). So instead of wasting vast swaths of the chip on something stupid like cache, that area can go toward storing logic layouts.

What would I use a chip like this for? Oh, I don't know: AI, physics simulations, formula discovery, protein folding, basically all of the problems that current single-threaded architectures can't touch in a cost-effective manner. The right architecture would bring us computing power we don't expect to see for 50 years, right now. I have a dream of someday being able to run genetic algorithms that currently take hours to complete in a millisecond, and of guiding the computer rather than programming it directly. That was sort of the promise of quantum computing, but I think FPGAs are more feasible.
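
To make the programming model concrete, here's the kind of thing I'd want to write: a toy genetic-algorithm fitness pass in plain Go that farms the work out to goroutines. The genome type, the fitness function, and the population size are all made up for illustration; on the chip I'm imagining, each goroutine would map onto its own little core instead of being multiplexed onto a handful of threads.

    // Toy sketch: a genetic-algorithm fitness pass done in parallel with
    // goroutines. Plain Go on an ordinary CPU; the genome type, the fitness
    // function, and the population size are all made up for illustration.
    package main

    import (
        "fmt"
        "math/rand"
        "sync"
    )

    // genome is just a bit vector for this toy example.
    type genome []byte

    // fitness is a stand-in: count the set bits. A real run would plug in a
    // physics simulation, a protein-folding score, whatever.
    func fitness(g genome) int {
        score := 0
        for _, b := range g {
            for i := 0; i < 8; i++ {
                score += int(b>>uint(i)) & 1
            }
        }
        return score
    }

    func main() {
        const popSize = 1024
        pop := make([]genome, popSize)
        for i := range pop {
            g := make(genome, 32)
            for j := range g {
                g[j] = byte(rand.Intn(256))
            }
            pop[i] = g
        }

        // One goroutine per individual; on the chip I want, one core each.
        scores := make([]int, popSize)
        var wg sync.WaitGroup
        for i := range pop {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                scores[i] = fitness(pop[i])
            }(i)
        }
        wg.Wait()

        // Pick the best individual; selection/crossover would follow here.
        best := 0
        for i, s := range scores {
            if s > scores[best] {
                best = i
            }
        }
        fmt.Println("best fitness:", scores[best])
    }

On today's hardware the scheduler squeezes those goroutines onto a few cores; the whole point of the rant is that, transistor-wise, there's no reason they couldn't each be real silicon.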
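
And here's a rough sketch of the hash-based diff idea from above: chunk the data, hash each chunk with something cheap (FNV-1a from Go's standard library in this sketch), and only ship the chunks whose hashes changed. The chunk size and the "send" step (just a print here) are placeholders.

    // Rough sketch of "only send what changed": chunk a buffer, hash each
    // chunk, and ship only the chunks whose hash differs from the previous
    // pass. Chunk size and the "send" are placeholders.
    package main

    import (
        "fmt"
        "hash/fnv"
    )

    const chunkSize = 64

    // chunkHashes returns one 64-bit hash per fixed-size chunk of data.
    func chunkHashes(data []byte) []uint64 {
        var hashes []uint64
        for off := 0; off < len(data); off += chunkSize {
            end := off + chunkSize
            if end > len(data) {
                end = len(data)
            }
            h := fnv.New64a()
            h.Write(data[off:end])
            hashes = append(hashes, h.Sum64())
        }
        return hashes
    }

    func main() {
        old := make([]byte, 4*chunkSize)
        cur := make([]byte, 4*chunkSize)
        copy(cur, old)
        cur[3*chunkSize] = 0xFF // mutate a single byte in the last chunk

        oldHashes, curHashes := chunkHashes(old), chunkHashes(cur)
        for i, h := range curHashes {
            if i >= len(oldHashes) || h != oldHashes[i] {
                fmt.Printf("chunk %d changed, sending it\n", i)
            } // unchanged chunks never touch the interconnect at all
        }
    }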
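
Finally, the "configuration as code" point, modeled in a few lines: a 4-input, FPGA-style lookup table is nothing but a 16-bit truth table, so repurposing the same cell from an AND into an XOR is just overwriting one word. This is a plain software model of the idea, not any vendor's bitstream format.

    // Sketch of "configuration as code": a 4-input, FPGA-style lookup table.
    // The cell's entire behavior lives in a 16-bit truth table, so turning
    // it from an AND into an XOR is just overwriting that one word.
    package main

    import "fmt"

    // lut4 is a 4-input logic cell: bit n of cfg is the output for input pattern n.
    type lut4 struct {
        cfg uint16
    }

    // eval returns the cell's output for a 4-bit input pattern (bits 0-3).
    func (l *lut4) eval(inputs uint8) bool {
        return (l.cfg>>(inputs&0x0F))&1 == 1
    }

    // configure fills in the truth table from any 4-input boolean function.
    func (l *lut4) configure(f func(inputs uint8) bool) {
        l.cfg = 0
        for in := uint8(0); in < 16; in++ {
            if f(in) {
                l.cfg |= 1 << in
            }
        }
    }

    func main() {
        var cell lut4

        // Behave like an AND of inputs 0 and 1.
        cell.configure(func(in uint8) bool { return in&1 != 0 && in&2 != 0 })
        fmt.Println("AND(1,1):", cell.eval(0b0011))

        // Same cell, reconfigured on the fly into an XOR of inputs 0 and 1.
        cell.configure(func(in uint8) bool { return (in&1 != 0) != (in&2 != 0) })
        fmt.Println("XOR(1,1):", cell.eval(0b0011))
    }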