"Compilers are better now" is a perennial claim. There are lots of regularly structured problems where this approach can be fast. The lessons of VLIW, Itanium, and compilers for parallel machines, however, are that on irregular codes it doesn't matter how much time or effort you put into compilation: at compile time you have less information than at runtime, and that's a critical disadvantage. At compile time you can't see the runtime data dependences, so your generated code must conservatively assume they exist. When dependences are resolved at runtime, the hardware can see the actual ones, and that's what lets it beat the "smart compiler, simple hardware" approach on a lot of real-world use cases.
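A minimal sketch of the compile-time vs. runtime point, using a scatter-add loop `a[idx[i]] += b[i]` as an assumed example of irregular code. Whether two iterations depend on each other is decided entirely by the contents of `idx`, which usually isn't known until the program runs, so a compiler has to assume any two iterations might conflict. Once the indices are known, the real dependences can simply be read off. The function names `scatter_add` and `find_conflicts` are hypothetical, for illustration only.

```python
def scatter_add(a, idx, b):
    """The irregular loop: which element of `a` each iteration updates
    depends on runtime data in `idx`."""
    for i in range(len(idx)):
        a[idx[i]] += b[i]
    return a


def find_conflicts(idx):
    """Return (earlier, later) iteration pairs of scatter_add that update
    the same element of `a`, i.e. the dependences that actually occur for
    this particular `idx`. Each later iteration is paired with the most
    recent earlier iteration that wrote the same element."""
    last_writer = {}  # element index -> last iteration that wrote it
    conflicts = []
    for i, target in enumerate(idx):
        if target in last_writer:
            conflicts.append((last_writer[target], i))
        last_writer[target] = i
    return conflicts


# Same loop, different runtime data:
print(find_conflicts([0, 1, 2, 3]))  # [] -> iterations are independent
print(find_conflicts([0, 1, 0, 3]))  # [(0, 2)] -> iteration 2 must follow 0
```

The compiler sees only the loop body and must treat both cases as the second one. Checking the actual addresses is roughly what an out-of-order core's memory disambiguation does when it compares real load and store addresses in flight.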
I think routing actually gets easier once you get rid of the proprietary FPGA switching fabric and go with a homogeneous grid of identical elements.

I look forward to developments in this area. Eventually, I hope they come up with a 4x4-bit lookup table (LUT) fabric with no fancy stuff: just lots of transistors, mostly waiting, but far fewer than in a CPU.