This is (and was) the dream of Cerebras, and I am very glad to see it embraced, even if only in small part, on a GPU. Wild to see how much performance is left on the table for these things. It's crazy to think how much can be done by a few bold individuals when it comes to pushing the SOTA of these kinds of things (not just in kernels either -- in other areas as well!)<p>My experience has been that getting over the fear of a big wide world full of noise and marketing, and simply committing to a problem, learning it, and slowly bootstrapping it over time, tends to yield phenomenal results in the long run for most applications. And if not, there's often an adjacent or side field you can pivot to and still make incredible progress.<p>The big players may have the advantage of scale, but there is still so, so much that can be done if you look around and keep a good feel for it. <3 :)