This is interesting, thanks. I know this is just a demo, but I'm convinced we're going to see a swing back to simple, purpose-written NNs for many small applications, since the alternative is pulling in a couple of GB of Python and CUDA libraries, which is serious overkill for something on the scale of MNIST (and many real problems are on that scale).<p>That said, I'm curious how well the compiler can optimize matrix operations, whether in Zig or in another language such as C or Rust, and at what point it becomes worth linking in BLAS, MKL, or some other library. I wonder if there is a sweet spot, some problem size above which the extra dependency pays for itself.