Reminds me of some earlier OpenAI work: https://openai.com/research/nonlinear-computation-in-deep-linear-networks

"Neural networks consist of stacks of a linear layer followed by a nonlinearity like tanh or rectified linear unit. Without the nonlinearity, consecutive linear layers would be in theory mathematically equivalent to a single linear layer. So it’s a surprise that floating point arithmetic is nonlinear enough to yield trainable deep networks."
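For a concrete sense of what that quote means, here is a minimal NumPy sketch (my own illustration, not from the OpenAI post; the 64x64 matrices are arbitrary): in exact arithmetic two stacked linear layers collapse into a single matrix product, but in float32 the two computations leave a small rounding residue.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 64), dtype=np.float32)
W2 = rng.standard_normal((64, 64), dtype=np.float32)
x = rng.standard_normal(64, dtype=np.float32)

# Two "linear" layers applied in sequence...
two_layers = W2 @ (W1 @ x)
# ...versus the single collapsed layer they should equal in exact math.
collapsed = (W2 @ W1) @ x

# Small but nonzero: float32 matrix products are not exactly linear.
print(np.max(np.abs(two_layers - collapsed)))
```

That residue is tiny, which is why it is surprising that it carries enough nonlinearity to train on.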
Tom7 is kind of like the electronic version of the Primitive Technology YouTuber.

It's fascinating watching someone use some of the worst tools ever to make something in about the most labor-intensive way imaginable - and it's just beautiful. It's practically meditative.
> I think this is a fractal in the sense that it is chaotic, has a color gradient, and could be on the cover of an electronic music album

I didn't know of this criterion.
This is one of the few tom7 videos that leaves me utterly confused in every way. It feels like he's speaking a different language than I am.
I believe David Page used this floating point imprecision (possibly by accident, as I think he was originally doing something else with it?) to achieve a world-record training speed on CIFAR-10, by summing a batch of loss values instead of taking their average. The sum effectively reduced the precision of the loss value and seemed to have a regularizing effect on training, as best as I personally understand it. :) <3
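A rough illustration of the precision part of that claim (a made-up sketch, not David Page's actual code; the loss values and the batch size of 512 are invented): near 2.3 the gap between adjacent float16 values is about 0.002, but near the sum of 512 such losses the gap is 1.0, so a low-precision running sum rounds away per-example detail that the mean would keep.

```python
import numpy as np

# Per-example losses around 2.3 (roughly random-guess cross-entropy on
# 10 classes), with small made-up per-example variation.
losses = (2.3 + np.linspace(0, 1e-2, 512)).astype(np.float16)

# NumPy computes float16 means with float32 intermediates, so the mean
# lands near 2.3, where float16 resolution is ~0.002.
mean_loss = losses.mean()

# A float16 running sum grows toward ~1178, where float16 resolution is 1.0,
# so the small per-example differences get rounded away as it accumulates.
sum_loss = np.float16(0.0)
for l in losses:
    sum_loss = np.float16(sum_loss + l)

print("mean:", mean_loss, "spacing:", np.spacing(mean_loss))
print("sum: ", sum_loss, "spacing:", np.spacing(sum_loss))
```

Whether that coarsening really acts as a regularizer I can't say, but it does show how much resolution the loss loses when you sum instead of average.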
Tom7 is back to his usual madness. This time he exploits floating point rounding: first to let a neural network's activation function be linear (except for rounding errors), then escalating until he ultimately builds a 6502 emulator and shows that linear operations plus rounding errors are Turing complete.
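A toy sketch of the flavor of the trick (my own illustration, not necessarily the exact construction from the video; the constant 3072 is just chosen so that x + c lands where float16 spacing is 2.0): in exact arithmetic (x + c) - c is the identity, a purely linear map, but in float16 the addition of a large constant rounds away the low bits of x, so two "linear" operations become a staircase, i.e. a genuine nonlinearity.

```python
import numpy as np

c = np.float16(3072.0)   # 3 * 2**10; float16 spacing in [2048, 4096) is 2.0

def staircase(x):
    x = np.float16(x)
    return (x + c) - c    # identity in exact math, a step function in float16

xs = np.linspace(-4, 4, 17)
print([float(staircase(x)) for x in xs])
# Outputs snap to a grid of 2.0 instead of reproducing the inputs.
```

Stack enough of these snapping operations and you have nonlinear building blocks made from nothing but additions, which is presumably the raw material for the rest of the construction.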