That is an amazing paper, a great result, and new neural architectures are long overdue.

But I don't believe this has any significance in practice.

GPU memory is the limiting factor for most current AI approaches, and that's where typical convolutional architectures shine: they effectively compress the input data, work on the compressed representation, then decompress the result. With gated linear networks, I always have to work on the full input data, because the prediction happens in one step. As a result, I'll run out of GPU memory before I reach a learning capacity comparable to conv nets.
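
To put rough numbers on that, here's a quick back-of-the-envelope sketch of how parameter counts scale. This is my own reading (halfspace gating over the raw input, 4 contexts per neuron, illustrative layer sizes), not the paper's implementation:

    # Back-of-the-envelope only; the GLN parameterization here
    # (halfspace gating over the raw input, 4 contexts per neuron)
    # is an assumption, and the sizes are illustrative.

    def conv_layer_params(c_in, c_out, k=3):
        # Conv weights are shared across spatial positions, so the
        # count is independent of input resolution.
        return c_in * c_out * k * k

    def gln_neuron_params(side_dim, fan_in, n_ctx=4):
        # Each neuron keeps n_ctx halfspace vectors over the full
        # input (the gating) plus 2**n_ctx weight vectors over the
        # previous layer's outputs.
        return n_ctx * side_dim + (2 ** n_ctx) * fan_in

    side_dim = 3 * 224 * 224          # ImageNet-sized input, kept in full
    print(conv_layer_params(64, 64))  # 36,864 -- same at any resolution
    print(128 * gln_neuron_params(side_dim, 128))
    # ~77M for a single 128-neuron GLN layer, dominated by gating

The gating term grows linearly with input size, and that's exactly where the memory goes.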