That is an amazing paper, a great result, and new neural architectures are long overdue.

But I don't believe this has any significance in practice.

GPU memory is the limiting factor for most current AI approaches, and that's where typical convolutional architectures shine: they effectively compress the input data, work on the compressed representation, then decompress the results. With gated linear networks, I'm required to always work on the full input data, because it's a one-step prediction. As a result, I'll run out of GPU memory before I reach a learning capacity comparable to conv nets.
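To make the memory concern concrete, here's a minimal sketch of a single gated-linear neuron with half-space gating (the class name, dimensions, and hyperparameters are made up for illustration; see the paper for the exact formulation). The relevant point is that the context function in every layer consumes the full raw input as side information, whereas a conv net's later layers only ever see progressively smaller feature maps:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def logit(p, eps=1e-6):
        p = np.clip(p, eps, 1.0 - eps)
        return np.log(p / (1.0 - p))

    class GatedLinearNeuron:
        # One weight vector per context; the active context is chosen by
        # half-space gating on the FULL raw input z (the "side information").
        def __init__(self, in_dim, side_dim, num_hyperplanes=4, lr=0.01):
            self.w = np.zeros((2 ** num_hyperplanes, in_dim))
            self.hyperplanes = np.random.randn(num_hyperplanes, side_dim)
            self.lr = lr

        def context(self, z):
            # Which side of each random hyperplane the full input z falls on.
            bits = (self.hyperplanes @ z > 0).astype(int)
            return int("".join(map(str, bits)), 2)

        def predict(self, p_in, z):
            c = self.context(z)  # needs all of z, in every layer
            return sigmoid(self.w[c] @ logit(p_in)), c

        def update(self, p_in, z, target):
            # Online gradient step on log loss, for the active context only.
            p, c = self.predict(p_in, z)
            self.w[c] -= self.lr * (p - target) * logit(p_in)

Every neuron in every layer calls context(z) on the full-resolution input z, so z has to stay resident for the entire forward pass; a strided conv stack only needs the shrinking activations of the previous layer.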
"We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks."<p>What exactly this statement means?
As a relative neophyte in this realm, I find this fascinating to read. Comparing it to the models/methods used to derive said properties is good education for me.