Notes from a quick read of the paper at https://arxiv.org/abs/2302.10866. The pop-sci article's title is overreaching; this is a drop-in subquadratic replacement for attention. Could be promising, but it remains to be seen whether it gets adopted in practice. skybrian (https://news.ycombinator.com/item?id=35657983) points out a newer blog post by the authors, and a previous discussion of an older (March 28th) blog post. Takeaways:

* In standard transformer attention, cost scales quadratically with sequence length, which restricts model context. This work presents a subquadratic exact operator, allowing models to scale to larger contexts (100k+ tokens).

* They introduce an operator called the "Hyena hierarchy": a recurrence over two subquadratic operations, long convolution and element-wise multiplicative gating. Sections 3.1-3.3 define the recurrences, matrices, and filters. Importantly, it is a drop-in replacement for attention (a rough sketch of the recurrence is below).

* Longer context: 100x speedup over FlashAttention at 64k context (if we view FlashAttention as a non-approximate engineering optimization, then this work improves algorithmically, gaining orders of magnitude on top of that). Associative recall, i.e., just pulling a stored value back out, shows improvements: experiments at 137k context with vocabulary sizes of 10-40 (unsure why recall is poor on shorter sequences with larger vocabularies, but they still outperform the others). A toy version of the task is sketched further below.

* Comparisons (on relatively small models, but meant to show the pattern) against RWKV (an attention-free model trained on 332B tokens) and GPT-Neo (trained on 300B tokens), with Hyena trained on 137B tokens. Models are 125M-355M parameters. (Section 4.3)

* On SuperGLUE, zero-shot and 3-shot accuracy is in the same ballpark as GPT-Neo (technically they underperform slightly at zero-shot and overperform slightly at 3-shot). (Tables 4.5 and 4.6)

* Because they can support large (e.g., 100k+) contexts, they can also do image classification by treating an image as one long token sequence. Reported accuracy is roughly comparable to the baselines. (Table 4.7)

Might have misread some takeaways; happy to be corrected.
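To make the operator concrete, here is a minimal sketch of a Hyena-style recurrence in PyTorch: alternate FFT-based long convolutions with element-wise gating. The class and function names are mine, and the parameterizations are simplified placeholders; the paper generates the long filters implicitly (a small FFN over positional encodings, with windowing) and also applies short depthwise convolutions to the projections, both of which I omit here.

    # Minimal sketch of a Hyena-style recurrence (hypothetical names, simplified
    # parameterization): causal FFT-based long convolutions + elementwise gating.
    import torch
    import torch.nn as nn


    def fft_long_conv(u, h):
        """Causal long convolution of u (B, L, D) with filter h (L, D) via FFT, O(L log L)."""
        L = u.shape[1]
        # Zero-pad to 2L so circular convolution matches causal linear convolution.
        u_f = torch.fft.rfft(u, n=2 * L, dim=1)
        h_f = torch.fft.rfft(h, n=2 * L, dim=0)
        return torch.fft.irfft(u_f * h_f.unsqueeze(0), n=2 * L, dim=1)[:, :L]


    class HyenaSketch(nn.Module):
        def __init__(self, d_model, l_max, order=2):
            super().__init__()
            self.order = order
            # One projection for v plus one gate x^n per recurrence step.
            self.in_proj = nn.Linear(d_model, d_model * (order + 1))
            self.out_proj = nn.Linear(d_model, d_model)
            # Explicit per-step long filters; the paper instead parameterizes
            # these implicitly, which is what lets filters span huge contexts.
            self.filters = nn.Parameter(torch.randn(order, l_max, d_model) * 0.02)

        def forward(self, u):                     # u: (B, L, D)
            L = u.shape[1]
            proj = self.in_proj(u).chunk(self.order + 1, dim=-1)
            v, gates = proj[0], proj[1:]
            z = v
            # Hyena recurrence: long convolution, then elementwise gating, repeated.
            for n in range(self.order):
                z = gates[n] * fft_long_conv(z, self.filters[n, :L])
            return self.out_proj(z)


    if __name__ == "__main__":
        x = torch.randn(2, 1024, 64)              # (batch, seq_len, d_model)
        y = HyenaSketch(d_model=64, l_max=1024)(x)
        print(y.shape)                            # torch.Size([2, 1024, 64])

Shape-wise this slots in where a self-attention block would go, which is the "drop-in" point; the per-token cost is dominated by the FFTs, hence subquadratic in L.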
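And a toy version of the associative recall task as I understand it (the exact token format of the paper's synthetic benchmark may differ): the model reads key-value pairs and, given a query key at the end, must emit the matching value.

    # Toy associative-recall example generator; task format is my assumption.
    import random


    def make_recall_example(vocab_size=10, num_pairs=8, seed=None):
        rng = random.Random(seed)
        keys = rng.sample(range(vocab_size), k=min(num_pairs, vocab_size))
        values = [rng.randrange(vocab_size) for _ in keys]
        sequence = [tok for kv in zip(keys, values) for tok in kv]  # k1 v1 k2 v2 ...
        query = rng.choice(keys)
        target = values[keys.index(query)]
        return sequence + [query], target


    if __name__ == "__main__":
        seq, target = make_recall_example(vocab_size=10, num_pairs=4, seed=0)
        print(seq, "->", target)

Scaling num_pairs (and hence sequence length) into the hundreds of thousands is what makes this a long-context stress test.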