Another week, another paper that thinks it can revive recurrent networks. This time, though, the father of LSTM is a co-author, so the attempt shouldn't come as a surprise. Sadly, the results seem to indicate that even with literally every trick of the trade, their architecture can't match the throughput of flash-attention (not by a long shot, though that's hardly surprising for a recurrent design) and, on top of that, it's even slower than Mamba, which offers similar accuracy at lower cost. So my money is on this being another DOA architecture, like all the others we've already seen this year.