Here's a nice video by Yannic Kilcher explaining the Nyströmformer: <a href="https://www.youtube.com/watch?v=m-zrcmRd7E4" rel="nofollow">https://www.youtube.com/watch?v=m-zrcmRd7E4</a><p>Its benefit over regular transformers is efficiency: it performs fewer operations, because self-attention in the original transformer has quadratic complexity in the number of input tokens, while the Nyströmformer approximates it with linear complexity.
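<p>To make the scaling concrete, here is a rough sketch that only counts attention-score entries; it is not an implementation of either model. It assumes n input tokens and m landmark points, with m = 64 picked purely for illustration. Full attention builds one n x n score matrix, while a Nyström-style approximation works with three smaller matrices of sizes n x m, m x m and m x n. The function names are made up for this example.

```python
def full_attention_scores(n: int) -> int:
    """Entries in the n x n score matrix of standard self-attention."""
    return n * n


def nystrom_attention_scores(n: int, m: int) -> int:
    """Entries in the three matrices of a Nystrom-style approximation:
    n x m, m x m and m x n, where m is the number of landmarks (assumed fixed)."""
    return n * m + m * m + m * n


if __name__ == "__main__":
    m = 64  # illustrative landmark count, an assumption for this sketch
    for n in (512, 4096, 32768):
        full = full_attention_scores(n)
        approx = nystrom_attention_scores(n, m)
        print(f"n={n}: full={full}, nystrom={approx}, ratio={full / approx:.1f}x")
```

With m held fixed, the second count grows linearly in n, which is where the savings on long inputs come from.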