An alternative to this, which claims to be faster, was recently published at ICML. The website and tutorial video are very nice, too.

https://linear-transformers.com/
How do Reformer, Linformer, or any of these other new models perform in practical applications (not the benchmarks that researchers game)? Are they better than BERT?