
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

30 points by lnyan about 3 years ago

2 comments

lysecret about 3 years ago
It seems to me transformers are here to stay for a while (considering they were invented in 2017 and there really haven't been any fundamental adjustments to the architecture). It is quite exciting to think about all the possible optimizations and improvements that can happen if the underlying architecture stays: GPU optimizations like this one, maybe also integration with databases, libraries that simplify interacting with and building on top of transformers, reducing their size, fast inference, easy domain adjustment, etc. It feels like we are at the beginning of a transformer ecosystem.
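
The "GPU optimization" at the heart of the paper can be sketched in a few lines: attention is computed over key/value tiles with a running (online) softmax, so the full N x N score matrix is never materialized. Below is a minimal NumPy sketch of that structure, assuming a single head and an arbitrary block size; the function name and parameters are illustrative, and the real FlashAttention implementation is a fused CUDA kernel that keeps these tiles in fast on-chip SRAM.

import numpy as np

def tiled_attention_sketch(Q, K, V, block_size=64):
    """Single-head attention computed tile by tile with an online
    softmax, so the full N x N score matrix is never stored.
    Illustrative sketch only, not the paper's fused kernel."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max per query row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, n, block_size):
        Kb = K[start:start + block_size]       # one key tile
        Vb = V[start:start + block_size]       # matching value tile
        S = (Q @ Kb.T) * scale                 # scores for this tile only

        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale earlier partials
        P = np.exp(S - new_max[:, None])

        row_sum = row_sum * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        row_max = new_max

    return O / row_sum[:, None]

The output matches ordinary softmax(QK^T / sqrt(d)) V exactly; only the memory traffic changes, which is why the paper calls it exact attention with IO-awareness.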
impossiblefork about 3 years ago
To me this is something big, because of its success on Path-X.

It's a bit surprising that just a longer-range transformer architecture, enabled by better use of the GPU, was the solution though.