
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

30 points by lnyan about 3 years ago

2 comments

lysecret about 3 years ago
It seems to me transformers are here to stay for a while (considering they were invented in 2017 and there really haven't been any fundamental adjustments to the architecture). It is quite exciting to think about all the possible optimizations and improvements that can happen if the underlying architecture stays: GPU optimizations like this one, maybe also integration with databases, libraries that simplify interacting with and building on top of transformers, reducing their size, fast inference, easy domain adjustment, etc. It feels like we are at the beginning of a transformer ecosystem.
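
The "GPU optimization" at the heart of the paper can be sketched in a few lines: attention is computed over key/value tiles with a running (online) softmax, so the full N x N score matrix is never materialized. Below is a minimal NumPy sketch of that structure, assuming a single head and an arbitrary block size; the function name and parameters are illustrative, and the real FlashAttention implementation is a fused CUDA kernel that keeps these tiles in fast on-chip SRAM.

import numpy as np

def tiled_attention_sketch(Q, K, V, block_size=64):
    """Single-head attention computed tile by tile with an online
    softmax, so the full N x N score matrix is never stored.
    Illustrative sketch only, not the paper's fused kernel."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max per query row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, n, block_size):
        Kb = K[start:start + block_size]       # one key tile
        Vb = V[start:start + block_size]       # matching value tile
        S = (Q @ Kb.T) * scale                 # scores for this tile only

        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale earlier partials
        P = np.exp(S - new_max[:, None])

        row_sum = row_sum * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        row_max = new_max

    return O / row_sum[:, None]

The output matches ordinary softmax(QK^T / sqrt(d)) V exactly; only the memory traffic changes, which is why the paper calls it exact attention with IO-awareness.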
impossiblefork about 3 years ago
To me this is something big, because of its success on Path-X.

It's a bit surprising that just a longer-range transformer architecture, enabled by better use of the GPU, was the solution though.