
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

210 points | by limoce | 10 months ago

8 comments

chillee | 10 months ago
Hi, I'm one of the authors of this blog post (Horace He), along with Driss Guessous, Yanbo Liang, and Joy Dong.

We're quite happy with this abstraction - happy to answer any questions about it!
visarga | 10 months ago
It's interesting that optimizing a computation that can be described in a single line of math takes so much work. It took forever even to discover Flash Attention, and in the six years since transformers were invented, thousands of papers have worked on making it faster.

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

FlexAttention seems to have found the right abstraction for the task.
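For readers unfamiliar with the API being praised here: FlexAttention keeps exactly that formula and lets a user-supplied score_mod callable adjust the pre-softmax scores, while torch.compile fuses everything into one kernel. A minimal sketch follows, assuming a CUDA build of PyTorch 2.5 or later; the relative-position bias is just one illustrative choice of modification, not the only one.

import torch
from torch.nn.attention.flex_attention import flex_attention

# Toy tensors shaped (batch, heads, seq_len, head_dim); assumes a CUDA device.
q = torch.randn(2, 4, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# score_mod receives each pre-softmax score along with its (batch, head, q_idx, kv_idx)
# indices and returns a modified score; here, a simple relative-position bias.
def relative_bias(score, b, h, q_idx, kv_idx):
    return score + (q_idx - kv_idx)

# Compiling flex_attention is what produces the fused kernel; eager mode also
# works, but slowly.
attend = torch.compile(flex_attention)
out = attend(q, k, v, score_mod=relative_bias)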
brrrrrm | 10 months ago
For most LLM workloads today (short text chats), hundreds or a couple thousand tokens suffice, and the attention mechanism doesn't dominate (< 30% of compute). But as the modalities inevitably grow, work on attention approximation/compression is going to be paramount.

Nice to see PyTorch already elegantly supporting this next step in research.
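The scaling behind that claim shows up in a back-of-the-envelope FLOP count for a single decoder block: the projection and MLP matmuls grow linearly with sequence length, while the score and value matmuls grow quadratically. The sketch below uses illustrative sizes (hidden dim 4096, 4x MLP expansion) chosen for the arithmetic, not figures taken from the comment.

# Rough per-layer FLOP split for a standard decoder block (illustrative sizes only).
def attention_flop_share(seq_len: int, d_model: int, mlp_mult: int = 4) -> float:
    proj = 4 * 2 * seq_len * d_model * d_model               # Q, K, V, and output projections
    attn = 2 * 2 * seq_len * seq_len * d_model               # QK^T scores and scores @ V
    mlp = 2 * 2 * seq_len * d_model * (mlp_mult * d_model)   # up- and down-projection
    return attn / (proj + attn + mlp)

# The attention share rises from a few percent at ~1k tokens
# to well over half of the layer's FLOPs at ~64k tokens.
for n in (1_000, 8_000, 64_000):
    print(n, f"{attention_flop_share(n, 4096):.0%}")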
hi_hi | 10 months ago
I didn't see any notice of this being CUDA only (like FlashAttention). I tried running on my Mac M3, Python 3.11.8, following the quickstart (with the deviation of running it in a new venv). Got the following error:

/attention-gym/.venv/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
  File "/attention-gym/attn_gym/masks/document_mask.py", line 7, in <module>
    from torch.nn.attention.flex_attention import _mask_mod_signature
ModuleNotFoundError: No module named 'torch.nn.attention.flex_attention'
alecco | 10 months ago
> FlexAttention achieves 90% of FlashAttention2's performance in the forward pass and 85% in the backward pass.

It's very good. But note that FlashAttention-3 is 1.5x-2x faster than FlashAttention-2.
gchamonlive | 10 months ago
I've always been curious to put something together with PyTorch, but it always seemed like either a steep learning curve or there wasn't a big motivator (a project, a problem to solve, something in my daily routine to optimize).

Does anybody have a good starting point for learning with hands-on projects that could also accommodate FlexAttention?
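One concrete, low-commitment starting point (a suggestion, not a recommendation from the thread) is the attention-gym examples mentioned a couple of comments up: they are mostly short mask_mod and score_mod functions. A causal mask looks roughly like the sketch below, again assuming a CUDA build of PyTorch 2.5 or later.

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# A mask_mod returns True where a query position may attend to a key position.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 1, 8, 1024, 64
block_mask = create_block_mask(causal, B=B, H=H, Q_LEN=S, KV_LEN=S)

q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = flex_attention(q, k, v, block_mask=block_mask)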
andy12_ | 10 months ago
This is so cool. I want to try to implement something with this right now.
barrenko | 10 months ago
Can someone do a short summary or TL;DR for this?