FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

210 points by limoce 10 months ago

8 comments

chillee 10 months ago
Hi, I'm one of the authors of this blog post (Horace He), along with Driss Guessous, Yanbo Liang, and Joy Dong.

We're quite happy with this abstraction - happy to answer any questions about it!
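A minimal sketch of how score_mod and block_mask plug into flex_attention (assuming torch >= 2.5 or a recent nightly and a CUDA GPU; the shapes and the relative-position bias are only illustrative):

    import torch
    from torch.nn.attention.flex_attention import flex_attention, create_block_mask

    B, H, S, D = 2, 8, 1024, 64
    q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
    k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
    v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

    # score_mod rewrites each attention score before the softmax;
    # here, a simple relative-position bias.
    def relative_bias(score, b, h, q_idx, kv_idx):
        return score + (q_idx - kv_idx)

    # mask_mod returns a boolean per (q_idx, kv_idx) pair; create_block_mask
    # turns it into a block-sparse mask so fully-masked blocks are skipped.
    def causal(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx

    block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

    # torch.compile lowers the mods into a single FlashAttention-style kernel.
    compiled_flex = torch.compile(flex_attention)
    out = compiled_flex(q, k, v, score_mod=relative_bias, block_mask=block_mask)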
visarga 10 months ago
It's interesting that optimizing a computation that can be described in a single line of math takes so much work. It took forever even to discover FlashAttention. And in the 6 years since transformers were invented, thousands of papers worked on making it faster.

Attention(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V

FlexAttention seems to have found the right abstraction for the task.
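For reference, that one-liner as naive PyTorch (a sketch with assumed (B, H, S, D) shapes, no masking or dropout) - materializing the full S x S score matrix is exactly the memory traffic that FlashAttention-style kernels avoid:

    import math
    import torch

    def naive_attention(q, k, v):
        # q, k, v: (B, H, S, D); scores: (B, H, S, S), materialized in full
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
        return torch.softmax(scores, dim=-1) @ v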
brrrrrm 10 months ago
For most LLM workloads today (short text chats), hundreds or a couple thousand tokens suffice, so attention mechanisms don't dominate (< 30% of compute). But as the modalities inevitably grow, work on attention approximation/compression is going to be paramount.

Nice to see PyTorch already elegantly supporting this next step in research.
hi_hi 10 months ago
I didn't see any notice of this being CUDA only (like FlashAttention). I tried running on my Mac M3, Python 3.11.8, following the quickstart (with the deviation of running it in a new venv). Got the following error:

    /attention-gym/.venv/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
      cpu = _conversion_method_template(device=torch.device("cpu"))
    Traceback (most recent call last):
      File "/attention-gym/attn_gym/masks/document_mask.py", line 7, in <module>
        from torch.nn.attention.flex_attention import _mask_mod_signature
    ModuleNotFoundError: No module named 'torch.nn.attention.flex_attention'
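The ModuleNotFoundError suggests the installed torch simply predates the flex_attention module rather than a CUDA-only restriction; a quick check (a sketch - as far as I can tell the module first shipped in the nightlies around the time of this post and landed in stable torch 2.5):

    import torch

    print(torch.__version__)
    try:
        from torch.nn.attention.flex_attention import flex_attention  # noqa: F401
        print("flex_attention is importable")
    except ImportError:
        print("flex_attention not found; try a newer torch (>= 2.5 or a recent nightly)")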
alecco 10 months ago
> FlexAttention achieves 90% of FlashAttention2's performance in the forward pass and 85% in the backward pass.

It's very good. But note FlashAttention-3 is 1.5x - 2x faster than FlashAttention-2.
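Back-of-the-envelope from those two numbers (my arithmetic, not from the post): 0.90 / 1.5 ≈ 0.60 and 0.90 / 2.0 = 0.45, so in the forward pass FlexAttention would land at very roughly 45-60% of FlashAttention-3.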
gchamonlive 10 months ago
I've always been curious to put something together with PyTorch, but it always seemed like either a steep learning curve or there wasn't a big motivator (a project, a problem to solve, something in my daily routine to optimize).

Does anybody have a good starting point for learning with hands-on projects, one that could also accommodate FlexAttention?
andy12_ 10 months ago
This is so cool. I want to try to implement something with this right now.
barrenko 10 months ago
Can someone do a short summary or TL;DR for this?