
Coding Self-Attention, Multi-Head Attention, Cross-Attention, Causal-Attention

142 points by rasbt over 1 year ago

2 comments

f38zf5vdt over 1 year ago
As mentioned, these are all toy implementations and you should not use them in production. If you want the fast, easy, and extremely optimized way of doing things, use torch.nn.MultiheadAttention or torch.nn.functional.scaled_dot_product_attention so that you get the optimal implementations. You can use the xformers scaled dot product attention if you want the bleeding edge of performance.

> (Note that the code presented in this article is intended for illustrative purposes. If you plan to implement self-attention for training LLMs, I recommend considering optimized implementations like Flash Attention, which reduce memory footprint and computational load.)

Flash attention is already part of torch's kernels as of torch 2, but the latest versions and optimizations land in xformers first.
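For reference, a minimal sketch of the two PyTorch paths the commenter names. The shapes and hyperparameters below are made up for illustration; this is not the article's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy dimensions (illustrative assumptions, not from the article)
    batch, seq_len, embed_dim, num_heads = 2, 16, 64, 4
    x = torch.randn(batch, seq_len, embed_dim)

    # High-level module: multi-head self-attention over (batch, seq, embed)
    mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
    out, _ = mha(x, x, x, need_weights=False)

    # Lower-level fused kernel (torch >= 2.0): expects (batch, heads, seq, head_dim)
    head_dim = embed_dim // num_heads
    q = k = v = x.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
    out2 = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask applied internally

The second call dispatches to an optimized backend (flash/memory-efficient attention) when the hardware and dtypes allow it, which is the point the comment makes about torch 2.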
atticora over 1 year ago
> conscious, kŏn′shəs, adjective -- Characterized by or having an awareness of one's environment and one's own existence, sensations, and thoughts. synonym: aware.

Self-attention seems to be at least a proxy for "awareness of ... one's own existence." If that closed loop is the thing that converts sensibility into sentience, then maybe it's the source of LLMs' leverage too. Is this language comprehension algorithm a sort of consciousness algorithm?