As mentioned, these are all toy implementations and you should not use them in production. If you want the fast, easy, and extremely optimized way of doing things, use torch.nn.MultiheadAttention or torch.nn.functional.scaled_dot_product_attention so that you get the optimal implementations. You can use xformers' scaled dot product attention if you want the bleeding edge of performance.

> (Note that the code presented in this article is intended for illustrative purposes. If you plan to implement self-attention for training LLMs, I recommend considering optimized implementations like Flash Attention, which reduce memory footprint and computational load.)

Flash Attention is already part of torch's kernels as of torch 2, but the latest versions and optimizations land in xformers first.
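For reference, here is a minimal sketch of calling the fused kernel through torch.nn.functional.scaled_dot_product_attention. The tensor shapes and the is_causal flag are just illustrative assumptions; PyTorch selects a backend (Flash Attention, memory-efficient attention, or the plain math fallback) based on device, dtype, and input shapes.

```python
import torch
import torch.nn.functional as F

# Toy shapes for illustration: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# is_causal=True applies the usual decoder-style causal mask;
# the fused Flash Attention backend is only used when hardware/dtype allow it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```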