Theoretical limitations of multi-layer Transformer

107 points by fovc, 3 months ago

5 comments

thesz, 3 months ago
> ...our results give: ... (3) a provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.

It would be good to also prove that there is no task that becomes exponentially harder with chain-of-thought.
cubefox, 3 months ago
Loosely related thought: A year ago, there was a lot of talk about the Mamba SSM architecture replacing transformers. Apparently that hasn't happened so far.
hochstenbach, 3 months ago
Quanta Magazine has an article that explains in plain words what the researchers were trying to do: https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/
byyoung3, 3 months ago
those lemmas are wild
cs702, 3 months ago
Huh. I just skimmed this and quickly concluded that it's definitely *not* light reading.

It sure looks and smells like good work, so I've added it to my reading list.

Nowadays I feel like my reading list is growing faster than I can go through it.