
LM2: Large Memory Models

110 points by fzliu 3 months ago

7 comments

ghysznje 3 months ago
(Probably bc I'm dumb) I'm very confused by this paper. The dimensions are all over the place: first they say M is an N x d x d matrix, then it becomes N x d. And then they are trying to scale M with g_out and add it to E_attn, which is a T x d matrix??? Are the gates scalars or vectors or matrices? If they are matrices, then the dimensions also don't line up with M.
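One dimensionally consistent reading (an assumption on my part, not necessarily what the paper actually intends) is to treat the memory bank as N slots of d-dimensional vectors, cross-attend the T token embeddings against it, and mix the retrieved T x d result into E_attn with a scalar (or otherwise broadcastable) gate. A minimal numpy sketch of that reading, with all names illustrative:

    import numpy as np

    # Assumed shapes: T tokens, N memory slots, hidden size d.
    T, N, d = 16, 8, 64

    E_attn = np.random.randn(T, d)   # standard attention output, T x d
    M      = np.random.randn(N, d)   # memory bank read as N x d (one vector per slot)

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    # Cross-attention: queries from the token stream, keys/values from memory.
    scores = E_attn @ M.T / np.sqrt(d)      # T x N
    E_mem  = softmax(scores, axis=-1) @ M   # T x d, same shape as E_attn

    # Gate: a single scalar here; a per-slot or per-dimension vector gate
    # would also broadcast cleanly against a T x d array.
    g_out = 1.0 / (1.0 + np.exp(-0.3))      # sigmoid of some learned pre-activation

    E_out = E_attn + g_out * E_mem          # T x d, dimensions line up
    print(E_out.shape)                       # (16, 64)

If M really were N x d x d, each slot would have to be collapsed to a d-dimensional read vector before this addition could type-check, which may be where the paper's notation slips.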
ziofill 3 months ago
Missed opportunity to call them LMM and enjoy the onslaught of typos
kadushka 3 months ago
The largest model they tested is 1.7B.
soganess 3 months ago
Table 2's results are interesting. If the paper is to be believed, just adding the memory module seems to improve reasoning tasks across the board.

That said, I do wonder if this is a bit of a mirage. At 1.7B parameters, they are 3 orders of magnitude down from 4o (well, that isn't completely fair: I don't know what the average 'expert' size is in 4o, but I doubt the authors are doing mixture of experts at only 1.7B). A model can 'memorize' way more shit with that many parameters.
igleria 3 months ago
This immediately makes my mind bring up Hopfield networks: https://arxiv.org/abs/2008.02217

When I worked with them circa 2012 they were practically toys. Maybe we are in a better place now?
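For context, the linked paper's modern (continuous) Hopfield networks retrieve a stored pattern with a single softmax-weighted lookup, xi_new = X softmax(beta * X^T xi), which is what makes them attention-like. A minimal numpy sketch of that retrieval step, with the pattern matrix X, probe xi, and inverse temperature beta as illustrative names:

    import numpy as np

    d, N = 32, 10
    rng = np.random.default_rng(0)

    X  = rng.standard_normal((d, N))              # N stored patterns, one per column
    xi = X[:, 3] + 0.1 * rng.standard_normal(d)   # noisy probe of pattern 3
    beta = 8.0                                    # sharper retrieval as beta grows

    def softmax(v):
        v = v - v.max()
        e = np.exp(v)
        return e / e.sum()

    # One update step typically lands on (close to) the nearest stored pattern.
    weights = softmax(beta * (X.T @ xi))          # attention over the N patterns
    xi_new  = X @ weights                         # retrieved pattern, shape (d,)

    print(int(np.argmax(weights)))                # should print 3, the probed pattern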
ottaborra 3 months ago
RNN with extra steps?
anentropic 3 months ago
GitHub link in the paper is a 404 - private repo?