科技回声

ghysznje, 3 months ago

(Probably bc I'm dumb) I'm very confused by this paper. The dimensions are all over the place: first they say M is an N x d x d matrix, then it becomes N x d. Then they try to scale M with g_out and add it to E_attn, which is a T x d matrix??? Are the gates scalars, vectors, or matrices? If they are matrices, then the dimensions also don't line up with M.
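
One way to sanity-check the shape complaint above is to apply NumPy-style broadcasting rules to the shapes as the comment reports them. This is only a sketch: the sizes N, T, d are made up, the helper `broadcastable` is hypothetical, and nothing here is taken from the paper's code.

```python
def broadcastable(shape_a, shape_b):
    """True if two shapes are compatible under NumPy-style broadcasting:
    aligned from the right, each pair of sizes must match or one must be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True

# Made-up sizes for illustration only (not from the paper).
N, T, d = 4, 6, 8

M_3d = (N, d, d)    # memory as first described in the comment
M_2d = (N, d)       # memory as later described in the comment
E_attn = (T, d)     # attention output

# Candidate gate shapes the comment asks about.
gates = {"scalar": (), "vector": (d,), "matrix": (d, d)}

# Can either form of M be added elementwise to E_attn?
add_3d_ok = broadcastable(M_3d, E_attn)
add_2d_ok = broadcastable(M_2d, E_attn)

# Which gate shapes can scale the N x d memory elementwise?
gate_ok = {name: broadcastable(shape, M_2d) for name, shape in gates.items()}

print("M (N,d,d) + E_attn (T,d):", add_3d_ok)
print("M (N,d) + E_attn (T,d):", add_2d_ok)
print("gate * M (N,d):", gate_ok)
```

With these sizes, neither form of M can be added elementwise to E_attn, so some readout step (for example, the T tokens attending over the N memory slots) would be needed to get a T x d result. A d x d gate does not scale an N x d memory elementwise either, although it could still act on it by matrix multiplication. Whether the paper does any of this is not something the sketch can confirm.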

ziofill, 3 months ago

Missed opportunity to call them LMM and enjoy the onslaught of typos

kadushka, 3 months ago

The largest model they tested is 1.7B.

soganess, 3 months ago

Table 2's results are interesting. If the paper is to be believed, just adding the memory model seems to improve reasoning tasks across the board.

That said, I do wonder if this is a bit of a mirage. At 1.7B parameters, they are 3 orders of magnitude down from 4o (well, that isn't completely fair: I don't know what the average 'expert' size is in 4o, but I doubt the authors are doing mixture of experts at only 1.7B). A model can 'memorize' way more shit with that many parameters.

igleria, 3 months ago

This immediately makes my mind bring up Hopfield networks: https://arxiv.org/abs/2008.02217

When I worked with them circa 2012 they were practically toys. Maybe we are in a better place now?

ottaborra, 3 months ago

RNN with extra steps?

anentropic, 3 months ago

GitHub link in the paper is a 404 - private repo?

LM2: Large Memory Models

7 comments
