
Implementation of Google's Griffin Architecture – RNN LLM

218 points by milliondreams about 1 year ago

4 comments

VHRanger about 1 year ago
Like RWKV and Mamba, this is mixing some RNN properties in to avoid the issues transformers have.

However, I'm curious about their scaling claims. They have a plot that shows how the model scales in training with the FLOPs you throw at it.

But the issue we should rather be concerned with is the wall time of training for a set amount of hardware.

Back in 2018 we could train medium-sized RNNs; the issue was the wall time of training and training stability.
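The wall-time concern is exactly what the newer recurrences address: unlike a 2018-era LSTM, the recurrences in Griffin and Mamba are linear and elementwise in the hidden state, so training can evaluate them with a parallel prefix scan over the time axis instead of a strictly sequential loop. A minimal numpy sketch of that idea (the gates here are generic placeholders, not the paper's exact RG-LRU parameterization):

```python
import numpy as np

def scan_sequential(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t, computed step by step (h_0 = 0)."""
    h = np.zeros_like(b[0])
    out = []
    for t in range(len(a)):
        h = a[t] * h + b[t]
        out.append(h)
    return np.stack(out)

def scan_parallel(a, b):
    """Same recurrence via a Hillis-Steele scan over the associative combine
    (a1, b1) o (a2, b2) = (a1 * a2, a2 * b1 + b2); O(log T) depth in parallel."""
    A, B = a.copy(), b.copy()
    shift = 1
    while shift < len(A):
        A2, B2 = A.copy(), B.copy()
        A2[shift:] = A[:-shift] * A[shift:]
        B2[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A, B = A2, B2
        shift *= 2
    return B

T, D = 128, 8
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, (T, D))  # decay gates in (0, 1), as in a gated linear RNN
b = rng.normal(size=(T, D))        # gated input, e.g. i_t * x_t
assert np.allclose(scan_sequential(a, b), scan_parallel(a, b))
```

The sequential loop has O(T) dependent steps; the scan has O(log T) depth on parallel hardware, which is what makes the training wall time competitive with attention on top of the fixed-size state at inference.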
riku_iki about 1 year ago
I didn't get one detail: they selected a 6B transformer as the baseline and compared it to a 7B Griffin.

Why wouldn't they select equal-size models?
janwas about 1 year ago
For anyone interested in a C++ implementation, our github.com/google/gemma.cpp now supports this model.
spxneo about 1 year ago
I'm not smart enough to know the significance of this... is Griffin like Mamba?
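Broadly yes: Griffin's RG-LRU layer and Mamba's selective SSM both keep a fixed-size recurrent state that is updated elementwise, so inference cost does not grow with context the way a transformer's KV cache does; they differ mainly in how the decay is computed. A simplified single-step sketch of the RG-LRU update as described in the Griffin paper (the weight names W_a, W_x, lam are illustrative; real implementations compute the decay per channel in log space for numerical stability):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rg_lru_step(h, x, W_a, W_x, lam, c=8.0):
    """One RG-LRU recurrence step (simplified from the Griffin paper)."""
    r = sigmoid(x @ W_a)             # recurrence gate, in (0, 1)
    i = sigmoid(x @ W_x)             # input gate
    a = sigmoid(lam) ** (c * r)      # input-dependent decay, still in (0, 1)
    # Normalized update: the sqrt(1 - a^2) factor keeps the state scale bounded.
    return a * h + np.sqrt(1.0 - a ** 2) * (i * x)

D = 8
rng = np.random.default_rng(1)
h = np.zeros(D)
W_a = rng.normal(size=(D, D)) * 0.1
W_x = rng.normal(size=(D, D)) * 0.1
lam = rng.normal(size=D)
for x in rng.normal(size=(16, D)):   # roll the state over a short sequence
    h = rg_lru_step(h, x, W_a, W_x, lam)
```

Mamba reaches the same h_t = a_t * h_{t-1} + b_t * x_t form from a discretized state-space model (its decay comes from exp of an input-dependent step size times the state matrix), so the two are close cousins: same recurrence family, different gate parameterizations.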