xLSTM code release by NX-AI

123 points by badlogic 12 months ago

7 comments

ein0p 12 months ago
Note: GNU AGPLv3. Industry labs won't touch this with a hundred-foot pole. Given that they're the only ones with access to serious resources, it could be a while before we see a large model of this architecture.
htrp 12 months ago
This is exciting because it is an architecture that had so much promise, but we could never solve the gradient/parallelization problems as well as transformers do.

This code will allow people to experiment and see if it is a viable architecture at foundation/frontier model scale.
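For readers unfamiliar with the parallelization problem referenced above: a classic recurrent cell makes each step depend on the previous hidden state, so timesteps must be computed serially, while self-attention produces all positions from one batched matrix product. A toy NumPy contrast, illustrative only and not the released xLSTM code:

```python
import numpy as np

d, T = 4, 6
rng = np.random.default_rng(1)
x = rng.normal(size=(T, d))          # a short sequence of token vectors
W, U = np.eye(d), 0.5 * np.eye(d)    # toy recurrent weights

# Recurrence: h_t depends on h_{t-1}, so the T steps cannot run in
# parallel -- this is the training bottleneck the comment refers to.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(W @ x[t] + U @ h)

# Self-attention has no such chain: all T outputs fall out of one
# matrix product and parallelize trivially across the sequence.
scores = x @ x.T / np.sqrt(d)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = attn @ x                        # (T, d), computed all at once
```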
dang 12 months ago
Recent and related:

xLSTM: Extended Long Short-Term Memory - https://news.ycombinator.com/item?id=40294650 - May 2024 (73 comments)
pietz 12 months ago
Could someone provide a quick summary of where they stand compared to transformer architectures? Do they have competitive results at real-world scale?
trextrex 12 months ago
I'm not clear on what advantage this architecture has over Mamba/Griffin. They also have linear scaling and better sequence parallelism, and are competitive in performance with transformers.
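For context on the linear scaling all three families share: the xLSTM paper's mLSTM variant replaces attention with a fixed-size matrix memory that is updated once per token, so compute and memory grow linearly in sequence length. A simplified NumPy sketch of that recurrence (gate stabilization and the input/output projections are omitted; variable names are illustrative):

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    # One simplified mLSTM-style step: C is the (d, d) matrix memory,
    # n the (d,) normalizer; q, k, v are this token's query/key/value;
    # i_gate/f_gate are already-activated scalar input/forget gates.
    C = f_gate * C + i_gate * np.outer(v, k)   # rank-1 memory write
    n = f_gate * n + i_gate * k                # normalizer update
    h = C @ q / max(abs(n @ q), 1.0)           # normalized readout
    return C, n, h

# Each token costs O(d^2) regardless of position, so a length-T sequence
# is O(T * d^2): linear in T, versus attention's O(T^2 * d).
d, T = 8, 16
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
for t in range(T):
    q, k, v = rng.normal(size=(3, d))
    C, n, h = mlstm_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9)
```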
ganzuul 12 months ago
Are there any studies on predicting neural architecture scaling? E.g. a small training dataset that indicates performance on a large training dataset?
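There is a body of work on exactly this: neural scaling-law studies fit a power law of the form L(N) = a * N^(-b) + c to losses from small runs and extrapolate to larger budgets. A minimal sketch of that fitting step; the data points here are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical small-scale results: (parameter count, validation loss).
params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
loss = np.array([4.10, 3.62, 3.15, 2.80, 2.52])  # made-up numbers

def power_law(n, a, b, c):
    # The usual scaling-law form: L(N) = a * N**(-b) + c
    return a * n ** (-b) + c

(a, b, c), _ = curve_fit(power_law, params, loss,
                         p0=(10.0, 0.1, 1.0), maxfev=10000)
print(f"extrapolated loss at 1B params: {power_law(1e9, a, b, c):.2f}")
```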
brcmthrowaway 12 months ago
Congrats to the x.AI team!