A modern self-referential weight matrix that learns to modify itself

166 points by lnyan about 3 years ago

9 comments

heyitsguay about 3 years ago
I know Schmidhuber is famously miffed about missing out on the AI revolution limelight, and despite that he runs a pretty famous and well-resourced group. So with a paper like this demonstrating a new fundamental technique, you'd think they would eat the labor and compute costs of getting it up and running on a full gauntlet of high-profile benchmarks, in comparison with existing SOTA methods, versus the sort of half-hearted benchmarking that happens in this paper. It's a hassle, but all it would take for something like this to catch the community's attention would be a clear demonstration of viability in line with what groups at the other large research institutions do.

The failure to put something like that front and center makes me wonder how strong the method is, because you have to assume that someone on the team has tried more benchmarks. Still, the idea of learning a better update rule than gradient descent is intriguing, so maybe something cool will come from this :)
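For a concrete picture of the "learning a better update rule than gradient descent" idea mentioned above, here is a minimal sketch contrasting a hand-written SGD step with a parameterized update rule. The tiny rule and its parameters `a` and `b` are hypothetical illustrations, not anything from the paper.

```python
# Minimal sketch (not the paper's method): contrast a hand-written SGD step
# with a "learned" update rule, a small parameterized function of the weights
# and the gradient. The parameters a, b are placeholders; in a meta-learning
# setup they would themselves be trained across tasks.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)          # model weights
grad = rng.normal(size=3)       # gradient of some loss w.r.t. w

# 1) Ordinary gradient descent: the update rule is fixed by hand.
lr = 0.1
w_sgd = w - lr * grad

# 2) A learned update rule: a tiny parameterized function of (w, grad).
#    Here it amounts to a learned step size plus learned weight decay.
a, b = 0.05, 0.02               # hypothetical meta-learned parameters
w_learned = w - a * grad - b * w

print(w_sgd, w_learned)
```

In a real learned-optimizer setup, `a` and `b` (or a small network replacing them) would be meta-trained so that the updates they produce reduce loss faster than a fixed hand-tuned rule.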
ricardobayes about 3 years ago
It's a super weird feeling to click on a Hacker News top post and find out I know one of the authors. The world is a super small place.
goodmattg about 3 years ago
Need time to digest this paper, but you can assume if it's from Schmidhuber's group it will have some impact, even if only intellectual.
TekMol about 3 years ago
I have been playing with alternative ways to do machine learning on and off for a few years now. Some experiments went very well.

I am never sure if it is a waste of time or has some value.

If you guys had some unique ML technology that is different to what all the others do, what would you do with it?
mark_l_watson about 3 years ago
I haven't really absorbed this paper yet, but my first thought was of the Hopfield Networks we used in the 1980s.

For unsupervised learning algorithms like masked models (BERT and some other Transformers), it makes sense to train in parallel with prediction. Why not?

My imagination can't wrap around using this for supervised (labeled data) learning.
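For readers who don't remember the 1980s setup the comment refers to, here is a generic textbook Hopfield-network sketch, unrelated to the paper's architecture: patterns are stored with a Hebbian outer-product rule, and recall iterates a sign update from a corrupted cue.

```python
# Textbook Hopfield network sketch (not from the paper): store binary patterns
# with a Hebbian outer-product rule, then recover one from a corrupted cue by
# iterating the sign update until the state stops changing.
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]])       # patterns in {-1, +1}

# Hebbian storage: W is the sum of outer products, with zero diagonal.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

# Recall: start from a noisy version of the first pattern.
state = patterns[0].copy()
state[0] *= -1                                    # flip one bit as "noise"
for _ in range(10):                               # synchronous updates
    new_state = np.sign(W @ state)
    new_state[new_state == 0] = 1
    if np.array_equal(new_state, state):
        break
    state = new_state

print(state)   # settles back to the stored pattern [1, -1, 1, -1, 1, -1]
```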
codelord about 3 years ago
I haven't read the paper yet, so no comment on the content. But it's amusing that more than 30% of the references are self-citations.
savant_penguin about 3 years ago
Just skimmed the paper but the benchmarks are super weird
jdeaton about 3 years ago
I'm having a hard time reading this paper without hearing you-again's voice in my head.
nh23423fefe about 3 years ago
It's only a matter of time until the technological singularity.

> The WM of a self-referential NN, however, can keep rapidly modifying all of itself during runtime. In principle, such NNs can meta-learn to learn, and meta-meta-learn to meta-learn to learn, and so on, in the sense of recursive self-improvement.

Everyone who doubts is hanging everything on "in principle" being too hard. Seems ridiculous to me, a failure of imagination.
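For intuition about what a weight matrix that "keeps modifying all of itself during runtime" could look like, here is a toy sketch in the fast-weight spirit. It is a hypothetical illustration, not the architecture from the paper: one shared matrix produces both an output and the ingredients of a rank-1 update applied back to that same matrix.

```python
# Toy illustration only (NOT the paper's architecture): a single weight matrix
# whose activations at each step include a "key" and a self-generated step
# size, which are used to apply a rank-1 Hebbian-style update to the very
# matrix that produced them, so every entry of W can change at every step.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 4
W = 0.1 * rng.normal(size=(d_out + d_in + 1, d_in))   # one shared matrix

x = rng.normal(size=d_in)
for step in range(5):
    h = np.tanh(W @ x)

    y   = h[:d_out]                      # ordinary output
    key = h[d_out:d_out + d_in]          # where to write in the matrix
    lr  = 0.01 * h[-1]                   # self-generated step size

    # Self-modification: a rank-1 update of the matrix that produced h.
    W = W + lr * np.outer(h, key)

    x = y                                # feed the output back as next input
    print(step, round(float(np.linalg.norm(W)), 4))
```

Whether such self-generated updates can be trained to behave like a useful learning rule is exactly the question the quoted "in principle" is doing the work for.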