Neural Network Diffusion

223 points, by vagabund, over 1 year ago

15 comments

vessenes, over 1 year ago
I wasn't sure, on reading the abstract, whether this paper was parody. It's not parody. Two things stand out to me: first is the idea of distilling these networks down into a smaller latent space, and then mucking around with that. That's interesting, and cross-sections a bunch of interesting topics like interpretability, compression, training, over- and under-fitting. The second is that they show the diffusion models don't just converge on similar parameters as the ones they train against/diffuse into, and that's also interesting.

I confess I'm not sure what I'd do with this in the random grab bag of Deep Learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting.
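The pipeline this comment describes (flatten trained checkpoints, compress them into a smaller latent space, then work in that space) can be sketched with a toy linear stand-in. This is only an illustration: the dimensions are made up, and PCA stands in for the learned autoencoder a real system would train.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "checkpoints": flattened parameter vectors of hypothetical trained nets.
n_ckpts, param_dim, latent_dim = 50, 100, 8
checkpoints = rng.normal(size=(n_ckpts, param_dim))

# Linear stand-in for a learned autoencoder: PCA via SVD on the checkpoints.
mean = checkpoints.mean(axis=0)
centered = checkpoints - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
enc = vt[:latent_dim]                      # (latent_dim, param_dim)

def encode(p):
    # Project parameter vectors into the small latent space.
    return (p - mean) @ enc.T

def decode(z):
    # Map latents back to full parameter vectors.
    return z @ enc + mean

latents = encode(checkpoints)              # (50, 8)
recon = decode(latents)                    # (50, 100), lossy reconstruction
print(latents.shape, recon.shape)
```

The point of the bottleneck is that anything generative (here, diffusion) only has to model the small latent space, not the full parameter count.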
gwern, over 1 year ago
This doesn't seem all that impressive when you compare it to earlier work like 'g.pt' (https://arxiv.org/abs/2209.12892, Peebles et al 2022). They cite it in passing, but do no comparison or discussion, and to my eyes, g.pt is a lot more interesting (for example, you can prompt it for a variety of network properties like low vs. high score, whereas this just generates unconditionally) and more thoroughly evaluated. The autoencoder here doesn't seem like it adds much.
vagabund, over 1 year ago
Author thread: https://twitter.com/liuzhuang1234/status/1760195922502312197
falcor84, over 1 year ago
Seems like we're getting very close to recursive self-improvement [0].

[0] https://www.lesswrong.com/tag/recursive-self-improvement
goggy_googy, over 1 year ago
"We synthesize 100 novel parameters by feeding random noise into the latent diffusion model and the trained decoder." Cool that patterns exist at this level, but also, 100 parameters means we have a long way to go before this process is efficient enough to synthesize more modern-sized models.
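The quoted procedure (random noise in, latent diffusion, then the trained decoder out to parameters) might look roughly like the toy sketch below. Everything here is a stand-in: the "denoiser" just contracts toward a fixed point, and the decoder is a random matrix, whereas in the paper both are trained models.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, param_dim, n_steps = 8, 100, 50

# Random stand-ins for the trained decoder and for the mode the diffusion
# model has learned; in the real system both come from training.
decoder_w = rng.normal(size=(latent_dim, param_dim)) / np.sqrt(latent_dim)
learned_mode = rng.normal(size=latent_dim)

def denoise_step(z, t):
    # Stand-in reverse step: nudge the latent toward the learned mode.
    # A real diffusion model would run a trained noise-prediction network.
    return z + 0.1 * (learned_mode - z)

# Unconditional sampling: start from pure noise, run the reverse process,
# then decode the final latent into a full parameter vector.
z = rng.normal(size=latent_dim)
for t in reversed(range(n_steps)):
    z = denoise_step(z, t)

new_params = z @ decoder_w
print(new_params.shape)
```

The efficiency concern in the comment shows up here too: the decoder's output dimension has to cover every parameter of the target network, which is tiny in this toy and enormous for modern models.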
Scene_Cast2, over 1 year ago
Yay, an alternative to backprop & SGD! Really interesting and impressive finding; I was surprised that the network generalizes.
justanotherjoe, over 1 year ago
fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen. Diffusion bears a remarkable similarity to backpropagation to me; I thought it could be used in place of it for some parts of a model.

Furthermore, I posit that residual connections, especially in transformers, allow the model a more exploratory behavior that is really powerful, and are a necessary component of the power of transformers. Transformers are just such a great architecture, the more I think about it. They do so many things right. Although this is not really related to the topic.
goggy_googy, over 1 year ago
Important to note: they say "From these generated models, we select the one with the best performance on the training set." Definitely potential for selection bias here.
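The selection bias this comment flags is easy to demonstrate with a toy experiment (all numbers here are illustrative, not from the paper): if every generated model has identical true quality and you pick the one with the best noisy training score, that score systematically overstates held-out performance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_models = 1000, 100

# Every generated "model" has the same true accuracy, 0.5; train and test
# scores are independent noisy measurements of it.
train_scores = 0.5 + 0.05 * rng.normal(size=(n_trials, n_models))
test_scores = 0.5 + 0.05 * rng.normal(size=(n_trials, n_models))

# Select the best model on the training set, as the quoted sentence describes.
best = train_scores.argmax(axis=1)
idx = np.arange(n_trials)
reported = train_scores[idx, best].mean()  # the score used for selection
actual = test_scores[idx, best].mean()     # an independent held-out score

print(round(reported, 3), round(actual, 3))
```

The max over many noisy measurements is biased upward, so `reported` lands well above the true 0.5 while `actual` does not; the gap grows with the number of candidate models.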
marojejian, over 1 year ago
Am I missing something, or is this just a case of "amortized inference", where you train a model (here a diffusion model) to infer something that was previously found via an optimization procedure (here NN parameters)?
jackblemming, over 1 year ago
State-of-the-art neural net architectures, whether transformers or the like, trained with self-play to optimize non-differentiable but highly efficient architectures: that's the way.
hoc, over 1 year ago
Hm, so does this actually improve/condense the representation for certain applications, or is this more some kind of global expand-and-collect in network space?
jarrell_mark, over 1 year ago
Can this be used to fill in the missing information in the OpenWorm nematode 302-neuron brain simulator?
amelius, over 1 year ago
Why does Figure 7 not include a validation curve? (AFAICT only the training curve is shown.)
nullc, over 1 year ago
heh https://news.ycombinator.com/item?id=39208213#39211749
t_serpico, over 1 year ago
I'd wager that adding noise to the weights in a principled fashion would accomplish something similar to this.
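That baseline would mirror the generate-many-then-select procedure with plain Gaussian weight perturbation. A minimal sketch, where `train_score`, the checkpoint, and the noise scale are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
param_dim, n_candidates = 100, 50

base = rng.normal(size=param_dim)  # stand-in for one trained checkpoint

def train_score(p):
    # Made-up stand-in for evaluating a parameter vector on the training set.
    return float(-np.mean((p - base) ** 2))

# Weight-noise baseline: perturb the checkpoint with small Gaussian noise,
# then keep the best candidate by training score, mirroring the paper's
# generate-many-then-select procedure.
candidates = base + 0.02 * rng.normal(size=(n_candidates, param_dim))
best = max(candidates, key=train_score)
print(best.shape)
```

Comparing such a cheap perturbation baseline against the diffusion-generated parameters would be one way to test whether the diffusion machinery adds anything beyond sampling near a known checkpoint.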