
Neural Network Diffusion

223 points by vagabund about 1 year ago

15 comments

vessenes about 1 year ago
I wasn't sure if this paper was parody on reading the abstract. It's not parody. Two things stand out to me: first is the idea of distilling these networks down into a smaller latent space and then mucking around with that. That's interesting, and cuts across a bunch of interesting topics like interpretability, compression, training, over- and under-… The second is that they show the diffusion models don't just converge on parameters similar to the ones they train against / diffuse from, and that's also interesting.

I confess I'm not sure what I'd do with this in the random grab bag of deep learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting them.
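
A rough sketch of what that pipeline could look like: flatten trained weights, compress them with an autoencoder, and fit a generative model on the latent codes. This is a minimal, hypothetical sketch with toy sizes, where randomly initialized networks stand in for a trained population; it is not the paper's code.

    import torch
    from torch import nn

    def flatten_params(model: nn.Module) -> torch.Tensor:
        # Concatenate all of a network's parameters into one flat vector.
        return torch.cat([p.detach().flatten() for p in model.parameters()])

    # Pretend population of small "trained" networks (random weights here).
    population = [nn.Linear(16, 10) for _ in range(64)]
    weight_vecs = torch.stack([flatten_params(m) for m in population])
    dim = weight_vecs.shape[1]

    # Autoencoder that compresses weight vectors into a small latent space.
    latent_dim = 8
    enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
    dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, dim))
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    for _ in range(200):
        recon = dec(enc(weight_vecs))
        loss = nn.functional.mse_loss(recon, weight_vecs)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # A latent diffusion model would then be fit on enc(weight_vecs); sampling it
    # and decoding gives new weight vectors that can be loaded back into a network.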
gwern about 1 year ago
This doesn't seem all that impressive when you compare it to earlier work like 'g.pt' (https://arxiv.org/abs/2209.12892, Peebles et al. 2022). They cite it in passing, but do no comparison or discussion, and to my eyes g.pt is a lot more interesting (for example, you can prompt it for a variety of network properties like low vs. high score, whereas this just generates unconditionally) and more thoroughly evaluated. The autoencoder here doesn't seem like it adds much.
vagabund about 1 year ago
Author thread: https://twitter.com/liuzhuang1234/status/1760195922502312197
falcor84 about 1 year ago
Seems like we're getting very close to recursive self-improvement [0].

[0] https://www.lesswrong.com/tag/recursive-self-improvement
goggy_googy about 1 year ago
"We synthesize 100 novel parameters by feeding random noise into the latent diffusion model and the trained decoder." Cool that patterns exist at this level, but also, 100 params means we have a long way to go before this process is efficient enough to synthesize more modern-sized models.
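
A hypothetical sketch of that generation step, where `latent_diffusion` and `decoder` are placeholders for the trained components the quote mentions rather than real APIs from the paper:

    import torch
    from torch import nn

    def synthesize_model(latent_diffusion, decoder, template: nn.Module) -> nn.Module:
        z = torch.randn(1, latent_diffusion.latent_dim)  # start from random noise
        z = latent_diffusion.denoise(z)                  # reverse diffusion in the latent space
        flat = decoder(z).squeeze(0)                     # decode to a flat parameter vector
        nn.utils.vector_to_parameters(flat, template.parameters())  # load into a fresh network
        return template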
Scene_Cast2 about 1 year ago
Yay, an alternative to backprop & SGD! Really interesting and impressive finding; I was surprised that the network generalizes.
justanotherjoe about 1 year ago
fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen. Diffusion bears a remarkable similarity to backpropagation to me; I thought it could be used in place of it for some parts of a model.

Furthermore, I posit that residual connections, especially in transformers, allow the model a more exploratory behavior that is really powerful, and are a necessary component of the power of transformers. Transformers are just such a great architecture, the more I think about it. They're doing so many things right, although this is not really related to the topic.
goggy_googy about 1 year ago
Important to note: they say "From these generated models, we select the one with the best performance on the training set." Definitely potential for bias here.
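
A minimal illustration of the concern, using hypothetical `generated_models`, `accuracy`, and dataset names: picking the best of many generated models on the training set risks optimistic bias, while scoring on held-out data avoids it.

    def select_best(models, score_fn, data):
        # Return the model with the highest score on the given dataset.
        return max(models, key=lambda m: score_fn(m, data))

    # best_train = select_best(generated_models, accuracy, train_set)  # selection reuses training data (biased)
    # best_valid = select_best(generated_models, accuracy, valid_set)  # selection on held-out data (fairer)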
marojejian about 1 year ago
Am I missing something, or is this just a case of "amortized inference", where you train a model (here a diffusion one) to infer something that was previously found via an optimization procedure (here, NN parameters)?
jackblemming about 1 year ago
The state-of-the-art neural net architecture, whether that be transformers or the like, trained via self-play to optimize non-differentiable but highly efficient architectures, is the way.
hoc about 1 year ago
Hm, so does this actually improve/condense the representation for certain applications, or is this more some kind of global expand-and-collect in network space?
jarrell_mark about 1 year ago
Can this be used to fill in the missing information in the OpenWorm nematode 302-neuron brain simulator?
amelius about 1 year ago
Why does Figure 7 not include a validation curve (AFAICT only the training curve is shown)?
nullc about 1 year ago
heh https://news.ycombinator.com/item?id=39208213#39211749
t_serpico about 1 year ago
I'd wager that adding noise to the weights in a principled fashion would accomplish something similar to this.
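
A toy version of that baseline (my own sketch, not from the paper): perturb a trained model's weights with Gaussian noise scaled per parameter, then keep whichever perturbation scores best on held-out data.

    import copy
    import torch
    from torch import nn

    def perturb(model: nn.Module, scale: float = 0.01) -> nn.Module:
        # Copy the model and add Gaussian noise scaled by each parameter's spread.
        new_model = copy.deepcopy(model)
        with torch.no_grad():
            for p in new_model.parameters():
                sigma = p.std() if p.numel() > 1 else p.abs()
                p.add_(torch.randn_like(p) * sigma * scale)
        return new_model

    # candidates = [perturb(trained_model) for _ in range(100)]
    # best = max(candidates, key=lambda m: accuracy(m, valid_set))  # hypothetical scoring helpers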