I wasn't sure if this paper was a parody on reading the abstract. It's not. Two things stand out to me: the first is the idea of distilling these networks down into a smaller latent space and then mucking around with that. That's interesting, and it cuts across a bunch of interesting topics like interpretability, compression, training, over- and under-.. The second is that they show the diffusion models don't just converge on parameters similar to the ones they train against/diffuse into, and that's also interesting.<p>I confess I'm not sure what I'd do with this in the random grab bag of Deep Learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting them.
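To make that concrete, here's a rough toy sketch of how I picture the setup (my own reconstruction, not the paper's code; the names like ParamAutoencoder and all the sizes are made up): flatten checkpoint weights, compress them with a small autoencoder, and train a denoiser over the latent codes.

```python
import torch
import torch.nn as nn

PARAM_DIM = 2048   # length of a flattened weight vector (toy size)
LATENT_DIM = 64    # compressed latent size

class ParamAutoencoder(nn.Module):
    """Compress flattened network weights into a small latent space."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(PARAM_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, PARAM_DIM))

    def forward(self, w):
        z = self.enc(w)
        return self.dec(z), z

class LatentDenoiser(nn.Module):
    """Predict the noise that was added to a latent at noise level t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM + 1, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_DIM))

    def forward(self, z_noisy, t):
        return self.net(torch.cat([z_noisy, t], dim=-1))

def diffusion_loss(ae, denoiser, checkpoints):
    """checkpoints: a batch of flattened weight vectors from ordinary SGD runs."""
    _, z = ae(checkpoints)                      # encode weights into latents
    t = torch.rand(z.shape[0], 1)               # random noise level in [0, 1)
    noise = torch.randn_like(z)
    z_noisy = (1 - t) * z + t * noise           # simple linear noising schedule
    pred = denoiser(z_noisy.detach(), t)
    return nn.functional.mse_loss(pred, noise)  # learn to predict the noise
```

Sampling would presumably then be noise, iterative denoising in latent space, decode back to a weight vector, which is where the "novel parameters" would come from.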
This doesn't seem all that impressive when you compare it to earlier work like 'g.pt' <a href="https://arxiv.org/abs/2209.12892" rel="nofollow">https://arxiv.org/abs/2209.12892</a> (Peebles et al., 2022). They cite it in passing, but offer no comparison or discussion, and to my eyes g.pt is a lot more interesting (for example, you can prompt it for a variety of network properties like low vs. high score, whereas this just generates unconditionally) and more thoroughly evaluated. The autoencoder here doesn't seem like it adds much.
Seems like we're getting very close to recursive self-improvement [0].<p>[0] <a href="https://www.lesswrong.com/tag/recursive-self-improvement" rel="nofollow">https://www.lesswrong.com/tag/recursive-self-improvement</a>
"We synthesize 100 novel parameters by feeding random noise into the latent diffusion model and the trained decoder." Cool that patterns exist at this level, but also, 100 params means we have a long way to go before this process is efficient enough to synthesize more modern-sized models.
fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen.
Diffusion bears a remarkable similarity to backpropagation, to my eye. I've thought it could be used in place of backprop for some parts of a model.<p>Furthermore, I posit that residual connections, especially in transformers, allow the model a more exploratory behavior that is really powerful, and that this is a necessary component of the power of transformers (toy block sketched below). The transformer is just such a great architecture the more I think about it. It does so many things right. Though this is not really related to the topic.
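To pin down what I mean by residual connections in transformers: the skip paths around attention and the MLP, as in a standard pre-norm block. Toy sizes, plain PyTorch, nothing specific to the paper.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Standard pre-norm transformer block with residual (skip) connections."""
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]    # residual connection around attention
        x = x + self.mlp(self.norm2(x))  # residual connection around the MLP
        return x

x = torch.randn(2, 16, 128)              # (batch, tokens, dim)
print(Block()(x).shape)                   # torch.Size([2, 16, 128])
```

Each sublayer only has to learn a small update on top of the identity path, which is the property I'm gesturing at.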
Important to note: they say "From these generated models, we select the one with the best performance on the training set." There's definitely potential for selection bias here.
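A quick numerical illustration of the worry (made-up numbers, nothing to do with the paper's actual models): if you pick the best of 100 candidates on a noisy evaluation, the score you selected on overstates what an independent re-evaluation would show.

```python
import random

random.seed(0)
N, TRUE_ACC, NOISE = 100, 0.80, 0.02    # candidates, true accuracy, eval noise

gaps = []
for _ in range(1000):
    selection_scores = [TRUE_ACC + random.gauss(0, NOISE) for _ in range(N)]
    best = max(selection_scores)                 # score used to pick the "winner"
    fresh = TRUE_ACC + random.gauss(0, NOISE)    # independent re-evaluation
    gaps.append(best - fresh)

print(f"average optimism from best-of-{N} selection: {sum(gaps)/len(gaps):.3f}")
# comes out around +0.05, i.e. the selection score overstates the real accuracy
```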
Am I missing something, or is this just a case of "amortized inference", where you train a model (here a diffusion model) to infer something that was previously found via an optimization procedure (here, NN parameters)?
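By "amortized inference" I mean something like this toy sketch: instead of running an optimizer per problem instance, train a network to map the instance directly to the solution the optimizer would have found. Here a closed-form least-squares solve stands in for the expensive optimization; everything is invented for illustration.

```python
import torch
import torch.nn as nn

def solve_instance(X, y):
    # the "expensive" per-instance optimization, here a closed-form least-squares solve
    return torch.linalg.lstsq(X, y).solution.flatten()

# amortizer maps a whole problem instance (X, y) directly to the solution vector
amortizer = nn.Sequential(nn.Linear(5 * 3 + 5, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(amortizer.parameters(), lr=1e-3)

for step in range(2000):
    X = torch.randn(5, 3)                          # random problem instance
    y = X @ torch.randn(3, 1)
    target = solve_instance(X, y)                  # what optimization would have found
    pred = amortizer(torch.cat([X.flatten(), y.flatten()]))
    loss = nn.functional.mse_loss(pred, target)    # train to predict the optimum directly
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The parameter-diffusion setup reads like the same move with trained NN weights as the targets, which is what prompted the question.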
Training the state-of-the-art neural net architecture, whether that's transformers or the like, with self-play to optimize non-differentiable but highly efficient architectures is the way.
Hm, so does this actually improve/condense the representation for certain applications, or is it more of a global expand-and-collect in network space?