Neural Network Diffusion

223 points, by vagabund, over 1 year ago

15 comments

vessenes, over 1 year ago
I wasn't sure, on reading the abstract, whether this paper was parody. It's not parody. Two things stand out to me: first is the idea of distilling these networks down into a smaller latent space, and then mucking around with that. That's interesting, and cross-sections a bunch of interesting topics like interpretability, compression, training, over- and under-fitting. The second is that they show the diffusion models don't just converge on similar parameters as the ones they train against/diffuse into, and that's also interesting.

I confess I'm not sure what I'd do with this in the random grab bag of Deep Learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting.
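The pipeline this comment describes (flatten trained checkpoints, compress them into a smaller latent space, then work in that space) can be sketched with a toy linear stand-in. This is only an illustration: the dimensions are made up, and PCA stands in for the learned autoencoder a real system would train.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "checkpoints": flattened parameter vectors of hypothetical trained nets.
n_ckpts, param_dim, latent_dim = 50, 100, 8
checkpoints = rng.normal(size=(n_ckpts, param_dim))

# Linear stand-in for a learned autoencoder: PCA via SVD on the checkpoints.
mean = checkpoints.mean(axis=0)
centered = checkpoints - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
enc = vt[:latent_dim]                      # (latent_dim, param_dim)

def encode(p):
    # Project parameter vectors into the small latent space.
    return (p - mean) @ enc.T

def decode(z):
    # Map latents back to full parameter vectors.
    return z @ enc + mean

latents = encode(checkpoints)              # (50, 8)
recon = decode(latents)                    # (50, 100), lossy reconstruction
print(latents.shape, recon.shape)
```

The point of the bottleneck is that anything generative (here, diffusion) only has to model the small latent space, not the full parameter count.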
gwern, over 1 year ago
This doesn't seem all that impressive when you compare it to earlier work like 'g.pt' (https://arxiv.org/abs/2209.12892, Peebles et al 2022). They cite it in passing, but do no comparison or discussion, and to my eyes, g.pt is a lot more interesting (for example, you can prompt it for a variety of network properties like low vs. high score, whereas this just generates unconditionally) and more thoroughly evaluated. The autoencoder here doesn't seem like it adds much.
vagabund, over 1 year ago
Author thread: https://twitter.com/liuzhuang1234/status/1760195922502312197
falcor84, over 1 year ago
Seems like we're getting very close to recursive self-improvement [0].

[0] https://www.lesswrong.com/tag/recursive-self-improvement
goggy_googy, over 1 year ago
"We synthesize 100 novel parameters by feeding random noise into the latent diffusion model and the trained decoder." Cool that patterns exist at this level, but also, 100 parameters means we have a long way to go before this process is efficient enough to synthesize more modern-sized models.
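The quoted procedure (random noise in, latent diffusion, then the trained decoder out to parameters) might look roughly like the toy sketch below. Everything here is a stand-in: the "denoiser" just contracts toward a fixed point, and the decoder is a random matrix, whereas in the paper both are trained models.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, param_dim, n_steps = 8, 100, 50

# Random stand-ins for the trained decoder and for the mode the diffusion
# model has learned; in the real system both come from training.
decoder_w = rng.normal(size=(latent_dim, param_dim)) / np.sqrt(latent_dim)
learned_mode = rng.normal(size=latent_dim)

def denoise_step(z, t):
    # Stand-in reverse step: nudge the latent toward the learned mode.
    # A real diffusion model would run a trained noise-prediction network.
    return z + 0.1 * (learned_mode - z)

# Unconditional sampling: start from pure noise, run the reverse process,
# then decode the final latent into a full parameter vector.
z = rng.normal(size=latent_dim)
for t in reversed(range(n_steps)):
    z = denoise_step(z, t)

new_params = z @ decoder_w
print(new_params.shape)
```

The efficiency concern in the comment shows up here too: the decoder's output dimension has to cover every parameter of the target network, which is tiny in this toy and enormous for modern models.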
Scene_Cast2, over 1 year ago
Yay, an alternative to backprop & SGD! Really interesting and impressive finding; I was surprised that the network generalizes.
justanotherjoe, over 1 year ago
fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen. Diffusion bears a remarkable similarity to backpropagation to me; I thought it could be used in place of it for some parts of a model.

Furthermore, I posit that residual connections, especially in transformers, allow the model a more exploratory behavior that is really powerful, and are a necessary component of the power of transformers. Transformers are just such a great architecture, the more I think about it. They do so many things right. Although this is not really related to the topic.
goggy_googy, over 1 year ago
Important to note: they say "From these generated models, we select the one with the best performance on the training set." Definitely potential for selection bias here.
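The selection bias this comment flags is easy to demonstrate with a toy experiment (all numbers here are illustrative, not from the paper): if every generated model has identical true quality and you pick the one with the best noisy training score, that score systematically overstates held-out performance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_models = 1000, 100

# Every generated "model" has the same true accuracy, 0.5; train and test
# scores are independent noisy measurements of it.
train_scores = 0.5 + 0.05 * rng.normal(size=(n_trials, n_models))
test_scores = 0.5 + 0.05 * rng.normal(size=(n_trials, n_models))

# Select the best model on the training set, as the quoted sentence describes.
best = train_scores.argmax(axis=1)
idx = np.arange(n_trials)
reported = train_scores[idx, best].mean()  # the score used for selection
actual = test_scores[idx, best].mean()     # an independent held-out score

print(round(reported, 3), round(actual, 3))
```

The max over many noisy measurements is biased upward, so `reported` lands well above the true 0.5 while `actual` does not; the gap grows with the number of candidate models.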
marojejian, over 1 year ago
Am I missing something, or is this just a case of "amortized inference", where you train a model (here a diffusion model) to infer something that was previously found via an optimization procedure (here NN parameters)?
jackblemming, over 1 year ago
State-of-the-art neural net architectures, whether transformers or the like, trained with self-play to optimize non-differentiable but highly efficient architectures: that's the way.
hoc, over 1 year ago
Hm, so does this actually improve/condense the representation for certain applications, or is this more some kind of global expand-and-collect in network space?
jarrell_mark, over 1 year ago
Can this be used to fill in the missing information in the OpenWorm nematode 302-neuron brain simulator?
amelius, over 1 year ago
Why does Figure 7 not include a validation curve? (AFAICT only the training curve is shown.)
nullc, over 1 year ago
heh https://news.ycombinator.com/item?id=39208213#39211749
t_serpico, over 1 year ago
I'd wager that adding noise to the weights in a principled fashion would accomplish something similar to this.
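That baseline would mirror the generate-many-then-select procedure with plain Gaussian weight perturbation. A minimal sketch, where `train_score`, the checkpoint, and the noise scale are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
param_dim, n_candidates = 100, 50

base = rng.normal(size=param_dim)  # stand-in for one trained checkpoint

def train_score(p):
    # Made-up stand-in for evaluating a parameter vector on the training set.
    return float(-np.mean((p - base) ** 2))

# Weight-noise baseline: perturb the checkpoint with small Gaussian noise,
# then keep the best candidate by training score, mirroring the paper's
# generate-many-then-select procedure.
candidates = base + 0.02 * rng.normal(size=(n_candidates, param_dim))
best = max(candidates, key=train_score)
print(best.shape)
```

Comparing such a cheap perturbation baseline against the diffusion-generated parameters would be one way to test whether the diffusion machinery adds anything beyond sampling near a known checkpoint.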