Denoising Diffusion models from first principle in Julia

109 points by the_origami_fox, over 2 years ago

7 comments

astrange, over 2 years ago
This claims to explain diffusion models from first principles, but the issue with explaining how they work is we don't know how they work.

The explanation in the original paper turns out not to be true; you can get rid of most of their assumptions and it still works: https://arxiv.org/abs/2208.09392
adgjlsfhk1, over 2 years ago
This is really cool, and possibly the best demonstration I've seen in a while of the power of Julia for tasks outside of science. Being able to use a fast, flexible, high-level language, instead of being forced to compromise on a slow high-level language that wraps fast but rigid libraries, lets you do some really cool stuff.
mk_stjames, over 2 years ago
Just in case the author is reading: in the introduction you use the term "virtual RAM" to describe the GPU VRAM needed for running Stable Diffusion, but VRAM actually stands for "video RAM".
adammarples, over 2 years ago
Trying to run MNIST(:train), I get an error that modules are not callable. When I search for the error, I find a link to a repo issue raising exactly this, but the issue itself is missing. Was it deleted rather than closed? Weird.
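One plausible cause, though the thread doesn't confirm it: older MLDatasets.jl releases (before 0.7) expose MNIST as a module rather than a callable dataset type, so MNIST(:train) fails with exactly this "modules are not callable" error. A minimal sketch of the fix, assuming that is what's happening:

```julia
# Assumes the error comes from an old MLDatasets.jl (< 0.7), where MNIST is a
# module and therefore not callable. Updating the package gives the callable API.
using Pkg
Pkg.add(name="MLDatasets", version="0.7")   # or Pkg.update("MLDatasets")

using MLDatasets
train = MNIST(split=:train)                 # callable dataset type in MLDatasets >= 0.7
x, y  = train[1]                            # first image and its label

# The equivalent call on the old (< 0.7) API would be:
# train_x, train_y = MNIST.traindata()
```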
radarsat1, over 2 years ago
I've been trying to develop my own diffusion models from scratch lately, to understand this approach better and to compare it against similar experiments I previously did with GANs. My impression from reading posts like this one was that it would be relatively easy once you understood it, with the advantage that you get a nice, normal, supervised MSE target to train against, instead of having to deal with the instabilities of GANs.

I have found in practice that they do not deliver on this front. The loss curve you get is often just a big, thick, noisy straight line, completely devoid of information about whether the model is converging. Convergence also seems to depend heavily on model choices and the beta schedule you pick, and it's not at all clear to me how to choose those things in a principled manner. Until you train for a *long* time, you basically just get noise, so it's hard to know when to restart an experiment or keep going. Do I need 10 steps, 100, 1000? I found that training longer, and longer, and longer, it does keep getting better, very slowly, even though the loss curve doesn't show it, and there seems to be no indication of when the model has "converged" in any meaningful sense. My understanding of why this happens is that, due to the integrative nature of the sampling process, even tiny errors in approximating the noise add up to large divergences.

I've also tried making it conditional on vector-quantization codes, and it seems to fail to use them nearly as well as VQGAN does; at least I haven't had much success doing it directly in the diffusion model. Reading further, I found that most diffusion-based models actually use a conditional GAN to develop a latent space and a decoder, and the diffusion model is used to generate samples in that latent space. That suggests the diffusion model can never actually do better than the associated GAN's decoder, which surprised me, since diffusion is usually proposed as an *alternative* to GANs.

So overall I'm failing to grasp the advantages this approach really has over just using a GAN. Obviously it works fantastically for these large-scale generative projects, but to be honest I don't understand why it's better, despite having read every article out there telling me the same things again and again about how it works. For example, DALL-E 1 used VQGAN, not diffusion, and people were pretty wowed by it. I'm not sure why DALL-E 2's improvements can be attributed to the change to a diffusion process if it is still using a GAN to decode the output.

Looking for some intuition, if anyone can offer any. I understand that iteratively improving the image lets the model work out large-scale and small-scale features progressively, but it seems to me that the many upscaling layers of a large GAN can do the same thing.
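For concreteness, here is a minimal sketch in Julia of the plain DDPM noise-prediction objective the comment is describing (this is not the post's code; Flux.jl, the stand-in `model`, and the linear beta schedule are assumptions chosen for illustration):

```julia
# Minimal DDPM-style training loss: pick a random timestep, noise the clean
# sample according to the beta schedule, and regress the added noise with MSE.
using Flux

const T = 1000
betas  = collect(range(1f-4, 2f-2; length=T))   # linear beta schedule (a key hyperparameter)
alphas = 1f0 .- betas
abar   = cumprod(alphas)                        # abar_t = prod_{s<=t} (1 - beta_s)

function ddpm_loss(model, x0)
    t     = rand(1:T)                           # random diffusion step
    noise = randn(Float32, size(x0))            # the regression target
    xt    = sqrt(abar[t]) .* x0 .+ sqrt(1f0 - abar[t]) .* noise   # forward (noising) process
    pred  = model(xt, t)                        # network predicts the added noise
    return Flux.mse(pred, noise)                # the "nice supervised MSE target"
end
```

The whole schedule lives in `betas`/`abar`, so the sensitivity the comment describes comes down to that one array of hyperparameters; the training loss stays a plain MSE regardless of how the schedule is chosen, which is also part of why it says so little about sample quality.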
KRAKRISMOTT, over 2 years ago
Are there any hosted Julia ML services like Grid.ai for Python?
IlyaOrson, over 2 years ago
Awesome post!