
Why I find diffusion models interesting?

202 points · by whoami_nr · 2 months ago

20 comments

mountainriver · 2 months ago

The most interesting thing about diffusion LMs, and one that tends to be missed, is their ability to edit early tokens.

We know that the early tokens in an autoregressive sequence disproportionately bias the outcome. I would go so far as to say that part of the magic of reasoning models is that they generate so much text they can kind of get around this.

However, diffusion seems like a much better way to solve this problem.
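mountainriver's point about revisiting early tokens can be sketched with a toy masked-diffusion sampler. This is purely illustrative: `random.choice` and `random.random` stand in for a real model's token predictions and confidence scores, and the remasking schedule is made up for the demo.

```python
import random

random.seed(0)

VOCAB = "abcd"

def diffusion_decode(length, steps=8, remask_frac=0.25):
    """Toy masked-diffusion sampler: each step fills every masked slot,
    then remasks the lowest-'confidence' fraction ANYWHERE in the
    sequence -- including early positions, which an autoregressive
    decoder freezes forever once they are emitted."""
    seq = ["<mask>"] * length
    for _ in range(steps):
        confidences = []
        for i, tok in enumerate(seq):
            if tok == "<mask>":
                seq[i] = random.choice(VOCAB)
            # random.random() stands in for the model's confidence score
            confidences.append((random.random(), i))
        confidences.sort()
        # remask the least-confident positions so they can be re-decided,
        # no matter where in the sequence they sit
        for _, i in confidences[: int(length * remask_frac)]:
            seq[i] = "<mask>"
    # fill whatever is still masked after the final remasking pass
    return [t if t != "<mask>" else random.choice(VOCAB) for t in seq]

print("".join(diffusion_decode(12)))
```

The key contrast is the remasking loop: an autoregressive decoder has no analogous mechanism for undoing a committed early token.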
vinkelhake · 2 months ago

I don't get where the author is coming from with the idea that a diffusion-based LLM would hallucinate less.

> dLLMs can generate certain important portions first, validate it, and then continue the rest of the generation.

If you pause the animation in the linked tweet (not the one on the page), you can see that the intermediate versions are full of, well, baloney.

(And anyone who has messed around with diffusion-based image generation knows the models are perfectly happy to hallucinate.)
kelseyfrog · 2 months ago

I'm personally happy to see effort in this space, simply because I think it's an interesting set of tradeoffs (compute ∝ accuracy), a departure from the fixed per-token compute budget required now.

It brings up interesting questions, like: what is the equivalence between smaller diffusion models, which consume more compute because they take a greater number of diffusion steps, and larger traditional LLMs, which essentially take a single step per token? How effective is decoupling the context window size from the diffusion window size? Is there an optimum ratio?
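The compute ∝ accuracy tradeoff kelseyfrog describes can be put in back-of-envelope terms. This is an illustrative cost model, not a measurement of any real system: `unit` is a stand-in for the per-position cost of one forward pass, and the attention term is ignored.

```python
def ar_flops(n_tokens, unit=1.0):
    # Autoregressive: one forward pass per generated token. With KV
    # caching each pass does new work for one position, so cost grows
    # roughly linearly in output length (attention term ignored here).
    return n_tokens * unit

def diffusion_flops(block, steps, unit=1.0):
    # Diffusion: each denoising step is a full pass over the whole block,
    # so cost is steps * block. The step count becomes a dial that trades
    # compute for sample quality, independent of output length.
    return steps * block * unit

# Same 1024-token output: a 16-step diffusion pass does ~16x the
# per-position work of one autoregressive sweep under these assumptions,
# but the ratio shrinks as you cut steps (and, plausibly, quality).
print(ar_flops(1024), diffusion_flops(1024, steps=16))
```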
prometheus76 · 2 months ago

Why did the person who posted this change the headline of the article ("Diffusion models are interesting") into a nonsensical question?
antirez · 2 months ago

There is a disproportionate skepticism toward autoregressive models, and a disproportionate optimism about alternative paradigms, because of the absolutely non-verifiable idea that LLMs, when predicting the next token, don't already model, in their activation states, the gist of what they are going to say, similar to what humans do. That's funny because many times it can be observed in truly high-quality replies that the first tokens only made sense *in the perspective* of what comes later.
kazinator · 2 months ago

Interestingly, that animation at the end *mainly* proceeds from left to right, with just some occasional exceptions.

So I followed the link, and gave the model this bit of a conversation starter:

> *You still go mostly left to right.*

The denoising animation it generated went like this:

> [Yes] [.] [MASK] [MASK] [MASK] ... [MASK]

and proceeded by deletion of the mask elements on the right, one by one, leaving just the "Yes.".

:)
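The mostly-left-to-right order kazinator observed falls out naturally if tokens are unmasked in confidence order and the model is most confident next to already-revealed context. The sketch below fakes that with a distance-based score plus noise; no real model is involved, and the scoring function is an assumption made for the demo.

```python
import random

random.seed(1)

def nearest_revealed(revealed, i):
    # Distance from position i to the nearest revealed position;
    # the prompt counts as a revealed "position -1" to the left.
    anchors = [-1] + [j for j, r in enumerate(revealed) if r]
    return min(abs(i - j) for j in anchors)

def unmask_order(length):
    """Reveal one token per step, picking the unrevealed position with the
    highest toy 'confidence': high near revealed context, plus noise.
    The result is mostly left-to-right with occasional jumps."""
    revealed = [False] * length
    order = []
    for _ in range(length):
        scores = [
            (1.0 / (1 + nearest_revealed(revealed, i)) + 0.3 * random.random(), i)
            for i in range(length)
            if not revealed[i]
        ]
        _, best = max(scores)
        revealed[best] = True
        order.append(best)
    return order

print(unmask_order(10))
```

Because the prompt anchors confidence on the left edge, position 0 usually wins early, and the frontier then creeps rightward, with the noise term producing the occasional out-of-order reveal.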
gdiamos · 2 months ago

I think these models would get interesting at extreme scale. Generate a novel in 40 iterations on a rack of GPUs.

At some point in the future, you will be able to autogen a 10M-line codebase in a few seconds on a giant GPU cluster.
jacobn · 2 months ago

The animation on the page looks an awful lot like autoregressive inference, in that virtually all of the tokens are predicted in order? But I guess it doesn't have to do that in the general case?
DeathArrow · 2 months ago

That got me thinking that it would be nice to have something like ComfyUI to work with diffusion-based LLMs. Apply LoRAs, use multiple inputs, have multiple outputs.

Something akin to ComfyUI but for LLMs would open up a world of possibilities.
mistrial9 · 2 months ago

This is the Hugging Face page: https://huggingface.co/papers/2502.09992
chw9e · 2 months ago

This was a very cool paper about using diffusion language models and beam search: https://arxiv.org/html/2405.20519v1

Just looking at all of the amazing tools and workflows that people have made with ComfyUI and the like makes me wonder what we could do with diffusion LMs. It seems diffusion models are much more easily hackable than LLMs.
inverted_flag · 2 months ago

How do diffusion LLMs decide how long the output should be? Normal LLMs generate a stop token and then halt. Do diffusion LLMs just output a fixed block of tokens and truncate the output that comes after a stop token?
alexmolas · 2 months ago

I guess the biggest limitation of this approach is that the max output length is fixed before generation starts, unlike autoregressive LLMs, which can keep generating forever.
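One answer to the length question in the two comments above, as I understand the masked-diffusion literature: the model denoises a fixed-size block and is trained to fill the unused tail with EOS/pad tokens, so the sampler truncates at the first EOS afterward. The sketch below shows only that truncation convention; it is a simplification, and some systems instead generate block after block semi-autoregressively to escape the fixed cap.

```python
def finalize(block, eos="<eos>"):
    # The diffusion sampler always emits a full fixed-size block; the
    # effective output length is wherever the model placed the first EOS,
    # capped by the block size chosen before generation started.
    return block[: block.index(eos)] if eos in block else block

print(finalize(["The", "cat", "sat", "<eos>", "<eos>", "<eos>"]))
# -> ['The', 'cat', 'sat']
```

So the model still "decides" the length, but only within the preallocated block, which is exactly the limitation alexmolas points at.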
flippyhead · 2 months ago

It's a pet peeve of mine to make a statement in the form of a question?
bilsbie · 2 months ago

What if we combine the best of both worlds? What might that look like?
beeforpork · 2 months ago

What is interesting is that the original title is not a question?
FailMore · 2 months ago

Thanks for the post, I'm interested in them too.
monroewalker · 2 months ago

See also this recent post about Mercury Coder from Inception Labs. There's a "diffusion effect" toggle for their chat interface, but I have no idea if that's an accurate representation of the model's diffusion process or just some randomly generated characters showing what the diffusion process looks like.

https://news.ycombinator.com/item?id=43187518

https://www.inceptionlabs.ai/news
Philpax · 2 months ago

I know the r-word is coming back in vogue, but it was still unpleasant to see it in the middle of an otherwise technical blog post. Ah well.

Diffusion LMs are interesting and I'm looking forward to seeing how they develop, but from playing around with that model, it's GPT-2 level. I suspect it will need to be significantly scaled up before we can meaningfully compare it to the autoregressive paradigm.
billab995 · 2 months ago

Stopped reading at the r word. Do better.