
Why I find diffusion models interesting?

202 points · by whoami_nr · 2 months ago

20 comments

mountainriver · 2 months ago

The most interesting thing about diffusion LMs, and one that tends to be missed, is their ability to edit early tokens.

We know that the early tokens in an autoregressive sequence disproportionately bias the outcome. I would go so far as to say that part of the magic of reasoning models is that they generate so much text they can kind of get around this.

However, diffusion seems like a much better way to solve this problem.
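mountainriver's point about revisiting early tokens can be sketched with a toy masked-diffusion sampler. This is purely illustrative: `random.choice` and `random.random` stand in for a real model's token predictions and confidence scores, and the remasking schedule is made up for the demo.

```python
import random

random.seed(0)

VOCAB = "abcd"

def diffusion_decode(length, steps=8, remask_frac=0.25):
    """Toy masked-diffusion sampler: each step fills every masked slot,
    then remasks the lowest-'confidence' fraction ANYWHERE in the
    sequence -- including early positions, which an autoregressive
    decoder freezes forever once they are emitted."""
    seq = ["<mask>"] * length
    for _ in range(steps):
        confidences = []
        for i, tok in enumerate(seq):
            if tok == "<mask>":
                seq[i] = random.choice(VOCAB)
            # random.random() stands in for the model's confidence score
            confidences.append((random.random(), i))
        confidences.sort()
        # remask the least-confident positions so they can be re-decided,
        # no matter where in the sequence they sit
        for _, i in confidences[: int(length * remask_frac)]:
            seq[i] = "<mask>"
    # fill whatever is still masked after the final remasking pass
    return [t if t != "<mask>" else random.choice(VOCAB) for t in seq]

print("".join(diffusion_decode(12)))
```

The key contrast is the remasking loop: an autoregressive decoder has no analogous mechanism for undoing a committed early token.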
vinkelhake · 2 months ago

I don't get where the author is coming from with the idea that a diffusion-based LLM would hallucinate less.

> dLLMs can generate certain important portions first, validate it, and then continue the rest of the generation.

If you pause the animation in the linked tweet (not the one on the page), you can see that the intermediate versions are full of, well, baloney.

(And anyone who has messed around with diffusion-based image generation knows the models are perfectly happy to hallucinate.)
kelseyfrog · 2 months ago

I'm personally happy to see effort in this space, simply because I think it's an interesting set of tradeoffs (compute ∝ accuracy), a departure from the fixed per-token compute budget required now.

It brings up interesting questions, like: what is the equivalence between smaller diffusion models, which consume more compute because they take a greater number of diffusion steps, and larger traditional LLMs, which essentially take a single step per token? How effective is decoupling the context window size from the diffusion window size? Is there an optimum ratio?
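The compute ∝ accuracy tradeoff kelseyfrog describes can be put in back-of-envelope terms. This is an illustrative cost model, not a measurement of any real system: `unit` is a stand-in for the per-position cost of one forward pass, and the attention term is ignored.

```python
def ar_flops(n_tokens, unit=1.0):
    # Autoregressive: one forward pass per generated token. With KV
    # caching each pass does new work for one position, so cost grows
    # roughly linearly in output length (attention term ignored here).
    return n_tokens * unit

def diffusion_flops(block, steps, unit=1.0):
    # Diffusion: each denoising step is a full pass over the whole block,
    # so cost is steps * block. The step count becomes a dial that trades
    # compute for sample quality, independent of output length.
    return steps * block * unit

# Same 1024-token output: a 16-step diffusion pass does ~16x the
# per-position work of one autoregressive sweep under these assumptions,
# but the ratio shrinks as you cut steps (and, plausibly, quality).
print(ar_flops(1024), diffusion_flops(1024, steps=16))
```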
prometheus76 · 2 months ago

Why did the person who posted this change the headline of the article ("Diffusion models are interesting") into a nonsensical question?
antirez · 2 months ago

There is a disproportionate skepticism toward autoregressive models, and a disproportionate optimism about alternative paradigms, because of the absolutely non-verifiable idea that LLMs, when predicting the next token, don't already model, in their activation states, the gist of what they are going to say, similar to what humans do. That's funny because many times it can be observed in truly high-quality replies that the first tokens only made sense *in the perspective* of what comes later.
kazinator · 2 months ago

Interestingly, that animation at the end *mainly* proceeds from left to right, with just some occasional exceptions.

So I followed the link, and gave the model this bit of a conversation starter:

> *You still go mostly left to right.*

The denoising animation it generated went like this:

> [Yes] [.] [MASK] [MASK] [MASK] ... [MASK]

and proceeded by deletion of the mask elements on the right, one by one, leaving just the "Yes.".

:)
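The mostly-left-to-right order kazinator observed falls out naturally if tokens are unmasked in confidence order and the model is most confident next to already-revealed context. The sketch below fakes that with a distance-based score plus noise; no real model is involved, and the scoring function is an assumption made for the demo.

```python
import random

random.seed(1)

def nearest_revealed(revealed, i):
    # Distance from position i to the nearest revealed position;
    # the prompt counts as a revealed "position -1" to the left.
    anchors = [-1] + [j for j, r in enumerate(revealed) if r]
    return min(abs(i - j) for j in anchors)

def unmask_order(length):
    """Reveal one token per step, picking the unrevealed position with the
    highest toy 'confidence': high near revealed context, plus noise.
    The result is mostly left-to-right with occasional jumps."""
    revealed = [False] * length
    order = []
    for _ in range(length):
        scores = [
            (1.0 / (1 + nearest_revealed(revealed, i)) + 0.3 * random.random(), i)
            for i in range(length)
            if not revealed[i]
        ]
        _, best = max(scores)
        revealed[best] = True
        order.append(best)
    return order

print(unmask_order(10))
```

Because the prompt anchors confidence on the left edge, position 0 usually wins early, and the frontier then creeps rightward, with the noise term producing the occasional out-of-order reveal.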
gdiamos · 2 months ago

I think these models would get interesting at extreme scale. Generate a novel in 40 iterations on a rack of GPUs.

At some point in the future, you will be able to autogen a 10M-line codebase in a few seconds on a giant GPU cluster.
jacobn · 2 months ago

The animation on the page looks an awful lot like autoregressive inference, in that virtually all of the tokens are predicted in order? But I guess it doesn't have to do that in the general case?
DeathArrow · 2 months ago

That got me thinking that it would be nice to have something like ComfyUI to work with diffusion-based LLMs. Apply LoRAs, use multiple inputs, have multiple outputs.

Something akin to ComfyUI but for LLMs would open up a world of possibilities.
mistrial9 · 2 months ago

This is the Hugging Face page: https://huggingface.co/papers/2502.09992
chw9e · 2 months ago

This was a very cool paper about using diffusion language models and beam search: https://arxiv.org/html/2405.20519v1

Just looking at all of the amazing tools and workflows that people have made with ComfyUI and the like makes me wonder what we could do with diffusion LMs. It seems diffusion models are much more easily hackable than LLMs.
inverted_flag · 2 months ago

How do diffusion LLMs decide how long the output should be? Normal LLMs generate a stop token and then halt. Do diffusion LLMs just output a fixed block of tokens and truncate the output that comes after a stop token?
alexmolas · 2 months ago

I guess the biggest limitation of this approach is that the max output length is fixed before generation starts, unlike autoregressive LLMs, which can keep generating forever.
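One answer to the length question in the two comments above, as I understand the masked-diffusion literature: the model denoises a fixed-size block and is trained to fill the unused tail with EOS/pad tokens, so the sampler truncates at the first EOS afterward. The sketch below shows only that truncation convention; it is a simplification, and some systems instead generate block after block semi-autoregressively to escape the fixed cap.

```python
def finalize(block, eos="<eos>"):
    # The diffusion sampler always emits a full fixed-size block; the
    # effective output length is wherever the model placed the first EOS,
    # capped by the block size chosen before generation started.
    return block[: block.index(eos)] if eos in block else block

print(finalize(["The", "cat", "sat", "<eos>", "<eos>", "<eos>"]))
# -> ['The', 'cat', 'sat']
```

So the model still "decides" the length, but only within the preallocated block, which is exactly the limitation alexmolas points at.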
flippyhead · 2 months ago

It's a pet peeve of mine to make a statement in the form of a question?
bilsbie · 2 months ago

What if we combine the best of both worlds? What might that look like?
beeforpork · 2 months ago

What is interesting is that the original title is not a question?
FailMore · 2 months ago

Thanks for the post, I'm interested in them too.
monroewalker · 2 months ago

See also this recent post about Mercury Coder from Inception Labs. There's a "diffusion effect" toggle for their chat interface, but I have no idea if that's an accurate representation of the model's diffusion process or just some randomly generated characters showing what the diffusion process looks like.

https://news.ycombinator.com/item?id=43187518

https://www.inceptionlabs.ai/news
Philpax · 2 months ago

I know the r-word is coming back in vogue, but it was still unpleasant to see it in the middle of an otherwise technical blog post. Ah well.

Diffusion LMs are interesting and I'm looking forward to seeing how they develop, but from playing around with that model, it's GPT-2 level. I suspect it will need to be significantly scaled up before we can meaningfully compare it to the autoregressive paradigm.
billab995 · 2 months ago

Stopped reading at the r word. Do better.