Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion

214 points, by magoghm, 10 months ago

7 comments

vessenes, 10 months ago
A number of ideas seem notable to me here. First, they merge sequence masking (the key training idea for LLMs) with diffusion models; they do this by keeping track of an ‘uncertainty’ level per pixel. This ‘uncertainty’ level is treated as the ‘noise’ level for the diffusion model (a denoiser conditioned on that level through some sort of embedding).

There are a bunch of neat things you can do with this: in particular, you can firm up parts of the image earlier than others, and thus use it for, say, maze solving. They even show it controlling a robot arm moving fruit around, which is pretty wild.

In a way the title undersells the idea: this is a way to do *fractional* masking, since the masking level is a float, and I think that is a really profound and interesting idea.

However, there’s a lot not talked about in this paper; I’d be very curious to see their codebase. *How* exactly do you set up a maze-following task vs. a video-extension task? How do you hook up a robot arm to this model and tell it what you want done? The architecture itself deserves a significant number of papers / explications.
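To make the “fractional masking” reading concrete, here is a minimal sketch of what a training step with an independent noise level per token could look like. The `model(x_noisy, k)` interface, the linear schedule, and the function name are illustrative assumptions, not the paper’s actual code:

```python
import torch
import torch.nn.functional as F

def diffusion_forcing_step(model, x0, num_levels=1000):
    """One training step where every token draws its own noise level.

    x0: clean per-token features, shape (batch, seq_len, dim).
    model: assumed to map (noisy sequence, per-token levels) -> predicted noise.
    """
    b, t, _ = x0.shape
    # Standard diffusion draws ONE level per sequence; here each token
    # gets its own level k in [0, num_levels): k = 0 means fully
    # visible, k = num_levels - 1 means fully masked (pure noise).
    k = torch.randint(0, num_levels, (b, t), device=x0.device)

    # Illustrative linear schedule mapping level -> signal fraction.
    alpha_bar = 1.0 - k.float() / num_levels              # (b, t)
    noise = torch.randn_like(x0)
    x_noisy = (alpha_bar.sqrt().unsqueeze(-1) * x0
               + (1.0 - alpha_bar).sqrt().unsqueeze(-1) * noise)

    # The denoiser is conditioned on k, e.g. via a level embedding
    # added to each token's features.
    pred_noise = model(x_noisy, k)
    return F.mse_loss(pred_noise, noise)
```

At sampling time the same per-token levels let you drive different tokens to zero noise on different schedules, which is what allows parts of the sequence to firm up before others.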
luke-stanley, 10 months ago
Anyone know of research or tools for using an existing text-generating LLM with diffusion-like techniques, with no new pre-training or at most a bit of fine-tuning, such that it works with a small GPT / Phi-3 / Qwen model, for example? I know about Tree of Thoughts with MCTS etc., which are somewhat similar (though often with a different learned reward or goal), but I'm interested in something closer to token-level generation. Is this possible?
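For a rough illustration of the token-level flavor being asked about, here is a sketch that uses a frozen GPT-2 as its own refiner: sample a draft, then repeatedly re-sample its least-confident tokens. This is not an established diffusion-LM method, and `refine`, `rounds`, and `frac` are made-up names and knobs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def refine(prompt, draft_len=24, rounds=4, frac=0.25):
    ids = tok(prompt, return_tensors="pt").input_ids
    # Ordinary sampled draft to start from.
    seq = model.generate(ids, max_new_tokens=draft_len, do_sample=True)[0]
    start = ids.shape[1]
    for _ in range(rounds):
        logits = model(seq.unsqueeze(0)).logits[0]
        # Logits at position i - 1 score the token at position i, so
        # this is each draft token's probability under its left context.
        probs = logits[start - 1:-1].softmax(-1)
        conf = probs[torch.arange(len(seq) - start), seq[start:]]
        # Re-sample the least-confident fraction of draft tokens.
        k = max(1, int(frac * (len(seq) - start)))
        for i in conf.topk(k, largest=False).indices.tolist():
            seq[start + i] = torch.multinomial(probs[i], 1).item()
        # Caveat: resampled tokens only see left context here; a real
        # diffusion-style LM would recondition on both sides.
    return tok.decode(seq[start:])

print(refine("The quickest way to learn a language is"))
```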
jimsimmons, 10 months ago
I work in the field, and the work is presented in an extremely obtuse manner.

What is the problem you're trying to solve? Are you proposing a new generative model?
treprinum, 10 months ago
Russ is doing diffusion now? Must be very applicable to robotics.
blovescoffee, 10 months ago
Am I missing something about training time? Does adding per-token noise slow training significantly? Cool paper, though!
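For what it's worth, the per-token noise draw itself shouldn't cost anything noticeable: the corruption is the same vectorized op either way, and the model's forward pass dominates a training step. A toy comparison (made-up shapes, not from the paper):

```python
import time
import torch

b, t, d = 32, 256, 512
x0 = torch.randn(b, t, d)

def corrupt(k, num_levels=1000):
    # Identical op whether k holds one level per sequence or per token.
    a = 1.0 - k.float() / num_levels
    return a.sqrt().unsqueeze(-1) * x0 + (1 - a).sqrt().unsqueeze(-1) * torch.randn_like(x0)

k_seq = torch.randint(0, 1000, (b, 1)).expand(b, t)  # standard diffusion
k_tok = torch.randint(0, 1000, (b, t))               # per-token levels

for name, k in [("per-sequence", k_seq), ("per-token", k_tok)]:
    t0 = time.perf_counter()
    for _ in range(100):
        corrupt(k)
    print(name, f"{time.perf_counter() - t0:.3f}s")
```

Whether the extra variance in the training signal changes how many steps are needed to converge is a separate question a timing like this can't answer.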
y1zhou, 10 months ago
Cool work! Curious whether this can be applied back to LLMs as a discrete diffusion model with *partial masking*.
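A sketch of what the discrete analogue might look like, with a float masking level per token deciding that token's corruption probability; `MASK_ID` and the shapes are placeholders:

```python
import torch

MASK_ID = 0  # placeholder mask-token id (assumption)

def partial_mask(ids, levels):
    """Each token i is independently replaced by MASK_ID with
    probability levels[i] in [0, 1] -- a fractional mask per token."""
    hit = torch.rand_like(levels) < levels
    corrupted = ids.clone()
    corrupted[hit] = MASK_ID
    return corrupted, hit

ids = torch.randint(1, 50000, (2, 8))
levels = torch.rand(2, 8)            # independent level per token
corrupted, hit = partial_mask(ids, levels)
# A model conditioned on `levels` would then be trained with
# cross-entropy to recover the original ids at the `hit` positions.
```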
omerhac, 10 months ago
Very cool, but why is it called diffusion forcing?