
Large Language Diffusion Models

45 points by SerCe 3 months ago

4 comments

jasonjmcghee 3 months ago
As always you kind of need to play with the model to see how well it actually works, as benchmarks can be misleading (e.g. phi-2).

But at face value, a new architectural approach with the same capacity (8b), trained on a dataset 1/6th the tokens, being competitive with llama3-8b is exciting.
jboggan 3 months ago
Not sure why they included a hallucination as one of their first examples:

"Please recommend me three famous movies"

"The Empire Strikes Back (1980) - Directed by George Lucas"
billconan 3 months ago
It doesn't seem to support variable length for input and output, does it?

The paper seems to use EOS padding to create fixed-length input/output.

So is there a maximum output length?
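To make the EOS-padding point concrete, here is a minimal, hypothetical sketch of how padding targets to a fixed canvas length and truncating at the first EOS at decode time can emulate variable-length output. The token ids, GEN_LEN, and helper names are illustrative assumptions, not taken from the paper or its code.

```python
# Hypothetical sketch of EOS padding for a fixed-length generation canvas;
# EOS_ID and GEN_LEN are assumed values, not the paper's.
import torch

EOS_ID = 2      # assumed end-of-sequence token id
GEN_LEN = 128   # fixed canvas length chosen before sampling

def pad_to_fixed_length(token_ids: list[int], length: int = GEN_LEN) -> torch.Tensor:
    """Right-pad a target sequence with EOS so every training example has
    the same length; the model learns to emit EOS for the unused tail."""
    padded = token_ids[:length] + [EOS_ID] * max(0, length - len(token_ids))
    return torch.tensor(padded)

def truncate_at_eos(token_ids: torch.Tensor) -> torch.Tensor:
    """At decode time, drop everything from the first EOS onward, so the
    effective output length varies even though the canvas is fixed."""
    eos_positions = (token_ids == EOS_ID).nonzero(as_tuple=True)[0]
    if len(eos_positions) == 0:
        return token_ids
    return token_ids[: int(eos_positions[0])]
```

Under this reading, the maximum output length is simply the canvas size chosen at sampling time.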
flowerthoughts 3 months ago
Masking looks interesting for sequences that can't be lossy. If an image squishes a pixel here or there, it won't be noticed, but if a sentence lacks room for "if", that sounds bad.

Does this force the model to encode a high-level answering strategy? (AFAIU, there's no reordering during sampling.) Or does it mean a masking model of a certain size is more prone to making things up that fit the blank space?
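For context on the "no reordering during sampling" point, here is a speculative sketch of confidence-based iterative unmasking for a masked diffusion LM: the canvas starts fully masked, and each step reveals the most confident masked positions in place. `model`, the step schedule, and the confidence rule are stand-ins, not the paper's actual sampler.

```python
# Speculative sketch of iterative unmasking for a masked diffusion LM;
# `model` is a stand-in returning (batch, seq_len, vocab) logits.
import torch

MASK_ID = 3     # assumed mask token id
GEN_LEN = 128   # fixed generation canvas length

@torch.no_grad()
def sample(model, prompt_ids: torch.Tensor, steps: int = 16) -> torch.Tensor:
    # Start from a fully masked canvas appended to the prompt.
    seq = torch.cat([prompt_ids, torch.full((GEN_LEN,), MASK_ID)])
    prompt_len = len(prompt_ids)

    for step in range(steps):
        logits = model(seq.unsqueeze(0))[0]        # (seq_len, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)

        still_masked = seq == MASK_ID
        still_masked[:prompt_len] = False          # never rewrite the prompt
        n_remaining = int(still_masked.sum())
        if n_remaining == 0:
            break

        # Reveal the most confident masked positions in place each step;
        # tokens are filled where they sit, so there is no reordering.
        n_reveal = max(1, n_remaining // (steps - step))
        conf_masked = torch.where(still_masked, conf, torch.tensor(-1.0))
        reveal_idx = conf_masked.topk(n_reveal).indices
        seq[reveal_idx] = pred[reveal_idx]

    return seq[prompt_len:]
```

In a scheme like this the model can only decide which blanks to commit to first, not move text around, which is one way to read the question about whether it must encode a high-level answering strategy up front.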