科技回声 (Tech Echo)
A tech-news platform built with Next.js, providing global technology news and discussion.
© 2025 科技回声. All rights reserved.

Lotus: Diffusion-Based Visual Foundation Model for High-Quality Dense Prediction

47 points, by jasondavies, 7 months ago

2 comments

thot_experiment, 7 months ago
Very cool stuff! I got it running on Win10 with minimal effort, and the results are really impressive. I've been working on some plotter art where I use normal data to guide stroke orientation; in the past I've mostly worked with 3D scenes, where the normal data is free, but I'm excited to try working with photos using this tool.
curvilinear_m, 7 months ago
Can someone more knowledgeable than me help me understand a few points about this article?

It claims to be diffusion-based, but the two main differences from an approach like Stable Diffusion are that (1) they consider only a single step, instead of the traditional 1000, and (2) they directly predict the value z^y instead of a noise direction. According to their analyses, both of these differences help on the studied tasks. However, isn't that how supervised learning has always worked? Aside from having a larger model, this isn't very different from "traditional" depth estimation that doesn't claim anything to do with diffusion.

It also claims to have zero-shot abilities, but they fine-tune the denoising model f_theta on a concatenation of the latent image and apply a loss using the latent label. So their evaluation dataset may be out-of-distribution, but I don't understand how that's zero-shot. Asking ChatGPT to output a depth estimation of a given image would be zero-shot, because it hasn't been trained to do that (to my knowledge).
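The comment's first point can be made concrete with a toy sketch: once you do a single step and predict z^y directly, the diffusion objective collapses into an ordinary supervised regression loss, whereas the Stable-Diffusion-style objective predicts the injected noise at a sampled timestep. All names below (f_theta stand-in, latent shapes, schedule) are illustrative assumptions, not taken from the Lotus codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latents standing in for the paper's image latent z^x and label
# latent z^y (purely illustrative, not real VAE latents).
z_x = rng.normal(size=(8,))   # conditioning (image) latent
z_y = np.tanh(z_x)            # pretend ground-truth label latent

T = 1000  # typical length of a diffusion noise schedule


def epsilon_loss(model, z_y, z_x, t, eps):
    """Multi-step, noise-prediction objective (Stable-Diffusion style):
    noise z_y to timestep t, then ask the model to recover eps."""
    alpha = 1.0 - t / T                      # toy linear schedule
    z_t = np.sqrt(alpha) * z_y + np.sqrt(1.0 - alpha) * eps
    return float(np.mean((model(z_t, z_x, t) - eps) ** 2))


def direct_loss(model, z_y, z_x):
    """Single-step, direct z^y prediction (as the comment describes):
    one forward pass maps the conditioning latent straight to the
    target -- i.e. this is just a plain supervised regression loss."""
    return float(np.mean((model(None, z_x, T) - z_y) ** 2))


# A "perfect" toy model for the direct objective: it reproduces the
# (made-up) mapping from z_x to z_y, so its direct loss is exactly 0.
perfect = lambda z_t, z_cond, t: np.tanh(z_cond)

assert direct_loss(perfect, z_y, z_x) == 0.0
```

Nothing here argues for or against the commenter's point; it just shows that with one step and direct target prediction, the "diffusion" loss is term-for-term a supervised MSE on the label latent.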