科技回声 (Tech Echo)
A tech-news platform built with Next.js, providing global technology news and discussion.
© 2025 科技回声. All rights reserved.

Lotus: Diffusion-Based Visual Foundation Model for High-Quality Dense Prediction

47 points, by jasondavies, 7 months ago

2 comments

thot_experiment, 7 months ago
Very cool stuff! I got it running on Win10 with minimal effort, and the results are really impressive. I've been working on some plotter art where I use normal data to guide stroke orientation; in the past I've mostly worked with 3D scenes, where the normal data is free, but I'm excited to try working with photos using this tool.
curvilinear_m, 7 months ago
Can someone more knowledgeable than me help me understand a few points about this article?

It claims to be diffusion-based, but the two main differences from an approach like Stable Diffusion are that (1) they consider only a single step, instead of the traditional 1000, and (2) they directly predict the value z^y instead of a noise direction. According to their analyses, both of these differences help on the studied tasks. However, isn't that how supervised learning has always worked? Aside from having a larger model, this isn't very different from "traditional" depth estimation that doesn't claim anything to do with diffusion.

It also claims to have zero-shot abilities, but they fine-tune the denoising model f_theta on a concatenation of the latent image and apply a loss using the latent label. So their evaluation dataset may be out-of-distribution, but I don't understand how that's zero-shot. Asking ChatGPT to output a depth estimation of a given image would be zero-shot, because it hasn't been trained to do that (to my knowledge).
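The comment's first point can be made concrete with a toy sketch: once you do a single step and predict z^y directly, the diffusion objective collapses into an ordinary supervised regression loss, whereas the Stable-Diffusion-style objective predicts the injected noise at a sampled timestep. All names below (f_theta stand-in, latent shapes, schedule) are illustrative assumptions, not taken from the Lotus codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latents standing in for the paper's image latent z^x and label
# latent z^y (purely illustrative, not real VAE latents).
z_x = rng.normal(size=(8,))   # conditioning (image) latent
z_y = np.tanh(z_x)            # pretend ground-truth label latent

T = 1000  # typical length of a diffusion noise schedule


def epsilon_loss(model, z_y, z_x, t, eps):
    """Multi-step, noise-prediction objective (Stable-Diffusion style):
    noise z_y to timestep t, then ask the model to recover eps."""
    alpha = 1.0 - t / T                      # toy linear schedule
    z_t = np.sqrt(alpha) * z_y + np.sqrt(1.0 - alpha) * eps
    return float(np.mean((model(z_t, z_x, t) - eps) ** 2))


def direct_loss(model, z_y, z_x):
    """Single-step, direct z^y prediction (as the comment describes):
    one forward pass maps the conditioning latent straight to the
    target -- i.e. this is just a plain supervised regression loss."""
    return float(np.mean((model(None, z_x, T) - z_y) ** 2))


# A "perfect" toy model for the direct objective: it reproduces the
# (made-up) mapping from z_x to z_y, so its direct loss is exactly 0.
perfect = lambda z_t, z_cond, t: np.tanh(z_cond)

assert direct_loss(perfect, z_y, z_x) == 0.0
```

Nothing here argues for or against the commenter's point; it just shows that with one step and direct target prediction, the "diffusion" loss is term-for-term a supervised MSE on the label latent.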