The title is such clickbait: it does not rewrite the rules of 3D vision, it's a marginal improvement on existing models, and it only works on images, not video. However, Apple open sourced the model weights, which is amazing for research.
This article has a link to the live demo: https://huggingface.co/spaces/akhaliq/depth-pro

For some pictures it outputs something reasonable; for others it's completely broken (black with colored noise in one area).
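If you'd rather run it locally than wait on the Space, here is a rough sketch based on the usage shown in the apple/ml-depth-pro README (exact function names and return keys may differ from the released package):

```python
# Rough local-inference sketch, modeled on the apple/ml-depth-pro README
# (install from https://github.com/apple/ml-depth-pro; API details are assumptions).
import depth_pro

# Load the released weights and the matching preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# load_rgb also tries to read the focal length (f_px) from EXIF when available.
image, _, f_px = depth_pro.load_rgb("photo.jpg")
image = transform(image)

# Inference returns metric depth (meters) plus an estimated focal length in pixels.
prediction = model.infer(image, f_px=f_px)
depth_m = prediction["depth"]
focal_px = prediction["focallength_px"]
```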
Just tried it on a "difficult" image (a relatively low-contrast photo of a small, thin plant in front of a tree trunk, with a distant fence in one corner) and it did a pretty good job, I think: https://imgur.com/a/Sqr6hR8 (includes the depth maps).
Was this trained on iPhone photos, given that iPhone cameras already capture a decent amount of depth reference data? It's interesting to see how clearly it understands depth of field. With that in mind, how does it perform at f/16 and narrower apertures?