Nice paper. I particularly like how they talk through the ideas they tried that <i>didn’t</i> work, and the process they used to land on the final results. A lot of ML papers present the finished result as if it appeared from nowhere, without trial and error, perhaps with some ablations in the appendix. I wish more papers followed this one in talking about the dead ends along the way.
Very interesting work! More details here: <a href="https://depth-anything.github.io/" rel="nofollow">https://depth-anything.github.io/</a><p>It seems better than prior work, both overall and per parameter, for relative and absolute depth.<p>Is anyone aware of research that provides sub-mm-level models, for 3D modeling purposes? Or is "classic" photogrammetry still the best option there?
In grad school I was using stereo video cameras to measure fish. I wonder if a model like this could do it accurately from frame grabs from a single feed now. And of course an AI to identify fish, even if it was just flagging which sections of video did or did not have fish, not even doing species-level ID, would have saved a ton of time.<p>We had a whole workshop on various monitoring technologies, and the take-home from the various video tools is that having highly trained grad students and/or techs watch and analyze the video is extremely slow and expensive.<p>I haven't worked with video in a while now, but I wonder if any labs are doing more automated identification these days. It feels like the kind of problem that is probably completely solvable if the right tech gets applied.
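<p>For the frame-grab idea, here's a minimal sketch of what running a monocular depth model on a single still could look like, assuming the Hugging Face transformers depth-estimation pipeline; the checkpoint id below is an assumption, not necessarily the one you'd want:

<pre><code># Minimal sketch: monocular depth estimation on one frame grab.
# The model id is an assumption; swap in whatever checkpoint you actually use.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

frame = Image.open("frame_grab.png")    # one still pulled from the video feed
result = depth(frame)

depth_map = result["predicted_depth"]   # torch.Tensor of relative (not metric) depth
print(depth_map.shape, float(depth_map.min()), float(depth_map.max()))
</code></pre>

Note that checkpoints like this typically predict relative depth, so turning pixel values into actual fish lengths would still need some metric calibration (a reference object of known size in frame, or a metric-depth variant of the model).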
Can someone explain the meaning of labelled vs unlabelled in this context? What kind of information would the labels carry?<p>Did they have depth maps for all 62 million images or not?