It would be interesting to see whether and how this differs from (or maybe builds upon) Segment Anything[0].<p>For folks in the know: I often see segmentation models produce patchy results on video frames (see the DINOv2 video of the running dog: the body randomly gets black patches, so the segmentation fails on certain frames). What methods are folks using to deal with this: standard fine-tuning, or is there a way to "force" the area to be cleanly segmented (i.e., add a bounding box around the class to supplement the data)?<p>And is this something that can be implemented in foundation models, or are we always going to get patchy zero-shot results like this on video?<p>[0] <a href="https://github.com/facebookresearch/segment-anything">https://github.com/facebookresearch/segment-anything</a>