Seems useful to think of this from a computer vision standpoint. With vision problems, you often first need human annotation, and unless the annotators agree to a high degree, there are going to be problems. The "number of clouds" in a picture is an ambiguous enough concept that I can't imagine everybody agreeing.<p>That said, if there is clearly _one_ cloud, I think most annotators would agree that there is one cloud (and not, say, infinitely many).<p>So going from that, you can frame it as a constrained optimization problem. You want the largest possible collection of droplets to count as a cloud, without accidentally merging all the clouds in the world into a single cloud. There has to be a loss function for the cloud-ness of a set of droplets, based on how dispersed the droplets in it are.<p>Think about the fill bucket in Microsoft Paint. A single-pixel hole allows the entire image to get painted one color. We don't want our definition of cloud to leak along the stray droplets that exist in the air and end up defining the entire atmosphere as a cloud, but we definitely want to group certain things together as clouds.<p>Hopefully that is food for thought for someone who is better versed in the specifics of anything I just said!
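<p>To make the fill-bucket leak concrete, here is a rough sketch, not a real cloud-segmentation method. It assumes the sky is a small grid of made-up droplet densities, and it uses a plain density threshold as a crude stand-in for the "cloud-ness" loss mentioned above. The names `label_clouds` and `sky` and the numbers in the grid are all hypothetical.

```python
from collections import deque


def label_clouds(density, threshold):
    """Flood-fill 4-connected regions of cells whose density >= threshold.

    Returns (number_of_regions, label_grid). Label 0 means "not cloud".
    """
    rows, cols = len(density), len(density[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip cells that are too sparse or already belong to a region.
            if density[r][c] < threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not labels[ny][nx]
                        and density[ny][nx] >= threshold
                    ):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return count, labels


# Hypothetical data: two dense blobs joined by a one-cell-wide
# trail of sparse droplets (the "single pixel hole").
sky = [
    [0, 0, 0, 0, 0, 0, 0],
    [9, 9, 0, 0, 0, 9, 9],
    [9, 9, 1, 1, 1, 9, 9],
    [9, 9, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 0, 0, 0],
]

# Any droplet counts as cloud: the fill leaks along the trail.
leaky_count, leaky_labels = label_clouds(sky, threshold=1)
# Only dense cells count as cloud: the trail no longer connects the blobs.
strict_count, strict_labels = label_clouds(sky, threshold=5)

print(leaky_count, strict_count)
```

<p>With the permissive threshold the sparse trail glues both blobs into one region; with the stricter one they come out as two. A hard threshold is of course much blunter than a real dispersion-based loss, and picking its value is exactly where annotators would start to disagree.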