I remember that about 10 years ago, in the context of using neural networks for image classification, there was a study of what the AI actually saw that justified its decisions. This, by the way, is how we got "Deep Dream".<p>For "dumbbell", no dumbbell was complete for the AI without a muscular arm holding it. That's because most images in the training dataset show dumbbells being held, so the system integrated the arm into the pattern. I guess it is the same idea here: most pictures of bananas show several of them, so for the AI, bananas are things that don't come alone.