This is super-interesting as a facet of the "Intelligence is Compression" model. It's tempting to anthropomorphize and say that the system is building an opinion about what an image, fundamentally, _is_. I'm inclined to believe that these kinds of compressive abstractions are integral to higher-level reasoning. Could you build a system with even better behavior by, for example, including text snippets describing the images and using a multi-modal model?

I'd be interested to see an analysis of this system's behavior compared to more generative approaches, like autoencoders or Deep Dream.