It's interesting to me that the model does not know what a house is like; it knows what <i>pictures of houses</i> are like. So it does a good job of making pictures that look like pictures of houses. But if you look closely, a lot of the details are really weird, unbuildable, or just nonsensical.<p>All of the image-gen models have this problem - look at the hands and faces in generated images of people and there are often bizarre deformations.<p>It's fascinating because it's the opposite of how children learn to draw. They tend to think about the pieces that make up a thing and then try to put all the pieces on paper, and they end up making a drawing that (for instance) looks nothing like a person but has two eyes, a nose, a mouth, etc. in roughly the right relation to each other. (They rarely draw ears though!) The child is thinking about "what makes a face a face" and then trying to represent it.
The ML model is sort of distributing pixels in a probabilistic way that comes up with something very similar to the pixels in a sample image from its training set: superficially much better than a kid's drawing, and yet in some ways much worse upon close inspection.