Before everyone jumps on the "GANs don't generate ground truth" bandwagon, that's not a concern here.<p>From what I understand, this is about creating a dataset of [aerial picture, building mask] pairs to then train a segmentation model for urban planning and design.<p>They use Unity and a virtual environment (a la a flight simulator, I guess) to generate a bunch of samples. They then take any image with thin cloud cover, i.e. clouds thin enough that you can still see the true outline of the buildings below, and use the GAN to remove the clouds. Images with thick clouds just get discarded. So in this case the information that wasn't present in the original and that gets made up by the GAN (such as the actual color and details of the building) does not matter for the task at hand, since only the outline matters and it was already visible before. This just helps the segmentation model learn by normalizing the images in the training dataset.<p>What is not clear to me is why they didn't just remove the clouds directly in the game engine, since they are running in Unity, instead of relying on a GAN.
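<p>To make the pipeline concrete, here is a rough sketch of what that filter-then-normalize step could look like. This is just my reading of the approach, not code from the paper: the brightness-based cloud_fraction heuristic, the THICK_CLOUD_THRESHOLD cutoff, and the decloud_gan callable are all stand-ins I made up for illustration.

    import numpy as np
    from PIL import Image
    from pathlib import Path

    THICK_CLOUD_THRESHOLD = 0.4   # assumed cutoff, not from the paper
    CLOUD_BRIGHTNESS = 220        # crude proxy: near-white pixels ~ cloud

    def cloud_fraction(img):
        """Rough estimate: fraction of near-white pixels in the aerial image."""
        gray = np.asarray(img.convert("L"))
        return float((gray > CLOUD_BRIGHTNESS).mean())

    def build_pairs(image_dir, mask_dir, decloud_gan):
        """Yield (aerial image, building mask) training pairs.

        decloud_gan is a placeholder for whatever GAN removes the thin
        clouds; images with thick clouds are discarded entirely.
        """
        for img_path in sorted(Path(image_dir).glob("*.png")):
            img = Image.open(img_path)
            frac = cloud_fraction(img)
            if frac > THICK_CLOUD_THRESHOLD:
                continue                  # outline unrecoverable, drop the sample
            if frac > 0.0:
                img = decloud_gan(img)    # normalize: remove thin clouds
            mask = Image.open(Path(mask_dir) / img_path.name)
            yield img, mask

The point is that the GAN only ever touches images where the building outlines were already visible, so whatever texture it invents never changes the labels the segmentation model trains against.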