This is a paper that was submitted to the ICLR 2019 conference. To preserve the integrity of the double-blind submission process, please do not "dox" the identities of the authors. I am not an author of this paper.<p>Previous works in this area (and there are many) have tackled the exact problem of natural image generation. The specifics of the methods presented in the paper are of considerable interest to the ML/AI research community, but what I'd like to highlight here are the <i>results</i>.<p>One might suppose that a neural network could never make images consistent with reality, because there are so many hidden latent variables and strong causal factors behind the generation of coherent real-world images: the laws of light transport and camera perspective, smooth texture variation on objects, bilateral symmetry in organisms, which objects are "possible" and which are not.<p>Often, to generate images that are at least plausible under our reality, we have to fall back on explicit physically-based rendering techniques: simulating the transport of light and accurately modeling a lot of high-fidelity geometry. We have to hard-code so many physical equations and rules into these systems, and even then we still have a hard time rendering everyday things (candlelight, soap bubbles, food).<p>Prior to this paper, I certainly had my doubts that neural networks could ever capture enough implicit "knowledge" about the world to synthesize an image (not in the training set, mind you) that could convince a human it was real. Machine Learning is the study of generalization, and we know of no useful guarantees on generalization for finite-size training sets and overparameterized models.<p>To my knowledge, this is the first paper to generate high-fidelity <i>natural</i> images with no apparent visual artifacts (blurriness, weird textures that could not exist in the real world). The laws of light transport (NP-hard to compute) appear to be convincingly preserved, and I am blown away.
An interesting aspect is the sampling from a different distribution at test time than at training time.<p>The "truncation trick" samples the latent variables from a truncated Normal distribution rather than from N(0,sigma). (Values whose magnitude exceeds a chosen threshold are simply resampled until they fall below it; a rough sketch of this is at the end of this comment.) I don't completely get this. What's going on there? Is there a mode in the network that defines the prototypical dog, with other dogs defined by nonzero values in the latent variable space? If so, this seems to show that the layer exhibits a non-intuitive decomposition of the task at hand. I would expect a zero vector to correspond to an "abstract dog" and all nonzero parameters to contribute in an attribute-like fashion. This seems to be more prototype-like, similar to old-fashioned vector quantization.<p>The censored normal max[N(0,sigma),0] is interesting. It reminds me of nonnegative matrix factorization. Check the Lee & Seung paper in Nature or just a nice blog post: <a href="https://yliapis.github.io/Non-Negative-Matrix-Factorization/" rel="nofollow">https://yliapis.github.io/Non-Negative-Matrix-Factorization/</a>. By using a nonnegativity constraint, the representation becomes additive (part-based) and sparse. That's quite different from prototype-like methods.<p>I'm experimenting with more complex priors myself, and in my experience it's difficult to find priors that blow everything out of the water. Very nice appendix E. :-)
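<p>To make the truncation trick concrete, here is a minimal numpy sketch of the rejection-resampling step as I understand it (the threshold of 0.5 and the latent size of 128 are just illustrative values, not taken from the paper):<pre><code>import numpy as np

def truncated_normal(shape, threshold=0.5, rng=None):
    # Draw z ~ N(0, 1) and resample every entry whose magnitude exceeds
    # `threshold` until all entries lie inside [-threshold, threshold].
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(shape)
    out_of_range = np.abs(z) > threshold
    while out_of_range.any():
        z[out_of_range] = rng.standard_normal(out_of_range.sum())
        out_of_range = np.abs(z) > threshold
    return z

# Smaller thresholds concentrate the latents near the mode of the prior,
# trading sample diversity for fidelity.
z = truncated_normal((8, 128), threshold=0.5)
print(z.min(), z.max())  # both inside [-0.5, 0.5]</code></pre>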
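<p>A quick illustration of why the censored prior max[N(0,sigma),0] pushes toward sparse, nonnegative codes (also just a sketch, not from the paper):<pre><code>import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 128))
z_censored = np.maximum(z, 0.0)  # max[N(0, 1), 0]: all mass below zero collapses onto 0

# Roughly half of the coordinates end up exactly zero, so each latent is a
# sparse, nonnegative code -- the same constraint that makes NMF factors
# additive and part-based.
print((z_censored == 0).mean())  # ~0.5</code></pre>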