OP: Forgive me if this is out of place. Also, please know that my question is genuine, not at all a reflection on the author/their project, and most certainly born out of my own ignorance:

Why are these kinds of things impressive?

I think part of my issue is that I don't really "get" these ML projects ("This X does not exist", or perhaps ML in general).

My understanding is that, in layman's terms, the model is shown many, many examples of X and is then asked to "draw"/create X, which it then does. The analogy I can think of is if I were to draw over and over for a billion billion years, and each time a drawing "failed" to capture the essence of a prompt (as deemed by some outside entity), both the drawing and my memory of it were erased. At the end of that time, my skill in drawing X would be amazing.

_If_ that understanding is correct, it would seem unimpressive? It's not as though I can pass a prompt of "cookie" to an untrained generator and have it pop out a drawing of one. And likewise, any cookie "drawing" generated by a trained model is simply an amalgam of every example cookie.

What am I missing?
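For what it's worth, the "outside entity" in this kind of model is not an external judge but a second neural network (the discriminator) that is trained jointly with the generator. The sketch below is a deliberately toy illustration of that two-network loop in PyTorch, not the StyleGAN2 code from the post; the network sizes and data shapes are arbitrary.

```python
# Toy GAN training loop, purely illustrative.
# The "outside entity" judging each drawing is the discriminator,
# a second network trained at the same time as the generator.
import torch
import torch.nn as nn

latent_dim = 64

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
discriminator = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):  # real_images: (batch, 784) tensor
    batch = real_images.shape[0]

    # Discriminator: learn to score real images high and generated ones low.
    fake = generator(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: learn to produce images the discriminator scores as real.
    fake = generator(torch.randn(batch, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

So the generator never gets to "remember" individual training images directly; it only ever sees the gradient signal from a critic that is itself learning what real images look like.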
Looks impressive, but I can't escape the notion that surely some of the generated images will be very close to some of the training images?

How am I to assess how original the generated results really are?
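One common way to probe this is a nearest-neighbour check: embed the generated samples and (a subset of) the training set with a pretrained feature extractor, then look at the closest training image for each sample. The sketch below is illustrative and is not part of the released code; it assumes a recent torchvision, and the image-path lists are placeholders you would fill in yourself.

```python
# Nearest-neighbour memorization check (illustrative sketch).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep the 2048-d pooled features
backbone.eval().to(device)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    feats = []
    for p in paths:
        x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0).to(device)
        feats.append(torch.nn.functional.normalize(backbone(x), dim=-1).cpu())
    return torch.cat(feats).numpy()

train_paths = []      # placeholder: paths to (a sample of) the training images
generated_paths = []  # placeholder: paths to generated samples

train_feats = embed(train_paths)
gen_feats = embed(generated_paths)

# Cosine similarity of each generated image to its closest training image.
sims = gen_feats @ train_feats.T
print("max similarity per sample:", sims.max(axis=1))
# Values very close to 1.0 flag near-duplicates worth inspecting by eye.
```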
At least with DALL-E you can be sure the food has a name. For a moment I was worried this would produce vaguely food-like images where, on closer look, you realise you have no idea what you're looking at, like a lot of other "this X does not exist" projects seem to do.

Also, I think the training data shows a bit of cultural bias. The "pile of cookies" prompt seems to mostly generate American cookies, while e.g. a German user might be disappointed they didn't get this: https://groceryeshop.us/image/cache/data/new_image_2019/ABSB0005XOJBS_0-600x600.jpg :)
We have trained four StyleGAN2 image generation models and are releasing checkpoints and training code. We are exploring how to improve and scale up StyleGAN training, particularly when leveraging TPUs.

While everyone is excited about DALL·E/diffusion models, training those is currently out of reach for most practitioners. Craiyon (formerly DALL·E mega) has been training for months on a huge 256-core TPU machine. In comparison, our models were each trained in under 10 hours on a machine 32× smaller. StyleGAN models also still offer unrivaled photorealism when trained on narrow domains (e.g. thispersondoesnotexist.com), even though diffusion models are catching up due to the massive cash investment in that direction.
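For anyone curious what "leveraging TPUs" looks like in JAX, the data-parallel pattern boils down to replicating the train step across devices with pmap and averaging gradients with a cross-device pmean. The snippet below is a minimal, generic sketch of that pattern, not the actual stylegan2-flax-tpu training code; it uses a dummy regression loss in place of the generator/discriminator losses.

```python
# Generic JAX data-parallel training step (illustrative, not the released code).
import functools
import jax
import jax.numpy as jnp

LR = 1e-3

def loss_fn(params, batch):
    # Placeholder loss; a real StyleGAN step would compute the GAN losses here.
    preds = batch["images"] @ params["w"]
    return jnp.mean((preds - batch["labels"]) ** 2)

@functools.partial(jax.pmap, axis_name="devices")  # one replica per TPU core
def train_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")  # average grads across cores
    params = jax.tree_util.tree_map(lambda p, g: p - LR * g, params, grads)
    return params, loss

n = jax.local_device_count()
params = {"w": jnp.zeros((64, 10))}
params = jax.tree_util.tree_map(lambda x: jnp.stack([x] * n), params)  # replicate
batch = {
    "images": jnp.ones((n, 32, 64)),   # leading axis = device axis
    "labels": jnp.ones((n, 32, 10)),
}
params, loss = train_step(params, batch)
```

On a TPU VM the same code shards the per-step batch across all local cores, which is essentially where the wall-clock advantage over single-GPU training comes from.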
Darn! I was hoping for other-worldly foods that don't actually exist, generated from real food attributes. I suppose I should have known better.
I like the thought that, years from now, we're all eating weirdly-presented food / drinking weird cocktails because AI synthesized the images of drinks around the web and decided `cocktails always include fruit` and `all food must be piled high on plate`
Are there any analysis techniques that can easily distinguish between these and real photographs? Do simple things like edge detection or histograms reveal any anomalies?
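One simple check that has been reported to work on GAN outputs is looking at the frequency spectrum: upsampling layers tend to leave periodic, high-frequency artifacts. The sketch below computes an azimuthally averaged power spectrum with NumPy and compares high-frequency energy between a real and a generated photo. The file names are placeholders, and a single statistic like this is not a reliable detector on its own.

```python
# Frequency-domain forensics sketch (illustrative only).
import numpy as np
from PIL import Image

def radial_power_spectrum(path, size=256):
    img = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    # Bin the 2D spectrum by distance from the centre (radial frequency).
    cy, cx = size // 2, size // 2
    y, x = np.indices(spectrum.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    profile = np.bincount(r.ravel(), weights=spectrum.ravel()) / np.bincount(r.ravel())
    return profile[: size // 2]

def high_freq_ratio(profile, frac=0.1):
    # Share of energy in the top 10% of radial frequencies.
    cut = int(len(profile) * (1 - frac))
    return profile[cut:].sum() / profile.sum()

real = radial_power_spectrum("real_photo.jpg")        # placeholder filenames
fake = radial_power_spectrum("generated_photo.png")
print("real:", high_freq_ratio(real), "generated:", high_freq_ratio(fake))
```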
The food looks great! I suppose these models could use some extra training with dishes, though. The plates and glasses look wobbly, which is an instant giveaway. Otherwise, I can see this being used by food posters! Maybe not as a primary source, but as a "filler" — for sure.
You can try out the model with this interactive Gradio demo: https://huggingface.co/spaces/nyx-ai/stylegan2-flax-tpu
I tried to use the linked Colab notebook to generate my own, and it appears to have been successful, but I don't see any way to view the generated images via the notebook interface. I'm not familiar with the notebook tool - have I missed something?
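If the notebook writes its samples to disk as image files, something like the snippet below shows them inline in a Colab cell. The output directory name here is a guess, so check where the notebook actually saves its outputs.

```python
# Display any PNGs found in an (assumed) output folder inside the notebook.
from pathlib import Path
from IPython.display import display
from PIL import Image

for path in sorted(Path("generated").glob("*.png")):  # hypothetical folder name
    display(Image.open(path))
```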
I'm honestly surprised that they trained a StyleGAN. Recently, the Imagen architecture has been shown to be simpler in structure, easier to train, and even faster to produce good results. Combined with the "Elucidating" paper by NVIDIA's Tero Karras, you can train a 256px Imagen* to tolerable quality within an hour on an RTX 3090.

Here's a PyTorch implementation by the LAION people:

https://github.com/lucidrains/imagen-pytorch

And here are 2 images I sampled after training it for some hours, like 2 hours base model + 4 hours upscaler:

https://imgur.com/a/46EZsJo

* = Only the unconditional Imagen variant, meaning what they show off here. The variant with a T5 text embedding takes longer to train.
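For reference, a rough sketch of what unconditional training with that library looks like is below. The class and argument names follow my recollection of the imagen-pytorch README and may differ between versions, and the data loader is a placeholder, so treat this as a starting point rather than a verified recipe.

```python
# Unconditional Imagen training sketch with imagen-pytorch (names may vary by version).
import torch
from imagen_pytorch import Unet, Imagen, ImagenTrainer

unet = Unet(
    dim=128,
    dim_mults=(1, 2, 4),
    num_resnet_blocks=3,
    layer_attns=(False, True, True),
)

imagen = Imagen(
    condition_on_text=False,     # unconditional variant: no T5 text embeddings
    unets=unet,
    image_sizes=64,
    timesteps=1000,
)

trainer = ImagenTrainer(imagen).cuda()

def next_batch():
    # Placeholder: substitute your own (B, 3, 64, 64) image batches here.
    return torch.randn(8, 3, 64, 64).cuda()

for step in range(100_000):
    loss = trainer(next_batch(), unet_number=1)  # forward + backward
    trainer.update(unet_number=1)                # optimizer step

samples = trainer.sample(batch_size=4)           # (4, 3, 64, 64) tensors
```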
My partner is very impressionable when she sees food in a TV show. Immediately has a craving for it. This thing is like, limitless porn for her gluttony.