Authors here!<p>Stable Diffusion and DALL·E have made it simpler to create new images. But getting photorealistic results is still a challenge…<p>This is a continuation of our work from 3 months ago on “This Food Does Not Exist” (<a href="https://news.ycombinator.com/item?id=32167704" rel="nofollow">https://news.ycombinator.com/item?id=32167704</a>). We are now using both Stable Diffusion and GANs depending on the subjects we want to render.
I see an uncharacteristically high number of malformed images. For example,<p>Six legged gecko: <a href="https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnmqHMQ/v0_9f0475078b391da2a381ee97a6b9e400/public" rel="nofollow">https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnm...</a><p>Dog with warped face: <a href="https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnmqHMQ/v0_f3feac216f6dfe61395c674a8c3010dd/public" rel="nofollow">https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnm...</a><p>Bizarrely proportioned lion: <a href="https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnmqHMQ/v0_b8a9c03e8492a8744f6a98b7a8524946/public" rel="nofollow">https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnm...</a><p>Rabbit with fur and whisker artifacts, misplaced hind leg, and weird front paw: <a href="https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnmqHMQ/v0_b1cadd303c6547f472878c5b6199efd9/public" rel="nofollow">https://nyx.gallery/cdn-cgi/imagedelivery/16bz3hOZjq1MqWAKnm...</a>
I find Stable Diffusion is pretty good at generating single-subject images like this. It's really mind-blowing and the novelty hasn't worn off for me yet.<p>But after dozens of attempts I still haven't managed to get it to show me a photograph of a duck eating a hoagie at Niagara Falls. I think it would be really interesting to try to find the simplest prompt that these tools cannot satisfy.
>> From FAQs: “we are using both diffusion models and GANs in combination with an extensive filtering and quality assessment pipeline that allows us to generate photorealistic images at scale.”<p>For every image that reaches the site, how many were generated but filtered out by the pipeline? For example, were 1,000 images generated and rejected by the quality assessment pipeline for each one published?
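In spirit, a generate-many-then-filter pipeline like the one the FAQ describes could look like this minimal sketch. Everything here is hypothetical: the `quality_score` stub stands in for whatever the authors actually use (e.g. an aesthetic predictor, CLIP similarity to the prompt, or artifact detectors), and the threshold is an arbitrary illustration:

```python
import random

def quality_score(image):
    # Hypothetical stand-in: a real pipeline might combine a CLIP
    # prompt-similarity score, an aesthetic predictor, and artifact
    # detectors into one number. Here we just read a mock score.
    return image["score"]

def filter_batch(candidates, threshold=0.8):
    """Keep only candidates whose quality score clears the threshold."""
    return [c for c in candidates if quality_score(c) >= threshold]

# Toy run: generate 1000 mock candidates, publish only the survivors.
random.seed(0)
batch = [{"id": i, "score": random.random()} for i in range(1000)]
published = filter_batch(batch)
print(f"{len(published)} of {len(batch)} images passed")
```

The published/generated ratio then answers exactly the question above: how many images were discarded per image that made it to the site.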
As a meatspace photographer, I take some comfort that the photograph in column 3, row 25 has an issue. The mountain peak has abundant snow. Its reflection in the lake doesn't. There are similar snow reflection disparities in several of the mountains-reflected-in-water pix.
One thing I think would be super fascinating to have is a system that can take an AI model's training set and reverse engineer which pictures were used to make the output. Take the bunny. I bet there were a lot of bunny pictures in the training set that looked very similar to the generated one. It would be interesting to have a system pick the closest match and display it next to the generated image. It would show how original (or unoriginal) these images are.
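A common way to probe this is nearest-neighbor retrieval in an embedding space: embed the generated image and every training image with a model such as CLIP, then find the training image with the highest cosine similarity. A toy sketch of the retrieval step, using made-up 2-D embeddings in place of real ones (the embedding model itself is out of scope here):

```python
import numpy as np

def nearest_training_image(query_emb, train_embs):
    """Return (index, similarity) of the training embedding closest to
    the query by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q  # cosine similarity of query against every row
    best = int(np.argmax(sims))
    return best, float(sims[best])

# Toy demo with three fake "training set" embeddings.
train = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.6, 0.8])
idx, sim = nearest_training_image(query, train)
print(idx, round(sim, 3))
```

Displaying that nearest neighbor next to the generated image would give a rough (though imperfect) sense of how close the output is to its training data.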
It's interesting to see which kinds of features these models don't distinguish well. For example, I've noticed that a lot of models have trouble giving ladybugs distinct spots. Instead they usually end up with a big black splotch.