This reminds me of something a few friends and I tried a couple of months ago: No matter what prompt was used, neither Midjourney nor Dream Studio could generate an image of a man wearing a red suit jacket with a blue shirt. (We were trying for red suit + blue shirt + white tie... but even just the first two proved impossible.) Presumably the combination is so unusual as to run counter to the training data of the models. Likewise for a forehead with three eyes.
On a similar note to Stable Diffusion refusing to put 3 eyes in the middle of a sci-fi character's forehead: I have been experimenting with GPT-3 rewriting some of my sci-fi stuff. It's really funny because it immediately tries to steer the plot into the most cliché sci-fi storyline and characterization possible, where all the characters are perfect, almost superhero-like action heroes capable of incredible feats of strength and agility. My characters have a lot of flaws and aren't impressive in an action-movie sort of way, so GPT-3 winds up being almost totally unusable.
This is achievable without copy/pasting eyes: if you're using the Automatic1111 web UI, go to img2img -> inpaint, mask the area for one eye (on the forehead), enter a prompt, and set padding = 0 and denoising strength accordingly (0.4-0.6 works well). Repeat for all three eyes. You can add practically anything to an image with inpainting, provided your prompt and padding are correct.
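The same workflow can be scripted outside the web UI. Here's a minimal sketch with the Hugging Face diffusers inpainting pipeline; the model ID, file names, and prompt are placeholders, and the strength argument plays roughly the role of the web UI's denoising slider:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Load an inpainting checkpoint (placeholder model ID).
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    init_image = Image.open("character.png").convert("RGB").resize((512, 512))
    # White pixels in the mask mark the region to repaint (one eye on the forehead).
    mask = Image.open("eye_mask.png").convert("L").resize((512, 512))

    result = pipe(
        prompt="a third eye on the forehead, detailed iris, sci-fi character portrait",
        image=init_image,
        mask_image=mask,
        strength=0.5,            # comparable to a denoising strength of ~0.4-0.6
        num_inference_steps=30,
    ).images[0]
    result.save("character_third_eye.png")

Run it once per eye with a different mask, just like repeating the inpaint step in the web UI.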
This is what Invoke's Stable Diffusion canvas solves for.

https://youtu.be/RwVGDGc6-3o
Dall-E has an interesting take on the problem: https://labs.openai.com/sc/JZIuAmvnELh8cMnBsLRVo5qk
One of the linked resources in the article is a great high-level overview of how Stable Diffusion works:

https://stable-diffusion-art.com/how-stable-diffusion-work/

It's a quick read and I found it very helpful.
Recent and related:

"Remaking old computer graphics with AI image generation" - https://news.ycombinator.com/item?id=34212564 - Jan 2023 (73 comments)
I only have a 3060 laptop GPU and an img2img run like this takes barely 3 seconds. It's really fun and near real-time if you keep the UNet loaded in VRAM between runs instead of reloading it every time, the way calling a script would.
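For anyone wondering what "keeping it loaded" looks like in practice, here's a rough sketch with the diffusers img2img pipeline (model ID and file names are placeholders): load the weights once, then reuse the same pipeline object so each run only pays for the denoising steps, not model loading.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Load once and keep resident in VRAM for the whole session.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("sketch.png").convert("RGB").resize((512, 512))

    # Interactive loop: only the prompt changes between runs; the weights stay loaded.
    while True:
        prompt = input("prompt> ")
        if not prompt:
            break
        out = pipe(prompt=prompt, image=init,
                   strength=0.6, num_inference_steps=25).images[0]
        out.save("out.png")

Compare that to a one-shot script, which would re-read several GB of weights from disk on every invocation.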
I've tried to get Stable Diffusion to draw three-armed pianists, or pianists with extra fingers, and failed, probably for the same reasons this was difficult.
"each inpainting took about 20 seconds which was quite annoying. But I could envision a future where generation is basically real-time, imagine navigating through possible generations using mouse wheel and tweaking the parameters and seeing the effects in real-time"<p>This is really funny actually, considering what basic Photoshop tools are capable of out of the box :)