> it was difficult to find images where the entire llama fit within the frame<p>I had the same trouble. In my experiment I wanted to generate a Porco Rosso style seaplane illustration. Sadly, none of the generated pictures had the whole of the airplane in them; the wingtips or the tail always got left off.<p>I found this method to be a reliable workaround: I downloaded the image I liked the most, used image editing software to extend it in the direction I wanted, and filled the new area with a solid colour. I then cropped a 1024x1024 rectangle such that it contained about 40% generated image and 60% solid colour, uploaded the new image, and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. From the generated extensions I selected the one I liked best, downloaded it, and merged it with the rest of the picture. Repeat as required.<p>You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's a good idea to look at the image segment you need infilled: if you as a human can't figure out what you are seeing, the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.<p>The other trick I found: I wanted to make my picture into a canvas print, so I needed a higher-resolution image, higher even than what I could reasonably hope for with the above extension trick. So I upscaled the image (I used bigjpg.com, but there might be better solutions out there). That gave me a big image, but of course without many small-scale details. So I sliced it up into 1024x1024 rectangles, uploaded the rectangles to DALL-E, and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture which showed a city under the airplane: it added nice small details like windows, doors, and roofs with texture, without disturbing the overall composition.
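For anyone who wants to reproduce the extension step, here is a minimal Pillow sketch of the canvas preparation. The file names, the fill colour, and extending to the right are assumptions for illustration; the infill itself still happens in the DALL-E editor.<p><pre><code>from PIL import Image

TILE = 1024                 # DALL-E 2 works on 1024x1024 tiles
overlap = int(TILE * 0.4)   # ~40% of the tile will be existing image

# Load the generation to extend (file name is hypothetical); assumes it is
# 1024 px tall, as DALL-E outputs are.
src = Image.open("seaplane.png")

# Extend the canvas to the right, filling the new area with a solid colour.
extension = TILE - overlap
canvas = Image.new("RGB", (src.width + extension, src.height), "#7ec8e3")
canvas.paste(src, (0, 0))

# Crop a 1024x1024 tile straddling the old edge: ~40% generated image on
# the left, ~60% solid colour on the right for DALL-E to infill.
left = src.width - overlap
tile = canvas.crop((left, 0, left + TILE, TILE))
tile.save("tile_to_infill.png")

# After DALL-E infills the solid area, paste the result back at the same
# offset, e.g.:
# canvas.paste(Image.open("tile_infilled.png"), (left, 0))
</code></pre>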
I was curious to compare results with Craiyon.ai<p>Here is "llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art": <a href="https://imgur.com/a/7LoAtRx" rel="nofollow">https://imgur.com/a/7LoAtRx</a><p>Here is "Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie", much worse: <a href="https://imgur.com/a/g99G7Bn" rel="nofollow">https://imgur.com/a/g99G7Bn</a>
I'm usually very much a skeptic when it comes to "revolutionary" tech. I think the blockchain is crap. I think fully self-driving cars are still a long way away. I think that VR and the metaverse are going to remain gimmicks in the foreseeable future.<p>But this DALL-E thing, it's really blowing my mind. That and deep fakes, now that's sci-fi tech. It's both exciting and a bit scary.<p>The idea that in the not so far future one will be able to create images (and I presume later, audio and video) of basically anything with just a simple text prompt is rife with potential (both good and bad). It's going to change the way we look at art, it's also going to give incredibly powerful creative tools to the masses.<p>For me the endgame would be an AI sufficiently advanced that one could prompt "make an episode of Seinfeld that centers around deep fakes" and you'd get an episode virtually indistinguishable from a real one. Home-made, tailor-made entertainment. Terrifyingly amazing. See you in a few decades...
If you're interested in browsing creative prompts, I highly recommend the reddit community at r/dalle2.<p>Some are impressive:<p><pre><code> - www.reddit.com/r/dalle2/comments/uzosy1/the_rest_of_mona_lisa
- www.reddit.com/r/dalle2/comments/vstuns/super_mario_getting_his_citizenship_at_ellis
</code></pre>
And others are hilarious:<p><pre><code> - www.reddit.com/r/dalle2/comments/v0pjfr/a_photograph_of_a_street_sign_that_warns_drivers
- www.reddit.com/r/dalle2/comments/wbbkbb/healthy_food_at_mcdonalds
- www.reddit.com/r/dalle2/comments/wlfpax/the_elements_of_fire_water_earth_and_air_digital</code></pre>
“In working with DALL·E 2, it’s important to be specific about what you want without over-stuffing or adding redundant words.”<p>I found this to be the most important point from this piece. Often people don't know what they really want when it comes to creative work, let alone how to spell it out for some omniscient algorithm. In spite of that, it's a delight to get something you love from an unspecific prompt, something you wouldn't find in anything you receive from a human.<p>DALL·E 2 never ceases to amaze me.<p>For anyone interested in learning about what DALL·E 2 can do, the author also links to the DALL·E 2 prompt book (discussed in this post <a href="https://news.ycombinator.com/item?id=32322329" rel="nofollow">https://news.ycombinator.com/item?id=32322329</a>).
> DALL·E 2 struggles to generate realistic faces. According to some sources, this may have been a deliberate attempt to avoid generating deepfakes.<p>That might be true, but after experimenting with DALL·E 2 last week (and spending more than $15), I have a different theory.<p>My tests focused on how well it could create art works around three common themes: still life, landscape, and portrait. For the first two categories, almost all the results were works that would not have looked out of place in a museum or art gallery. In contrast, with the prompt of “A painting of a young woman sitting in a chair” and variations, while DALL·E 2 produced convincing clothing, furniture, background, etc., the faces were mostly horrible. I started adding “from the rear” and “turned to the side” to the prompt just to get the face out of the picture.<p>I came to suspect that DALL·E 2 is bad at faces not because the developers made it that way but because human beings are uniquely hardwired to recognize faces. Most people are able to recognize and remember hundreds of faces, and we are very sensitive to minor changes in their configurations (i.e., facial expressions). When we look at a painting of a person sitting in a chair, we don’t care if aspects of the chair, the person’s clothing, etc. are not precisely accurate; a slight distortion of the face, however, can ruin the entire work. DALL·E 2 does not seem to have been trained to have the same sensitivity to faces that humans have.<p>If anyone is interested, the works that DALL·E 2 created for me are at [1]; video slideshows with musical accompaniment are at [2].<p>[1] <a href="http://www.gally.net/temp/dalleimages/index.html" rel="nofollow">http://www.gally.net/temp/dalleimages/index.html</a><p>[2] <a href="https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98Ks_dPDkNOCvc" rel="nofollow">https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98...</a>
I ran into this too. When I got my invite, I told a friend I would learn how to talk to DALL-E by having it make some concept art for the game he was designing. I ran through all of my free credits, and most of the first $15 bucket and never really got anything usable.<p>Even when I re-used the <i>exact prompts</i> from the DALL-E Prompt Book, I didn't get anything near the level of quality and fidelity to the prompt that their examples did.<p>I know it's not a scam, because it's clearly doing amazing stuff under the hood, but I went away thinking that it wasn't as miraculous as it was claimed to be.
The images remind me of my dreams, where logic and reasoning are thrown out and only the pure gist of a thing remains. I wonder if it is because it is built on vector operations and calculus that find the closest or fuzzy match for essentially everything it produces, sans cognition, so things tend to come out fuzzy or quasi-close but not quite there. Very entertaining post.<p>I have my own API key as well, but without DALL-E 2 access just yet. It seems similar in terms of prompting text in stages to get what you want; it feels kind of like negotiating with it in some way.
>the ball is positioned in such a way that the llama has no real hope of making the shot<p>I love that we're at the level where the physical "realism" of correctly representing quadrupeds playing basketball is a thing now. I suppose the next-level AI will be expected to model a full 3D environment with physical assumptions based on the prompt and then run the simulation.
My current move is creating initial versions of images with Midjourney, which seems to be a bit more "free-spirited" (read: less _literal_, more flexible), and then using DALL-E's replace tool to fill in the weird-looking bits. It works pretty well, but it's a multi-step process and requires you to pay for both Midjourney and DALL-E.
Same prompts generated by Midjourney for comparison. I'd say a lot worse, but Midjourney is good at other things like sci-fi art.<p>Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.<p><a href="https://cdn.discordapp.com/attachments/999377404113981462/1007352942891900978/ray_Film_still_of_a_llama_in_a_jersey_dunking_a_basketball_like_4dfd0ce8-d767-4756-a876-1354a6a29af1.png" rel="nofollow">https://cdn.discordapp.com/attachments/999377404113981462/10...</a><p>Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie<p><a href="https://cdn.discordapp.com/attachments/999377404113981462/1007353042334646375/ray_Llama_in_a_jersey_dunking_a_basketball_like_Michael_Jordan__512e09e7-5df1-4f09-9e94-746ec43b57e2.png" rel="nofollow">https://cdn.discordapp.com/attachments/999377404113981462/10...</a>
Spent a day with DALL-E - here are some of my favorites: <a href="https://imgur.com/a/uD5yjV3" rel="nofollow">https://imgur.com/a/uD5yjV3</a>
I picture in a few years we will be playing around with a code generation tool, and people will be drawing similar conclusions. "You have to be really specific about what you like. If you just say 'chat tool', it will allow you to chat to one other person only."
<a href="https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y" rel="nofollow">https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y</a><p>The DALL-E 2 prompt book. If anything, pretty neat look at how the various prompts come out and some of the art created by it.
This is really good fun, actually. Spent some time fucking around with it, and it can make some impressive photorealistic stuff like "hoverbus in san francisco by the ferry building, digital photo".<p>I mostly use it and Midjourney for material for my DnD campaign, but I'm going to need to do a little more work to make the whole thing coherent. Only tried that once and it was okay.<p>The interesting part is that it can do things like "female ice giant" reasonably, whereas Google will just give you a sexy bikini ice giant for queries like that, which is not the vibe of my campaign!
My two cents: the techniques OP uses are absolutely valid, but I've found much more success "sampling" styles and poses from existing works.<p>Rather than trying to perfectly describe my image, I like to use references where the source material has what you want. With minimal direction these prompts get impressively close:<p>"larry bird as a llama, dramatic basketball dunk in a bright arena, low angle action shot, from the movie Madagascar (2005)" <a href="https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR" rel="nofollow">https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR</a><p>"Michael Jordan as a llama dunking a basketball, Space Jam (1996)" <a href="https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH" rel="nofollow">https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH</a><p>At this point I'd experiment with more stylized/recognizable references or add a couple "effects" to polish up the results.
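To make the structure explicit, here is a tiny sketch of how these "sampled" prompts compose. The slot names and helper are purely illustrative, my own invention, not anything DALL-E itself understands:<p><pre><code># Hypothetical prompt "sampler": swap in a subject, a shot description,
# and a recognizable reference work instead of describing everything
# from scratch.
def sample_prompt(subject: str, action: str, shot: str, reference: str) -> str:
    return f"{subject}, {action}, {shot}, {reference}"

print(sample_prompt(
    "larry bird as a llama",
    "dramatic basketball dunk in a bright arena",
    "low angle action shot",
    "from the movie Madagascar (2005)",
))
# -> "larry bird as a llama, dramatic basketball dunk in a bright arena,
#    low angle action shot, from the movie Madagascar (2005)"
</code></pre>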
It's fun to play around with it, but like the author found, what you get is often strange or useless. I also find 1024x1024 images too small to do much with, but I realize making 4K images would be cost-prohibitive. I also wish it could generate vector images as well as raster images. That would be fun to use.
Wow, the blogs posted here are awesome; the octopus and this llama are great.<p>I myself can't seem to get it to work, though. I think it's not very good at real things: I tried fitness-related images, and they all came out weird. It's probably better with fantasy-type stuff, since it has to be less accurate.
I recently made PromptWiki[0] to try to document useful prompts and examples.<p>I think we're at the beginning of exploring what these image models can do and what the best ways to work with them are.<p>[0] <a href="https://promptwiki.com" rel="nofollow">https://promptwiki.com</a>
> Tip: DALL·E 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.<p>This is kind of funny. DALL·E is one of the most impressive pieces of software around, yet such a basic feature as history is curiously underpowered.
> It’s important to tell DALL·E 2 exactly what you want<p>That’s not as easy as it sounds, especially for the surreal scenes DALL-E is usually asked for.<p>Sometimes you don’t know what you want until you see it. Other times you do, but you are not able to express it in ways the computer can understand.<p>I see being able to communicate efficiently with the machine as a future in-demand skill.
I tried a number of these generators a week ago (or so), all with the same prompt: "A child looking longingly at a lollipop on the top shelf" with pretty abysmal (and sometimes horrifying) results. I'm not sure if my expectations are too high, but maybe I was doing it wrong?
There was a thread on r/DigitalArt where people debated whether you're really an artist if you're using these AI creator websites.<p>One guy spent hours feeding the AI pictures he liked to get an end result he was happy with.
A lot of these posts are showing up on HN. I wonder: is it because the tech is so new, or because the ways to use it are so nascent that we are discovering daily how to use it more precisely?
If you think it’s hard to get an AI to render what’s in your mind, try another human artist. Specifying something visually complex and expecting it to be precisely what you’re imagining is shockingly hard. I’m not surprised prompt creation is so complex. At least with the AI bots the turnaround time for iteration is tight. That said, humans likely iterate fewer times, but each iteration takes a long time.
Purely economic take: I’m sure that as knowledge builds over time, people will get more efficient at prompt generation, but the $15 in credits ignores the cost of the time spent to build the final prompt. I wonder how this compares to a junior graphic designer in terms of TCO.
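A hedged back-of-envelope on that, using the advertised price of $15 for 115 credits (one credit per prompt); the iteration count, minutes per prompt, and hourly rate are made-up assumptions:<p><pre><code># Rough TCO sketch: the credits are cheap, the human time dominates.
CREDIT_PRICE = 15 / 115      # USD per prompt (115 credits for $15)
prompts_tried = 40           # assumed iterations to land a usable image
minutes_per_prompt = 2       # assumed time to write and review each prompt
hourly_rate = 30.0           # assumed value of the prompter's time, USD/h

credit_cost = prompts_tried * CREDIT_PRICE
time_cost = prompts_tried * minutes_per_prompt / 60 * hourly_rate
print(f"credits: ${credit_cost:.2f}, time: ${time_cost:.2f}")
# -> credits: $5.22, time: $40.00
</code></pre>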
Love the stylistic ones. Amazing how it generates such good anime and vaporwave variants, like the neon vaporwave backboard.<p>I ran out of credits way too fast, so I like to see other people playing with it and their iterative process.
You can also play around for free on a slightly less sophisticated model here <a href="https://art.elbo.ai" rel="nofollow">https://art.elbo.ai</a>
Is it hard to reimplement that algorithm? I want to see what people would do with a porn-enabled image generator. Hopefully Pornhub is already hiring data scientists.
DALL-E is truly magic. It has me believing we are close to AGI.<p>I wonder what Gary Marcus or Filip Piekniewski think about it. Surely they must be eating crow.