> it was difficult to find images where the entire llama fit within the frame<p>I had the same trouble. In my experiment I wanted to generate a Porco Rosso style seaplane illustration. Sadly, none of the generated pictures had the whole of the airplane in them; the wingtips or the tail always got left off.<p>I found this method to be a reliable workaround: I downloaded the image I liked the most, used image editing software to extend it in the direction I wanted, and filled the new area with a solid colour. I then cropped a 1024x1024 rectangle such that it contained about 40% generated image and 60% solid colour, uploaded the new image, and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. From the generated extensions I selected the one I liked best, downloaded it, and merged it with the rest of the picture. Repeat as required.<p>You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's a good idea to look at the image segment you need infilled: if you as a human can't figure out what you are seeing, the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.<p>The other trick I found: I wanted to make my picture into a canvas print, so I needed a higher-resolution image, higher even than what I could reasonably hope for with the above extension trick. So I upscaled the image (I used bigjpg.com, but there might be better solutions out there). That gave me a big image, but of course without many small-scale details. So I sliced it up into 1024x1024 rectangles, uploaded the rectangles to DALL-E, and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture which showed a city under the airplane: it added nice small details like windows, doors, and roofs with texture, without disturbing the overall composition.
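For anyone who wants to reproduce the extension step, here is a minimal Pillow sketch of the canvas preparation. The file names, the fill colour, and extending to the right are assumptions for illustration; the infill itself still happens in the DALL-E editor.<p><pre><code>from PIL import Image

TILE = 1024                 # DALL-E 2 works on 1024x1024 tiles
overlap = int(TILE * 0.4)   # ~40% of the tile will be existing image

# Load the generation to extend (file name is hypothetical); assumes it is
# 1024 px tall, as DALL-E outputs are.
src = Image.open("seaplane.png")

# Extend the canvas to the right, filling the new area with a solid colour.
extension = TILE - overlap
canvas = Image.new("RGB", (src.width + extension, src.height), "#7ec8e3")
canvas.paste(src, (0, 0))

# Crop a 1024x1024 tile straddling the old edge: ~40% generated image on
# the left, ~60% solid colour on the right for DALL-E to infill.
left = src.width - overlap
tile = canvas.crop((left, 0, left + TILE, TILE))
tile.save("tile_to_infill.png")

# After DALL-E infills the solid area, paste the result back at the same
# offset, e.g.:
# canvas.paste(Image.open("tile_infilled.png"), (left, 0))
</code></pre>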
I was curious to compare results with Craiyon.ai<p>Here is "llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art": <a href="https://imgur.com/a/7LoAtRx" rel="nofollow">https://imgur.com/a/7LoAtRx</a><p>Here is "Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie", much worse: <a href="https://imgur.com/a/g99G7Bn" rel="nofollow">https://imgur.com/a/g99G7Bn</a>
I'm usually very much a skeptic when it comes to "revolutionary" tech. I think the blockchain is crap. I think fully self-driving cars are still a long way away. I think that VR and the metaverse are going to remain gimmicks in the foreseeable future.<p>But this DALL-E thing, it's really blowing my mind. That and deep fakes, now that's sci-fi tech. It's both exciting and a bit scary.<p>The idea that in the not so far future one will be able to create images (and I presume later, audio and video) of basically anything with just a simple text prompt is rife with potential (both good and bad). It's going to change the way we look at art, it's also going to give incredibly powerful creative tools to the masses.<p>For me the endgame would be an AI sufficiently advanced that one could prompt "make an episode of Seinfeld that centers around deep fakes" and you'd get an episode virtually indistinguishable from a real one. Home-made, tailor-made entertainment. Terrifyingly amazing. See you in a few decades...
If you're interested in browsing creative prompts, I highly recommend the reddit community at r/dalle2.<p>Some are impressive:<p><pre><code> - www.reddit.com/r/dalle2/comments/uzosy1/the_rest_of_mona_lisa
- www.reddit.com/r/dalle2/comments/vstuns/super_mario_getting_his_citizenship_at_ellis
</code></pre>
And others are hilarious:<p><pre><code> - www.reddit.com/r/dalle2/comments/v0pjfr/a_photograph_of_a_street_sign_that_warns_drivers
- www.reddit.com/r/dalle2/comments/wbbkbb/healthy_food_at_mcdonalds
- www.reddit.com/r/dalle2/comments/wlfpax/the_elements_of_fire_water_earth_and_air_digital</code></pre>
“In working with DALL·E 2, it’s important to be specific about what you want without over-stuffing or adding redundant words.”<p>I found this to be the most important point from this piece. Often people don't know what they really want when it comes to creative work, let alone how to spell it out for some omniscient algorithm. In spite of that, it's a delight to get something you love from an unspecific prompt, something you wouldn't find in anything you receive from a human.<p>DALL·E 2 never ceases to amaze me.<p>For anyone interested in learning about what DALL·E 2 can do, the author also links to the DALL·E 2 prompt book (discussed in this post <a href="https://news.ycombinator.com/item?id=32322329" rel="nofollow">https://news.ycombinator.com/item?id=32322329</a>).
> DALL·E 2 struggles to generate realistic faces. According to some sources, this may have been a deliberate attempt to avoid generating deepfakes.<p>That might be true, but after experimenting with DALL·E 2 last week (and spending more than $15), I have a different theory.<p>My tests focused on how well it could create art works around three common themes: still life, landscape, and portrait. For the first two categories, almost all the results were works that would not have looked out of place in a museum or art gallery. In contrast, with the prompt of “A painting of a young woman sitting in a chair” and variations, while DALL·E 2 produced convincing clothing, furniture, background, etc., the faces were mostly horrible. I started adding “from the rear” and “turned to the side” to the prompt just to get the face out of the picture.<p>I came to suspect that DALL·E 2 is bad at faces not because the developers made it that way but because human beings are uniquely hardwired to recognize faces. Most people are able to recognize and remember hundreds of faces, and we are very sensitive to minor changes in their configurations (i.e., facial expressions). When we look at a painting of a person sitting in a chair, we don’t care if aspects of the chair, the person’s clothing, etc. are not precisely accurate; a slight distortion of the face, however, can ruin the entire work. DALL·E 2 does not seem to have been trained to have the same sensitivity to faces that humans have.<p>If anyone is interested, the works that DALL·E 2 created for me are at [1]; video slideshows with musical accompaniment are at [2].<p>[1] <a href="http://www.gally.net/temp/dalleimages/index.html" rel="nofollow">http://www.gally.net/temp/dalleimages/index.html</a><p>[2] <a href="https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98Ks_dPDkNOCvc" rel="nofollow">https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98...</a>
I ran into this too. When I got my invite, I told a friend I would learn how to talk to DALL-E by having it make some concept art for the game he was designing. I ran through all of my free credits, and most of the first $15 bucket and never really got anything usable.<p>Even when I re-used the <i>exact prompts</i> from the DALL-E Prompt Book, I didn't get anything near the level of quality and fidelity to the prompt that their examples did.<p>I know it's not a scam, because it's clearly doing amazing stuff under the hood, but I went away thinking that it wasn't as miraculous as it was claimed to be.
The images remind me of my dreams, where logic and reasoning are thrown out and only the pure gist of a thing remains. I wonder if it is because it is built on vector operations and calculus that find the closest or fuzzy match for essentially everything it produces, sans cognition, so things tend to come out fuzzy or quasi-close but not quite there. Very entertaining post.<p>I have my own API key as well, but without DALL-E 2 access just yet. It seems similar in terms of prompting text in stages to get what you want; it feels kind of like negotiating with it in some way.
>the ball is positioned in such a way that the llama has no real hope of making the shot<p>I love that we're at the level where the physical "realism" of correctly representing quadrupeds playing basketball is a thing now. I suppose the next-level AI will be expected to model a full 3D environment with physical assumptions based on the prompt and then run the simulation.
My current move is creating initial versions of images with Midjourney, which seems to be a bit more "free-spirited" (read: less _literal_, more flexible), and then using DALL-E's replace tool to fill in the weird-looking bits. It works pretty well, but it's a multi-step process and requires you to pay for both Midjourney and DALL-E.
Same prompts generated by Midjourney for comparison. I'd say a lot worse, but Midjourney is good at other things like sci-fi art.<p>Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.<p><a href="https://cdn.discordapp.com/attachments/999377404113981462/1007352942891900978/ray_Film_still_of_a_llama_in_a_jersey_dunking_a_basketball_like_4dfd0ce8-d767-4756-a876-1354a6a29af1.png" rel="nofollow">https://cdn.discordapp.com/attachments/999377404113981462/10...</a><p>Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie<p><a href="https://cdn.discordapp.com/attachments/999377404113981462/1007353042334646375/ray_Llama_in_a_jersey_dunking_a_basketball_like_Michael_Jordan__512e09e7-5df1-4f09-9e94-746ec43b57e2.png" rel="nofollow">https://cdn.discordapp.com/attachments/999377404113981462/10...</a>
Spent a day with DALL-E - here are some of my favorites: <a href="https://imgur.com/a/uD5yjV3" rel="nofollow">https://imgur.com/a/uD5yjV3</a>
I picture in a few years we will be playing around with a code generation tool, and people will be drawing similar conclusions. "You have to be really specific about what you like. If you just say 'chat tool', it will allow you to chat to one other person only."
<a href="https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y" rel="nofollow">https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y</a><p>The DALL-E 2 prompt book. If anything, pretty neat look at how the various prompts come out and some of the art created by it.
This is really good fun, actually. Spent some time fucking around with it, and it can make some impressive photorealistic stuff like "hoverbus in san francisco by the ferry building, digital photo".<p>I mostly use it and Midjourney for material for my DnD campaign, but I'm going to need to do a little more work to make the whole thing coherent. Only tried that once and it was okay.<p>The interesting part is that it can do things like "female ice giant" reasonably, whereas Google will just give you a sexy bikini ice giant for queries like that, which is not the vibe of my campaign!
My two cents: the techniques OP uses are absolutely valid, but I've found much more success "sampling" styles and poses from existing works.<p>Rather than trying to perfectly describe my image, I like to use references where the source material has what you want. With minimal direction these prompts get impressively close:<p>"larry bird as a llama, dramatic basketball dunk in a bright arena, low angle action shot, from the movie Madagascar (2005)" <a href="https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR" rel="nofollow">https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR</a><p>"Michael Jordan as a llama dunking a basketball, Space Jam (1996)" <a href="https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH" rel="nofollow">https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH</a><p>At this point I'd experiment with more stylized/recognizable references or add a couple "effects" to polish up the results.
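To make the structure explicit, here is a tiny sketch of how these "sampled" prompts compose. The slot names and helper are purely illustrative, my own invention, not anything DALL-E itself understands:<p><pre><code># Hypothetical prompt "sampler": swap in a subject, a shot description,
# and a recognizable reference work instead of describing everything
# from scratch.
def sample_prompt(subject: str, action: str, shot: str, reference: str) -> str:
    return f"{subject}, {action}, {shot}, {reference}"

print(sample_prompt(
    "larry bird as a llama",
    "dramatic basketball dunk in a bright arena",
    "low angle action shot",
    "from the movie Madagascar (2005)",
))
# -> "larry bird as a llama, dramatic basketball dunk in a bright arena,
#    low angle action shot, from the movie Madagascar (2005)"
</code></pre>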
It's fun to play around with it, but like the author found, what you get is often strange or useless. I also find 1024x1024 images too small to do much with, but I realize making 4K images would be cost-prohibitive. I also wish it could generate vector images as well as raster images. That would be fun to use.
Wow, the blogs posted here are awesome; the octopus and this llama are great.<p>I myself can't seem to get it to work, though. I think it's not very good at real things: I tried fitness-related images, and they all came out weird. It's probably better with fantasy-type stuff, since it has to be less accurate.
I recently made PromptWiki[0] to try to document useful prompts and examples.<p>I think we're at the beginning of exploring what these image models can do and what the best ways to work with them are.<p>[0] <a href="https://promptwiki.com" rel="nofollow">https://promptwiki.com</a>
> Tip: DALL·E 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.<p>This is kind of funny. DALL·E is one of the most impressive pieces of software around, yet such a basic feature as history is curiously underpowered.
> It’s important to tell DALL·E 2 exactly what you want<p>That’s not as easy as it sounds, especially for the surreal scenes DALL-E is usually asked for.<p>Sometimes you don’t know what you want until you see it. Other times you do, but you are not able to express it in ways the computer can understand.<p>I see being able to communicate efficiently with the machine as a future in-demand skill.
I tried a number of these generators a week ago (or so), all with the same prompt: "A child looking longingly at a lollipop on the top shelf" with pretty abysmal (and sometimes horrifying) results. I'm not sure if my expectations are too high, but maybe I was doing it wrong?
There was a thread on r/DigitalArt where people debated whether you're really an artist if you're using these AI creator websites.<p>One guy spent hours feeding the AI pictures he liked to get an end result he was happy with.
A lot of these posts are showing up on HN. I wonder: is it because the tech is so new, or because the ways to use it are so nascent that we are discovering daily how to use it more precisely?
If you think it’s hard to get an AI to render what’s in your mind, try another human artist. Specifying something visually complex and expecting it to be precisely what you’re imagining is shockingly hard. I’m not surprised prompt creation is so complex. At least with the AI bots the turnaround time for iteration is tight. That said, humans likely iterate fewer times, but each iteration takes a long time.
Purely economic take: I’m sure that as knowledge builds over time, people will get more efficient at prompt generation, but the $15 in credits ignores the cost of the time spent to build the final prompt. I wonder how this compares to a junior graphic designer in terms of TCO.
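A hedged back-of-envelope on that, using the advertised price of $15 for 115 credits (one credit per prompt); the iteration count, minutes per prompt, and hourly rate are made-up assumptions:<p><pre><code># Rough TCO sketch: the credits are cheap, the human time dominates.
CREDIT_PRICE = 15 / 115      # USD per prompt (115 credits for $15)
prompts_tried = 40           # assumed iterations to land a usable image
minutes_per_prompt = 2       # assumed time to write and review each prompt
hourly_rate = 30.0           # assumed value of the prompter's time, USD/h

credit_cost = prompts_tried * CREDIT_PRICE
time_cost = prompts_tried * minutes_per_prompt / 60 * hourly_rate
print(f"credits: ${credit_cost:.2f}, time: ${time_cost:.2f}")
# -> credits: $5.22, time: $40.00
</code></pre>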
Love the stylistic ones. Amazing how it generates such good anime and vaporwave variants, like the neon vaporwave backboard.<p>I ran out of credits way too fast, so I like to see other people playing with it and their iterative process.
You can also play around for free on a slightly less sophisticated model here <a href="https://art.elbo.ai" rel="nofollow">https://art.elbo.ai</a>
Is it hard to reimplement that algorithm? I want to see what people would do with a porn-enabled image generator. Hopefully Pornhub is already hiring data scientists.
DALL-E is truly magic. It has me believing we are close to AGI.<p>I wonder what Gary Marcus or Filip Piekniewski think about it. Surely they must be eating crow.