Show HN: A Dalle-3 and GPT4-Vision feedback loop

587 points by z991 over 1 year ago

I used to enjoy Translation Party, and over the weekend I realized that we can build the same feedback loop with DALLE-3 and GPT4-Vision. Start with a text prompt, let DALLE-3 generate an image, then GPT-4 Vision turns that image back into a text prompt, DALLE-3 creates another image, and so on.

You need to bring your own OpenAI API key (costs about $0.10/run).

Some prompts are very stable, others go wild. If you bias GPT4's prompting by telling it to "make it weird" you can get crazy results.

Here's a few of my favorites:

- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor): https://dalle.party/?party=oxpeZKh5
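
The loop described in the post can be sketched roughly as follows. This is a minimal sketch, not the site's actual code: `run_party`, `generate_image`, `describe_image`, the default instruction text, and the `bias` parameter are all hypothetical names chosen for illustration. The two model calls are passed in as plain functions so the loop itself stays independent of any particular API client.

```python
from typing import Callable, List, Tuple


def run_party(
    start_prompt: str,
    generate_image: Callable[[str], bytes],       # hypothetical text -> image call
    describe_image: Callable[[bytes, str], str],  # hypothetical image -> text call
    iterations: int = 4,
    bias: str = "",
) -> List[Tuple[str, bytes]]:
    """Alternate text -> image -> text and return every (prompt, image) pair."""
    # Assumed instruction wording; the real site's wording is not known here.
    instruction = "Write a prompt that would recreate this image."
    if bias:
        # e.g. bias="Make it weird." nudges every description the same way.
        instruction += " " + bias

    history: List[Tuple[str, bytes]] = []
    prompt = start_prompt
    for _ in range(iterations):
        image = generate_image(prompt)
        history.append((prompt, image))
        # The description of this image becomes the next prompt.
        prompt = describe_image(image, instruction)
    return history
```

Passing stub functions for the two callables is enough to exercise the loop without an API key; real clients can be dropped in later.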

52 comments

epiccoleman over 1 year ago

It's pretty fun to mess with the prompt and see what you can make happen over the series of images. Inspired by a recent Twitter post [1], I set this one up to increase the "intensity" each time it prompted.

The starting prompt (or at least, the theme) was suggested by one of my kids. Watch in awe as a regular goat rampage accelerates into full cosmic horror universe ending madness. Friggin awesome:

https://dalle.party/?party=vCwYT8Em

[1]: https://x.com/venturetwins/status/1728956493024919604?s=20

andrelaszlo over 1 year ago

Here's a custom prompt that I enjoyed:

"Think hard about every single detail of the image, conceptualize it including the style, colors, and lighting.

Final step, condensing this into a single paragraph:

Very carefully, condense your thoughts using the most prominent features and extremely precise language into a single paragraph."

https://dalle.party/?party=1lSMniUP
https://dalle.party/?party=cEUyjzch
https://dalle.party/?party=14fnkTv-
https://dalle.party/?party=wstiY-Iw

Praise the Basilisk, I finally got rate-limited and can go to bed!

w-m over 1 year ago

Playing with opposites is kind of fun, too.

Simply a cat, evolving into a lounging cucumber, and finally opposite world: https://dalle.party/?party=pqwKQVka

Vibrant gathering of celestial octopus entities: https://dalle.party/?party=lHNDUvtp

rbates over 1 year ago

This reminds me of the party game Telestrations where players go back and forth between drawing and writing what they see. It's hilarious to see the result because you anticipate what the next drawing will be while reading the prompt.

I'd love to see an alternative viewing mode here which shows the image and the following prompt. Then you need to click a button to reveal the next image. This allows you to picture in your mind what the image might look like while reading the prompt.

Thanks for making this fun little app!

Update: I just realized you can get this effect by going into mobile mode (or resizing the window). You can then scroll down to see the image after reading the prompt.

Mtinie over 1 year ago

I figured this would quickly go off the rails into surreal territory, but instead it ended up being progressive technological de-evolution.

Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"

Results: https://dalle.party/?party=14ESewbz

(Addendum: In case anyone was curious how costs scale by iteration, the full ten iterations in this result billed $0.21 against my credit balance.)
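
Taking the one data point above ($0.21 for ten iterations, so roughly $0.021 per iteration), a back-of-the-envelope estimate might look like the sketch below. It assumes cost grows linearly with the number of iterations, which a single observation cannot confirm; `estimate_cost` and its parameters are hypothetical names.

```python
def estimate_cost(
    iterations: int,
    observed_total: float = 0.21,   # dollars billed in the run reported above
    observed_iterations: int = 10,  # iterations in that same run
) -> float:
    """Scale one observed bill linearly to a different iteration count."""
    # Assumption: every iteration costs the same; real billing depends on
    # prompt length, image size, and model pricing at the time.
    per_iteration = observed_total / observed_iterations
    return round(per_iteration * iterations, 4)
```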

xeckr over 1 year ago

Cool idea! I made one with the starting prompt "an artificial intelligence painting a picture of itself": https://dalle.party/?party=wszvbrOx

It consistently shows a robot painting on a canvas. The first 4 are paintings of robots, the next 3 are galaxies, and the final 2 are landscapes.

jsf01 over 1 year ago

It’s cool to see how certain prompts and themes stay relatively stable, like the gnome example. But then “cat lecturing mice” quickly goes off the rails into weird surreal sloth banana territory.

My best guess to try to explain this would be that “gnome + art style + mushroom” will draw from a lot more concrete examples in the training data, whereas the AI is forced to reach a bit wider to try to concoct some image for the weird scenario given in the cat example.

z991 over 1 year ago

Also, descent into Corgi insanity: https://dalle.party/?party=oxXJE9J4

epivosism over 1 year ago

The "create text version of image" prompt matters a ton.

I tried three, demo here:

- default: https://dalle.party/?party=JfiwmJra
- hyper-long + max detail + compression: https://dalle.party/?party=QtEqq4Mu (This shows that with enough text, it can do a really good job of reproducing very, very similar images.)
- hyper-long + max detail + compression + telling it to cut all that down to 12 words: https://dalle.party/?party=0utxvJ9y (This seems okay. I might be losing too much detail.)

Overall the extreme content filtering and lying error messages are not ideal; will probably improve in the future. If you send too long or too risky a prompt, or the image it generates is randomly too risky, you either get told about it or lied to that you've hit rate limits. Sometimes you also really do hit rate limits.

Also, you can't raise your rate limits until you prove it by having paid over X amount to openai. This kind of makes sense as a way to prevent new sign-ups from blowing thousands of dollars of cap mistakenly.

Hyper detail prompt:

Look at this image and extract all the vital elements. List them in your mind including position, style, shape, texture, color, everything else essential to convey their meaning. Now think about the theme of the image and write that down, too. Now write out the composition and organization of the image in terms of placement, size, relationships, focus. Now think about the emotions - what is everyone feeling and thinking and doing towards each other? Now, take all that data and think about a very long, detailed summary including all elements. Then "compress" this data using abbreviations, shortenings, artistic metaphors, references to things which might help others understand it, labels and select pull-quotes. Then add even more detail by reviewing what we reviewed before. Now do one final pass considering the input image again, making sure to include everything from it in the output one, too. Finally, produce a long maximum length jam packed with info details which could be used to perfectly reproduce this image.

Final shrink to 12 words:

NOW, re-read ALL of that twice, thinking deeply about it, then compress it down to just 12 very carefully chosen words which with infinite precision, poetry, beauty and love contain all the detail, and output them, in quotes.

andrelaszlo over 1 year ago

My results are disappointingly noisy but I love the concept:

https://dalle.party/?party=bxrPClVg
https://dalle.party/?party=mmBxT8G-
https://dalle.party/?party=kxra0OKY (the last prompt got a content warning)
https://dalle.party/?party=Q8VYXU0_

nerdponx over 1 year ago

The #1 phenomenon I see here is that the image-to-text model doesn't have any idea what the pictures actually contain. It looks like it's just matching patterns that it has in its training data. That's really interesting because it does a great job of rendering images from text, in a way that maybe suggests the model "understands" what you want it to do. But there's nothing even close to "understanding" going in the other direction, it feels like something from 2012.

Pretty interesting. I haven't been following the latest developments in this field (e.g. I have no idea how the DALL-E and GPT models' inputs and outputs are connected). Does this track with known results in the literature, or am I seeing a pattern that's not there?

unclehighbrow1 over 1 year ago

Hey, I'm one of the creators of Translation Party, thanks for the shout out, I really like this. My co-creator had the idea to limit the number of words for the generated image description so that more change could happen between iterations. Not sure if that's possible. Anyway, this is really fun, thank you!

oarfish over 1 year ago

I haven't tried this yet, but I assume it's similar to a game you can buy commercially as Scrawl [1]. You pass paper in a circle and have to either turn your neighbor's writing into a drawing or vice versa, then pass it on. It's entirely hilarious and probably the most fun game I've ever played.

[1] https://boardgamegeek.com/boardgame/202982/scrawl

indymike over 1 year ago

Interesting how similar this is to my family's favorite game: pictograph.

1. You start by describing a thing.
2. The next person draws a picture of it.
3. The next next person describes the picture.

Repeat steps 2 and 3 until everyone has either drawn or described the picture.

You then compare the first and last description... and look over the pictures. One of the best ever was:

Draw a penguin. The first picture was a penguin with a light shadow.

After going around five rounds, the final description was "a pidgeon stabbed with a fork in a pool of blood in Chicago"

I'm still trying to figure out how Chicago got in there.

epivosism over 1 year ago

One reason this is good is that the default gpt4-vision UI is so insanely bad and slow. This just lets you use your capacity faster.

Rate limits are really low by default - you can get hit by 5 img/min limits, or 100 RPD (requests per day) which I think is actually implemented as requests per hour.

This page has info on the rate limits: https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-one

Basically, you have to have paid X amount to get into a new usage cap. Rate limits for dalle3/images don't go up very fast but it can't hurt to get over the various hurdles ($5, $50, $100) as soon as possible for when limits come down. End of the month is coming soon. It looks like most of the "RPD" limits go away when you hit tier 2 (having paid at least $50 historically via API to them).
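
To stay under a per-minute image limit like the 5 img/min figure mentioned above, a client can simply space its requests out. The following is a minimal sketch under that assumption; `Throttle` is a hypothetical helper, not part of any OpenAI library, and it does nothing about the daily limits.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous allowed call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each image request (with `throttle = Throttle(per_minute=5)`) keeps requests about 12 seconds apart. The clock and sleep functions are injectable only so the behavior can be checked without real waiting.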

superpope99 over 1 year ago

Nice! I prototyped a manual version of this a while ago. https://twitter.com/conradgodfrey/status/1712564282167300226

I think the thing that strikes me is that the default for chatGPT and the API is to create images in "vivid" mode. There's some interesting discussion on the differences between the "vivid" and "natural" modes here: https://cookbook.openai.com/articles/what_is_new_with_dalle_3

I think these contribute to the images becoming more surreal - would be interested to compare to natural mode - it looks like you're using vivid mode based on the examples?

rexreed over 1 year ago

Question: how are you protecting those API keys? I'm reluctant to enter mine into what could easily be an API Key scraper.

i-use-nixos-btw over 1 year ago

It’d be interesting to start with an image rather than a prompt, though I am afraid of what it’d do if I started with a selfie.

airstrike over 1 year ago

This is hilarious, thanks for sharing.

At the same time, it perfectly illustrates my main issue with these AI art tools: they very often generate pictures that are interesting to look at while very rarely generating exactly what you want them to.

I imagine a study in which participants are asked to create N images of their choosing and rate them from 0-10 on how satisfied they are with the results. One try per image only.

Then each participant rates each other's images on how satisfied they are with the results based on the prompt.

It should be clear to participants that nobody wins anything from having the "best rated" images, i.e. in some way we should control for participants not overrating their own creations.

I'd wager participants will rate their own creations lower than those made by other participants.

davelondon over 1 year ago

"tiny elephants wearing mouse costumes"?!? https://dalle.party/?party=42riPROf

AvImd over 1 year ago

Science class with a dark twist: https://dalle.party/?party=ks3T2mMx

Terretta over 1 year ago

If you were wondering how to bump up your API rate limits through usage, this is the way.

// also, it's the best way - TY @z991

willsmith72 over 1 year ago

this is actually really helpful. Since chatgpt restricted dalle to 1 image a few weeks ago, the feedback loops are way slower. This is a nice (but more expensive) alternative

toxic72 over 1 year ago

I purposely gave it some weird instructions to show the progress of the universe from the Big Bang to present day Earth. It showed the 8 stages from my prompt in each image and started to iterate over it, and then on image four I got a 400 error:

Error: 400 Your request was rejected as a result of our safety system. Your prompt may contain text that is not allowed by our safety system.

Interesting.

https://dalle.party/?party=EdpKnnBC

mythz over 1 year ago

"Earth going through cycles of creation and destruction"

https://dalle.party/?party=KvmW7Zwv

juanuicich over 1 year ago

There seems to be a bug, when you click “Keep going” it regenerates the GPT4V text, even though that was there already. The next step should be to generate an image.

comboy over 1 year ago

It goes against my intuition that many prompts are so stable.

epivosism over 1 year ago

You can really "cheat" by modifying the custom prompt to re-insert or remove specific features. For example, "generate a prompt for this image but adjust it by making everything appear in a more primitive, earlier evolutionary form, or in an earlier less developed way" would make things de-evolve.

Or you can just re-insert any theme or recurring characters you like at that stage.

oyster143 over 1 year ago

I did smth similar but took real famous photos as a seed. The results are quite curious and seem to tell a bit about the difference between real world and dalle/chatgpt style.

https://twitter.com/avkh143/status/1713285785888120985

atleastoptimal over 1 year ago

It would be interesting to add a constant modifier/amplifier to each cycle, like making each description more floral, robotic, favoring a certain style each time so we can trace the evolution, or perhaps having the prompt describe the previous image via a certain lens like "describe what was happening immediately before that led to this image"

dash2 over 1 year ago

The endpoint of the evolution always seems to be a poster on the bedroom of a teenager who likes to smoke weed. I wonder why!

smusamashah over 1 year ago

Why do prompts from GPT-4V start from "Create an image of"? This prefix doesn't look useful imo.

swyx over 1 year ago

OP's last one is interesting: https://dalle.party/?party=oxpeZKh5 because it shows GPT4V and Dalle3 being remarkably race-blind. i wonder if you can prompt it to be otherwise...

ThomPete over 1 year ago

It's quite fun to do these loops.

Here's the same thing done with Faktory:

https://www.loom.com/share/ed20b2cace3b4f579e32ef08bd1c5910

einpoklum over 1 year ago

It seemed that, after a few iterations, GPT-4 lost its cool and blurted out it thinks DALL-E generates ugly sweaters:

> Create a cozy and warm Christmas scene with a diverse group of friends wearing colorful ugly sweaters.

neuronexmachina over 1 year ago

Very cool, I'm rather curious how many iterations it would typically take for a feedback loop to converge on a stable fixed-point. I also wonder if the fixed points tend to be singular or elliptic.
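
One rough way to put a number on "converging" is to compare each generated prompt with the one before it and note the first step where they are nearly identical. The sketch below uses plain text similarity from the standard library as a stand-in; `first_stable_step` and the 0.9 threshold are hypothetical choices, and similar text does not guarantee similar images.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def first_stable_step(prompts: List[str], threshold: float = 0.9) -> Optional[int]:
    """Return the index of the first prompt that is at least `threshold`
    similar to its predecessor, or None if the chain never settles."""
    for i in range(1, len(prompts)):
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        ratio = SequenceMatcher(None, prompts[i - 1], prompts[i]).ratio()
        if ratio >= threshold:
            return i
    return None
```

Running this over the prompt history of many parties would give an empirical distribution of how many iterations it takes to settle, if they settle at all.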

Kiro over 1 year ago

This was the first thing I (and I presume many others) tried when GPT4-V was released, by copypasting between two ChatGPT windows. I've been waiting for someone to make an app out of it. Good job!

dpflan over 1 year ago

Interesting, how stable are the images for a given prompt? And the other way around? Does it trend toward some natural limit image/text where there are diminishing returns to making change to the data?

bbreier over 1 year ago

I'd like to be able to begin it with an image rather than a prompt.

hamilyon2 over 1 year ago

Interesting how the image series tend to gravitate toward mushrooms

edfletcher_t137 over 1 year ago

A clever idea that I'd love to play around with, but not without a source link so I could feel better about trusting it and host it myself.

fassssst over 1 year ago

I would never paste my API key into an app or website.

blopker over 1 year ago

This is fun, thanks for sharing! It would be interesting to upload the initial image from a camera to see where the chain takes it.

AvImd over 1 year ago

The default limit for an account that was not used much is one image per minute, can you please add support for timeouts?

m3kw9 over 1 year ago

Don’t get the significance, any one of those guys’ images could have been prompted the first time

robblbobbl over 1 year ago

Pretty interesting. I would love to see a version of this running locally with local models.

RayVR over 1 year ago

strange to me how many of these eventually turn into steampunk.

brianf0 over 1 year ago

Does anyone else experience a physical reaction to AI generated art that resembles repulsion and disgust? Something about it just feels “wrong”. Something I can compare it to is the feeling of unexpectedly seeing an extremely moldy thing in your fridge. It feels alive and invasive in an inhuman and horrifying way.

willsmith72 over 1 year ago

it seems like if you create a shareable link, then add more images, you can't create a new link with the new images

3abiton over 1 year ago

This is a curious case of compression?

cyanydeez over 1 year ago

need to throw in a Google to Google to Google language translate to get some more variety

kwelstr over 1 year ago

Bad art is always depressing :(

Edit: I mean, I am an artist and I've been using AI for some ideas and maybe one in a hundred tries I hit something almost good. The rest of the time it's the same shallow, fantastically cheesy type of variations.
