Show HN: New AI edits images based on text instructions

1098 points by bryced over 2 years ago

This works surprisingly well. Just give it instructions like "make it winter" or "remove the cars" and the photo is altered.

Here are some examples of transformations it can make:

Golden Gate Bridge: https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/gg-bridge-suprise.gif

Girl with a Pearl Earring: https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/girl_with_a_pearl_earring_suprise.gif

I integrated this new InstructPix2Pix model into imaginAIry (a Python library) so it's easy for Python developers to use.

53 comments

sandworm101 over 2 years ago

Fireworks. These AI tools seem very good at replacing textures, less so at inserting objects. They can all "add fireworks" to a picture. They know what fireworks look like and diligently insert them into the "sky" part of pictures. But they don't know that fireworks are large objects far away rather than small objects up close (see the Father Ted bit on that one). So they add tiny fireworks into pictures that don't have a far-away portion (portraits) or above distant mountain ridges as if they were stars. Also trees. The AI doesn't know how big trees are and so inserts monster trees under the Golden Gate Bridge and tiny bonsais into portraits. Adding objects into complex images is totally hit-and-miss.

PaulMest over 2 years ago

I've played with several of these Stable Diffusion frameworks and followed many tutorials and imaginAIry fit my workflow the best. I actually wrote Bryce a thank you email in December after I made an advent calendar for my wife. Super excited to see continued development here to make this approachable to people who are familiar with Python, but don't want to deal with a lot of the overhead of building and configuring SD pipelines.

nicbou over 2 years ago

Can it make it pop? Because that was the #1 request I remember dealing with.

Gravyness over 2 years ago

A similar tool: Instruct pix2pix to alter images by describing the changes required: https://huggingface.co/timbrooks/instruct-pix2pix#example

Edit: Just noticed it is the same thing but wrapped, never mind, pretty cool project!

bryced over 2 years ago

Here is a Colab you can try it in. It crashed for me the first time but worked the second time. https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjCOHzGVDgZkaTtO?usp=sharing

Daub over 2 years ago

The language of high-level art-direction can be way more complex than one might assume. I wonder how this model might cope with the following:

‘Decrease high-frequency features of background.’
‘Increase intra-contrast of middle ground to foreground.’
‘Increase global saturation contrast.’
‘Increase hue spread of greens.’

GordonS over 2 years ago

What are the most affordable GPUs that will run this? (It said it needs CUDA and a minimum of 11GB VRAM, so I guess my relatively puny 4GB RX 570 isn't going to cut it!)

bobmaxup over 2 years ago

https://www.timothybrooks.com/instruct-pix2pix

yieldcrv over 2 years ago

“Add a dog in my arms”

I’ll keep you posted how well this works for dating apps

sschueller over 2 years ago

I am not a fan of software such as this putting in an arbitrary "safety" feature which can only be disabled via an undocumented environment variable. At least make it a documented flag for people who don't have an issue with nudity. There isn't even an indication that there is a "safety" issue; you just get a blank image and are left wondering if your GPU, model, or install is corrupted.

This isn't running on a website that is open to everyone or can be easily run by a novice.

Anyone capable of installing and running this is also able to read code and remove such a feature. There is no reason to hide this nor to not document it.

Also, the amount of nudity you get is highly dependent on which model you use.

social_quotient over 2 years ago

Slightly off topic.

I’ve been looking for an easier way to replace the text in these AI-generated images. I found Facebook is working on it with their TextStyleBrush - https://ai.facebook.com/blog/ai-can-now-emulate-text-style-in-images-in-one-shot-using-just-a-single-word/ but have been unable to find something released or usable yet. Anyone aware of other efforts?

TeMPOraL over 2 years ago

> Here are some examples of transformations it can make: Golden gate bridge:

I'm on mobile so can't try this myself now. Can it add a Klingon bird of prey flying under the Golden Gate Bridge, and will an "add a Klingon bird of prey flying under the Golden Gate Bridge" prompt/command be enough?

anigbrowl over 2 years ago

> A CUDA supported graphics card with >= 11gb VRAM (and CUDA installed) or an M1 processor.

/Sighs in Intel iMac

Has anyone managed to get an eGPU running under macOS? I guess I could use Colab but I like the feeling of running things locally.

perfopt over 2 years ago

How does this work? When I run it on a machine with a GPU (PyTorch, CUDA etc. installed) I still see it downloading files for each prompt. Is the image being generated in the cloud somewhere or on my local machine? Why the downloads?

WesolyKubeczek over 2 years ago

So, Deckard can ask it to enhance, finally :)

c7b over 2 years ago

Many thanks to the OP, can't wait to try this out! I have a question I'm hoping to slide in here: I remember there were also solutions for doing things like "take this character and now make it do various things". Does anyone remember what the general term for that was, and know of some solutions (pretty sure I've seen this on here, apparently forgot to bookmark)?

PS: I'm not trying to make a comic book, I'm trying to help a friend solve a far more basic business problem (trying to get clients to pay their bills on time).

zepearl over 2 years ago

Thanks a lot!!!

Works perfectly for me (Gentoo Linux + nVidia RTX3060 12GiB VRAM - I installed your package last week and it just worked; I've been experimenting with it since then and telling parents & colleagues about it).

The results (especially in relation to "people's faces") can vary a lot between ok/scary/great (I still have to understand how the options/parameters work), but all in all it's a great package that's easy to handle & use.

In general, if I don't specify a higher output resolution setting than the default (512x386 or something similar), with e.g. "-w 1024 -h 768", then faces get garbled/deformed like straight from a Stephen King novel => is this expected?

Cheers :)

karim79 over 2 years ago

I've been toying with SD for a while, and I do want to make a nice and clean business out of it. It's more of a side-projecty thing so to speak.

Our "cluster" is running on an ASUS ROG 2080Ti external GPU in the Razer Core X housing, and that actually works just fine in my flat.

We went through several iterations of how this could work at scale. The initial premise was basically the Google homepage, but for images.

That's when we realised that scaling this to serve the planet was going to be a hell of a lot more work. But not really: conceptualising the concurrent compute requirements, as well as the ever-changing landscape and pace of innovation in this, is absolutely necessary.

The quick fix is to use a message queue (we're using Bull) and make everything asynchronous.

So essentially, we solved the scaling factor using just one GPU. You'll get your requested image, but it's in a queue, and we'll let you know when it's done. With that compute model in place, we can just add more GPUs, and tickets will take less time to serve if the scale engineering is proper.

I'm no expert on GPU/machine learning/GAN stuff but Stable Diffusion actually prompted me to imagine how to build and scale such a service, and I did so. It is not live yet, but when it does become so the name reserved is dreamcreator dot ai, and I can't say when it will be animated. Hopefully this year.
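For readers unfamiliar with the pattern karim79 describes (submit a request, get a ticket back right away, let a single GPU worker drain the queue), here is a minimal Python sketch. It is not their implementation: they mention Bull, a Node.js queue, whereas this uses only the Python standard library. `SingleWorkerImageQueue`, `fake_render`, and the ticket format are hypothetical names made up for illustration, and the "render" step is a stand-in that does not call any real model.

```python
import queue
import threading
import time
import uuid


class SingleWorkerImageQueue:
    """Toy in-process job queue: many submitters, one 'GPU' worker."""

    def __init__(self, render):
        self._render = render          # hypothetical callable: prompt -> result
        self._jobs = queue.Queue()     # FIFO of (ticket, prompt) pairs
        self._results = {}             # ticket -> result, filled in by the worker
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, prompt):
        """Enqueue a prompt and return a ticket immediately, without waiting."""
        ticket = uuid.uuid4().hex
        self._jobs.put((ticket, prompt))
        return ticket

    def status(self, ticket):
        """Return the result if finished, else None (unknown, queued, or running)."""
        with self._lock:
            return self._results.get(ticket)

    def wait_all(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _run(self):
        # The single worker processes jobs strictly one at a time.
        while True:
            ticket, prompt = self._jobs.get()
            try:
                result = self._render(prompt)
            except Exception as exc:
                result = f"error: {exc}"
            with self._lock:
                self._results[ticket] = result
            self._jobs.task_done()


def fake_render(prompt):
    """Stand-in for the slow GPU call; no real model is involved."""
    time.sleep(0.01)
    return f"image for: {prompt}"
```

Under these assumptions, a caller would do `ticket = q.submit("make it winter")` and poll `q.status(ticket)` until it stops returning `None`. The "just add more GPUs" step from the comment would correspond to starting more workers that read from the same queue; a production setup would more likely use a persistent broker than an in-memory queue.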

dandigangi over 2 years ago

This is really cool. Haven't seen something like this yet. Going to be very interesting when you start to see E2E generation => animation/video/static => post editing => repeat. Have this feeling that movie studios are going to look into this kind of stuff. We went from real to CGI and this could take it to new levels in cost savings or possibilities.

sebastiennight over 2 years ago

It's very interesting, thanks! I've noticed (on the Spock example) that "make him smile" didn't produce a very... "comely" result (he basically becomes a vampire).

I was thinking of deploying something like that in one of our app features, but I'm scared of making our users look like vampires :-)

Is it your experience that the model struggles more with faces than with other changes?

goffi over 2 years ago

Wow that's really impressive (I've seen similar things in research papers for a while now, but having it usable so easily and generic is great).

A few questions:

- would it be possible to use this tool to make an automatic mask for editing in something like GIMP (for instance, if I want to automatically mask the hair)?
- would it be possible to have a REPL or something else to make several prompts on the same image? Loading the model takes time, and it would be great to be able to just do it once.
- how about a small GUI or webui to have the preview immediately? Maybe it's not the goal of this project and using `instruct-pix2pix` directly with its webui is more appropriate?

Thanks for the work (including upstream people for the research paper and pix2pix), and for sharing.

airbreather over 2 years ago

I'm getting mixed results, and for a given topic it seems to invariably give a better result the first time you ask, then not so good if you ask again.

It could be random and my imagination, but it seems that way.

nmstoker over 2 years ago

Looks really interesting, although my immediate thought with "-fix-faces" is how long before someone manages to do something inappropriate and whip up a storm about this.

nullish_signal over 2 years ago

>11GB VRAM

Aaarrrgghh let me know when it's down to 4GB like Stable Diffusion

The prompt-based masking sounds incredible, with either pixel +/- or Prompt Relevance +/-

VERY impressive img2img capabilities!

googie over 2 years ago

How do I make it use my GPU (I have an RTX 3070)? It complains about using the sloooow CPU, but I don't see an option to switch to the GPU, which I think should be sufficient...? I'm running it on Windows 10.

theusus over 2 years ago

Two things

1. It actually makes me insecure.
2. Don't we already have apps that do such things? Yes, they were more specialized, but it's the same thing as the Prisma app.

ilaksh over 2 years ago

Does anyone know if there is something like Google Cloud for GPUs but with an easy way to suspend the VM or container when not in use? Maybe I am just looking for container hosting with GPUs.

I am just trying to avoid some of the basic VM admin stuff like creating, starting, stopping for SaaS if someone already has a way to do it. Maybe this is something like what Elastic Beanstalk does.

cbeach over 2 years ago

I see it's able to generate politician faces. I recall this wasn't possible on DALL·E 2 due to safety restrictions.

I run a friendly caption contest https://caption.me so imaginAIry is going to be absolute gold for generating funny and topical content. Thank you @bryced!

sam1r over 2 years ago

This is amazing! It’s only so long until video..

fassssst over 2 years ago

Related: https://www.reddit.com/r/StableDiffusion/comments/10hv160/image_editing_with_just_text_prompt_new/

kewp over 2 years ago

anyone know how to use this? kind of confusing install instructions in the readme

petrusnonius over 2 years ago

Are you telling me I can finally ENHANCE!?

Great stuff man, thanks!

Der_Einzige over 2 years ago

Hoping that this is quickly implemented into the automatic1111 webUI.

mstade over 2 years ago

Does anyone know of any tool like this for UI design? I'd love something that'd help creatively impaired people like myself communicate more visually.

odedbend over 2 years ago

Where can I find more data about the work you did to create this?

pfd1986 over 2 years ago

Super nice. Would this work if I have my own version of fine-tuned SD? Also, curious how / whether this is different from img2img released by SD. Thanks!

0x4164 over 2 years ago

I hope there is a James Fridman version of this kind of AI.

sideshowb over 2 years ago

Is there a link to how this works - in terms of nn architecture to combine the embedding of the existing image with the edit instruction?

TekMol over 2 years ago

How can I try this?

Can this be run on a DigitalOcean VM?

I looked around on DO's products, but none seems to advertise that it has a GPU. So maybe it is not possible?

tomrod over 2 years ago

This is a lot of fun!

And they aren't kidding that on a CPU backend it is slooooow :)

lou_alcala over 2 years ago

Wow this is cool I think I am going to make a site so people can use this

xwdv over 2 years ago

How about “fix the hands”?

lightbulbish over 2 years ago

This is cool! Makes me want to pull the trigger on an M2

fatih-erikli over 2 years ago

Garbage.

weakwire over 2 years ago

Enchance!

testtwttttt over 2 years ago

how about telling cars where to go?

testtwttttt over 2 years ago

Uehreka over 2 years ago

It's a little premature, fine, but I want to start liquidating my rhetorical swaps here: I've been saying since last summer (sometimes on HN, sometimes elsewhere) that "prompt engineering" is BS and that in a world where AI gets better and better, expecting to develop lasting competency in an area of AI-adjacent performance (a.k.a. telling an AI what to do in exactly the right way to get the right result) is akin to expecting to develop a long-lasting business around hand-cranking people's cars for them when they fail to start.

Like, come on. We're now seeing AIs take on tasks many people thought would never be doable by machine. And granted, many people (myself included to some extent) have adjusted their priors properly. And yet so many people act like AI is going to stall in its current lane and leave room for human work as opposed to developing orders of magnitude better intelligence and obliterating all of its current flaws.

natch over 2 years ago

The headline and the heavy promotional verbiage on the site seem to be claiming this is some new functionality we didn’t have before. Image2image with text instructions isn’t new as the headline implies.

InvokeAI (and a few other projects as well) already does all this stuff much better unless I’m missing something. There are plenty of Stable Diffusion wrappers. Why not help improve them instead of copying them?

I’m not against having enthusiasm for one’s project, but tell us why this is different and please don’t pretend the other projects don’t have this stuff.

distantsounds over 2 years ago

If only Stable Diffusion wasn't already populated with a host of copyrighted images.

Make your own art, dammit. This is the equivalent of running some Photoshop filters through someone else's work.

kumarm over 2 years ago

Doesn't work if any people are in the photos: https://twitter.com/kumardexati/status/1616972740728356867/photo/1

sfpotter over 2 years ago

These look awful! They are very displeasing aesthetically. They look like they were done by someone with absolutely no artistic ability. Clearly there is some technical interest here, but I just felt the need to point out the elephant in the room. They are very ugly.

88stacks over 2 years ago

Wow, it's really impressive to see how advanced AI image generators have become! The ability to create stable diffusion images with a "just works" approach on multiple operating systems is a huge step forward in this technology. We've deployed similar tech and APIs for our customers and are contemplating using this library as part of our pipeline for https://88stacks.com
