This works surprisingly well. Just give it instructions like "make it winter" or "remove the cars" and the photo is altered.<p>Here are some examples of transformations it can make:
Golden gate bridge: <a href="https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/gg-bridge-suprise.gif" rel="nofollow">https://raw.githubusercontent.com/brycedrennan/imaginAIry/ma...</a>
Girl with a pearl earring: <a href="https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/girl_with_a_pearl_earring_suprise.gif" rel="nofollow">https://raw.githubusercontent.com/brycedrennan/imaginAIry/ma...</a><p>I integrated this new InstructPix2Pix model into imaginAIry (a Python library) so it's easy for Python developers to use.
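<p>For anyone curious what that looks like from Python, here is a rough sketch using the library's imagine/ImaginePrompt entry points; the edit-specific arguments (init_image, model="edit") are my assumption from the CLI docs, so check the README for the exact signature:

    # Hedged sketch of an InstructPix2Pix-style edit via imaginAIry's Python API.
    # imagine, ImaginePrompt and LazyLoadingImage are real exports; the
    # edit-specific kwargs (init_image, model="edit") are assumptions.
    from imaginairy import imagine, ImaginePrompt, LazyLoadingImage

    prompt = ImaginePrompt(
        "make it winter",                                    # the instruction, not a scene description
        init_image=LazyLoadingImage(filepath="gg-bridge.jpg"),
        model="edit",                                        # assumed name for the InstructPix2Pix weights
    )

    for result in imagine([prompt]):
        result.save("gg-bridge-winter.jpg")                  # save() per the library's examples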
Fireworks. These AI tools seem very good at replacing textures, less so at inserting objects. They can all "add fireworks" to a picture. They know what fireworks look like and diligently insert them into the "sky" part of pictures. But they don't know that fireworks are large objects far away rather than small objects up close (see the Father Ted bit on that one). So they add tiny fireworks into pictures that don't have a far-away portion (portraits), or above distant mountain ridges as if they were stars. Also trees. The AI doesn't know how big trees are, so it inserts monster trees under the Golden Gate Bridge and tiny bonsais into portraits. Adding objects into complex images is totally hit and miss.
I've played with several of these Stable Diffusion frameworks and followed many tutorials and imaginAIry fit my workflow the best. I actually wrote Bryce a thank you email in December after I made an advent calendar for my wife. Super excited to see continued development here to make this approachable to people who are familiar with Python, but don't want to deal with a lot of the overhead of building and configuring SD pipelines.
A similar tool: InstructPix2Pix, which alters images by describing the changes required: <a href="https://huggingface.co/timbrooks/instruct-pix2pix#example" rel="nofollow">https://huggingface.co/timbrooks/instruct-pix2pix#example</a><p>Edit: Just noticed it is the same thing but wrapped, never mind, pretty cool project!
Here is a Colab notebook you can try it in. It crashed for me the first time but worked the second time.
<a href="https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjCOHzGVDgZkaTtO?usp=sharing" rel="nofollow">https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjC...</a>
The language of high-level art-direction can be way more complex than one might assume. I wonder how this model might cope with the following:<p>‘Decrease high-frequency features of background.’<p>‘Increase intra-contrast of middle ground to foreground.’<p>‘Increase global saturation contrast.’<p>‘Increase hue spread of greens.’
What are the most affordable GPUs that will run this? (It said it needs CUDA and at least 11GB VRAM, so I guess my relatively puny 4GB RX 570 isn't going to cut it!)
I am not a fan of software such as this putting in an arbitrary "safety" feature which can only be disabled via an undocumented environment variable. At least make it a documented flag for people who don't have an issue with nudity. There isn't even an indication that there is a "safety" issue; you just get a blank image and are left wondering if your GPU, model, or install is corrupted.<p>This isn't running on a website that is open to everyone or can be easily run by a novice.<p>Anyone capable of installing and running this is also able to read code and remove such a feature. There is no reason to hide this, nor to leave it undocumented.<p>The amount of nudity you get is also highly dependent on which model you use.
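<p>For anyone else hitting the blank-image behaviour, the workaround is something along these lines; the variable name and value here are a guess from memory, so grep the source for "safety" to find the real one:

    # Hypothetical: disable the NSFW filter before running imaginAIry.
    # The exact variable name and accepted values are an assumption --
    # check the library's source to confirm.
    import os

    os.environ["IMAGINAIRY_SAFETY_MODE"] = "disabled"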
Slightly off topic.<p>I’ve been looking for an easier way to replace the text in these ai generated images. I found Facebook is working on it with their TextStyleBrush - <a href="https://ai.facebook.com/blog/ai-can-now-emulate-text-style-in-images-in-one-shot-using-just-a-single-word/" rel="nofollow">https://ai.facebook.com/blog/ai-can-now-emulate-text-style-i...</a> but have been unable to find something released or usable yet. Anyone aware of other efforts?
> <i>Here are some examples of transformations it can make: Golden gate bridge:</i><p>I'm on mobile so can't try this myself now. Can it add a Klingon bird of prey flying under the Golden Gate Bridge, and will "add a Klingon bird of prey flying under the Golden Gate Bridge" prompt/command be enough?
<i>A CUDA supported graphics card with >= 11gb VRAM (and CUDA installed) or an M1 processor.</i><p>/Sighs in Intel iMac<p>Has anyone managed to get an eGPU running under MacOS? I guess I could use Colab but I like the feeling of running things locally.
How does this work? When I run it on a machine with a GPU (pytorch, CUDA etc installed) I still see it downloading files for each prompt. Is the image being generated on the cloud somewhere or on my local machine? Why the downloads?
Many thanks to the OP, can't wait to try this out! I have a question I'm hoping to slide in here: I remember there were also solutions for doing things like "take this character and now make it do various things". Does anyone remember what the general term for that was, and some solutions (pretty sure I've seen this on here, apparently forgot to bookmark).<p>PS: I'm not trying to make a comic book, I'm trying to help a friend solve a far more basic business problem (trying to get clients to pay their bills on time).
Thanks a lot!!!<p>Works perfectly for me (Gentoo Linux + NVIDIA RTX 3060 with 12GiB VRAM; I installed your package last week and it just worked, and I've been experimenting with it since then and telling parents & colleagues about it).<p>The results (especially for people's faces) can vary a lot between ok/scary/great (I still have to understand how the options/parameters work), but all in all it's a great package that's easy to handle & use.<p>In general, if I don't raise the output resolution above the default (512x386 or something similar), e.g. with "-w 1024 -h 768", then faces get garbled/deformed like something straight out of a Stephen King novel => is this expected?<p>Cheers :)
I've been toying with SD for a while, and I want to make a nice and clean business out of it. It's more of a side-projecty thing, so to speak.<p>Our "cluster" is running on an ASUS ROG 2080 Ti external GPU in the Razer Core X housing, and that actually works just fine in my flat.<p>We went through several iterations of how this could work at scale. The initial premise was basically the Google homepage, but for images.<p>That's when we realised that scaling this to serve the planet was going to be a hell of a lot more work. But not really; what was absolutely necessary was conceptualising the concurrent compute requirements, as well as keeping up with the ever-changing landscape and pace of innovation in this space.<p>The quick fix is to use a message queue (we're using Bull) and make everything asynchronous.<p>So essentially, we solved the scaling factor using just one GPU. You'll get your requested image, but it's in a queue, and we'll let you know when it's done. With that compute model in place, we can just add more GPUs, and tickets will take less time to serve if the scaling engineering is done properly.<p>I'm no expert on GPU/machine learning/GAN stuff, but Stable Diffusion actually prompted me to imagine how to build and scale such a service, and I did so. It is not live yet; the name reserved is dreamcreator dot ai, but I can't say when it will go live. Hopefully this year.
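<p>For the curious, the core of the pattern is just this. We actually use Bull in Node; below is a minimal Python sketch of the same "one GPU, one queue, return a ticket" idea, where generate_image() is a stand-in for the real SD pipeline call:

    # Minimal sketch of the async queue pattern: requests are enqueued,
    # a single GPU worker drains the queue, clients get a ticket to poll.
    import queue
    import threading
    import uuid

    jobs = queue.Queue()     # pending prompts
    results = {}             # job_id -> output path

    def generate_image(prompt: str) -> str:
        # placeholder: call Stable Diffusion here (imaginAIry, diffusers, ...)
        return f"/outputs/{uuid.uuid4().hex}.png"

    def gpu_worker():
        # one worker per GPU; jobs are processed strictly one at a time
        while True:
            job_id, prompt = jobs.get()
            results[job_id] = generate_image(prompt)
            jobs.task_done()

    threading.Thread(target=gpu_worker, daemon=True).start()

    def submit(prompt: str) -> str:
        """Enqueue a prompt and immediately return a ticket the client can poll."""
        job_id = uuid.uuid4().hex
        jobs.put((job_id, prompt))
        return job_id

    ticket = submit("a lighthouse at dusk, oil painting")
    jobs.join()              # in a real service you'd poll or notify instead
    print(ticket, results[ticket])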
This is really cool. Haven't seen anything like this yet. Going to be very interesting when you start to see E2E generation => animation/video/static => post editing => repeat. Have this feeling that movie studios are going to look into this kind of stuff. We went from real to CGI, and this could take it to new levels of cost savings or possibilities.
It's very interesting, thanks!
I've noticed (on the Spock example) that "make him smile" didn't produce a very... "comely" result (he basically becomes a vampire).<p>I was thinking of deploying something like that in one of our app features, but I'm scared of making our Users look like vampires :-)<p>Is it your experience that the model struggles more with faces than with other changes?
Wow, that's really impressive (I've seen similar things in research papers for a while now, but having it so easily usable and generic is great).<p>A few questions:<p>- would it be possible to use this tool to make automatic masks for editing in something like GIMP (for instance, if I want to automatically mask the hair)?<p>- would it be possible to have a REPL or something similar to run several prompts on the same image? Loading the model takes time, and it would be great to be able to do it only once (a rough sketch of what I have in mind is below).<p>- how about a small GUI or web UI to see the preview immediately? Maybe that's not the goal of this project and using `instruct-pix2pix` directly with its web UI is more appropriate?<p>Thanks for the work (including the upstream people behind the research paper and pix2pix), and for sharing.
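<p>Something like the following is the REPL-style loop I'm imagining, plus the prompt-based masking; mask_prompt/mask_mode are guessed from the CLI's --mask-prompt / --mask-mode flags and the model-caching behaviour is an assumption, so the README is the authority here:

    # Rough sketch: keep the model loaded in-process and loop over prompts,
    # optionally restricting the edit to a region described in words.
    from imaginairy import imagine, ImaginePrompt, LazyLoadingImage

    base = LazyLoadingImage(filepath="portrait.jpg")

    while True:
        text = input("prompt> ").strip()
        if not text:
            break
        prompt = ImaginePrompt(
            text,
            init_image=base,
            mask_prompt="hair",    # select the region by describing it (assumed kwarg)
            mask_mode="replace",   # only regenerate the masked region (assumed kwarg)
        )
        # assuming the weights stay cached in-process, later iterations skip the reload
        for result in imagine([prompt]):
            result.save(f"edit_{abs(hash(text))}.jpg")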
I'm getting mixed results, and for a given topic it seems to invariably give a better result the first time you ask, then not as good if you ask again.<p>It could be random and my imagination, but it seems that way.
Looks really interesting, although my immediate thought with "-fix-faces" is how long before someone manages to do something inappropriate and whip up a storm about this.
>11GB VRAM<p>Aaarrrgghh let me know when it's down to 4GB like Stable Diffusion<p>The prompt-based masking sounds incredible, with either pixel +/- or Prompt Relevance +/-<p>VERY impressive img2img capabilities!
How do I make it use my GPU (I have an RTX 3070)? It complains about using the sloooow CPU, but I don't see an option to switch to the GPU, which I think should be sufficient...? I'm running it on Windows 10.
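<p>In case it helps anyone answer, this is what I'd check first: whether the installed PyTorch can see the card at all. If it prints False, it's likely a CPU-only torch wheel and reinstalling the CUDA build from pytorch.org should let the library pick up the 3070:

    # Quick check of whether PyTorch can see the GPU.
    import torch

    print(torch.__version__, torch.version.cuda)   # torch.version.cuda is None on CPU-only builds
    print(torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))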
Two things:<p>1. It actually makes me insecure.<p>2. Don't we already have apps that do such things? Yes, they were more specialized, but it's the same thing as the Prisma app.
Does anyone know if there is something like Google Cloud for GPUs but with an easy way to suspend the VM or container when not in use? Maybe I am just looking for container hosting with GPUs.<p>I am just trying to avoid some of the basic VM admin stuff (creating, starting, stopping) for a SaaS, if someone already has a way to do it. Maybe this is something like what Elastic Beanstalk does.
I see it's able to generate politician faces. I recall this wasn't possible on DALL·E 2 due to safety restrictions.<p>I run a friendly caption contest <a href="https://caption.me" rel="nofollow">https://caption.me</a> so imaginAIry is going to be absolute gold for generating funny and topical content. Thank you @bryced!
Does anyone know of any tool like this for UI design? I'd love something that'd help creatively impaired people like myself communicate more visually.
Super nice. Would this work if I have my own version of fine-tuned SD? Also, curious how / whether this is different from img2img released by SD. Thanks!
How can I try this?<p>Can this be run on a Digitalocean VM?<p>I looked around on DO's products, but none seems to advertise that it has a GPU. So maybe it is not possible?
It's a little premature, fine, but I want to start liquidating my rhetorical swaps here: I've been saying since last summer (sometimes on HN, sometimes elsewhere) that "prompt engineering" is BS and that in a world where AI gets better and better, expecting to develop lasting competency in an area of AI-adjacent performance (a.k.a. telling an AI what to do in exactly the right way to get the right result) is akin to expecting to develop a long-lasting business around hand-cranking people's cars for them when they fail to start.<p>Like, come on. We're now seeing AIs take on tasks many people thought would never be doable by machine. And granted, many people (myself included to some extent) have adjusted their priors properly. And yet so many people act like AI is going to stall in its current lane and leave room for human work as opposed to developing orders of magnitudes better intelligence and obliterating all of its current flaws.
The headline and the heavy promotional verbiage on the site seem to be claiming this is some new functionality we didn't have before. Image2image with text instructions isn't new, as the headline implies.<p>InvokeAI (and a few other projects as well) already does all this stuff much better, unless I'm missing something. There are plenty of stable diffusion wrappers. Why not help improve them instead of copying them?<p>I'm not against having enthusiasm for one's project, but tell us why this is different and please don't pretend the other projects don't have this stuff.
If only Stable Diffusion weren't already populated with a host of copyrighted images.<p>Make your own art, dammit. This is the equivalent of running some Photoshop filters over someone else's work.
Doesn't work if any people are in the photos: <a href="https://twitter.com/kumardexati/status/1616972740728356867/photo/1" rel="nofollow">https://twitter.com/kumardexati/status/1616972740728356867/p...</a>
These look awful! They are very displeasing aesthetically. They look like they were done by someone with absolutely no artistic ability. Clearly there is some technical interest here, but I just felt the need to point out the elephant in the room. They are <i>very ugly</i>.
Wow, it's really impressive to see how advanced AI image generators have become! The ability to create stable diffusion images with a "just works" approach on multiple operating systems is a huge step forward in this technology. We've deployed similar tech and APIs for our customers and are contemplating using this library as part of our pipeline for <a href="https://88stacks.com" rel="nofollow">https://88stacks.com</a>