Related: <a href="https://stability.ai/blog/deepfloyd-if-text-to-image-model" rel="nofollow">https://stability.ai/blog/deepfloyd-if-text-to-image-model</a><p>(via <a href="https://news.ycombinator.com/item?id=35743727" rel="nofollow">https://news.ycombinator.com/item?id=35743727</a>, but we've merged that thread into this earlier one)
GitHub: <a href="https://github.com/deep-floyd/IF">https://github.com/deep-floyd/IF</a><p>Colab Notebook for running the model based on the diffusers library: <a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb" rel="nofollow">https://colab.research.google.com/github/huggingface/noteboo...</a><p>Hugging Face Space for testing the model: <a href="https://huggingface.co/spaces/DeepFloyd/IF" rel="nofollow">https://huggingface.co/spaces/DeepFloyd/IF</a><p>Note that the model is substantially more compute-intensive than Stable Diffusion, so it may be slower even though that space is running on an A100.
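For anyone who just wants to see what the diffusers code looks like rather than opening the Colab, here's a minimal sketch of stage I (the model id, fp16 variant, and prompt text are my assumptions; the notebook above is the authoritative version, and you need to have accepted the license and logged in to Hugging Face first):

    # Minimal sketch of IF stage I via diffusers (assumed model id; needs
    # diffusers, transformers, accelerate, and a license-accepted HF login).
    import torch
    from diffusers import DiffusionPipeline

    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
    )
    stage_1.enable_model_cpu_offload()  # keeps VRAM usage manageable

    prompt = "a photo of a corgi wearing a red bow tie"
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

    # Stage I only produces a 64x64 image; stages II/III upscale it further.
    image = stage_1(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        output_type="pil",
    ).images[0]
    image.save("if_stage_1.png")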
Example of how much better it can do compared to Midjourney on a complex prompt: <a href="https://twitter.com/eb_french/status/1623823175170805760" rel="nofollow">https://twitter.com/eb_french/status/1623823175170805760</a><p>It is able to put people on the left/right and put the correct t-shirts and facial expressions on each one. Compare that to MJ, which just mixes every word you use into a soup and plops it out into the image. Huge MJ fan of course, it's amazing, but having compositional power is another step up.
Has anyone tried the Scott Alexander AI bet prompts?<p>1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth<p>2. An oil painting of a man in a factory looking at a cat wearing a top hat<p>3. A digital art picture of a child riding a llama with a bell on its tail through a desert<p>4. A 3D render of an astronaut in space holding a fox wearing lipstick<p>5. Pixel art of a farmer in a cathedral holding a red basketball
New restriction in their license suggests the inference filters can't be removed, even from modified versions:<p>"2. All persons obtaining a copy or substantial portion of the Software, a modified version of the Software (or substantial portion thereof), or a derivative work based upon this Software (or substantial portion thereof) must not delete, remove, disable, diminish, or circumvent any inference filters or inference filter mechanisms in the Software, or any portion of the Software that implements any such filters or filter mechanisms."
For anyone who doesn't know, DeepFloyd is a Stable Diffusion-style image model that more or less replaces CLIP with a full LLM (11B params) as the text encoder. The result is that it is much better at responding to more complex prompts.<p>In theory, it is also smarter at learning from its training data.
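Concretely, the prompt is run through the big frozen T5-style encoder and the per-token embeddings are what condition the UNet, rather than a single pooled CLIP vector. A hedged sketch of just that piece (assuming the repo follows the usual diffusers layout with tokenizer/text_encoder subfolders):

    # Hedged sketch: load only IF's text encoder (assumed to be T5-XXL, ~11B
    # params, so this needs a lot of RAM) and embed a prompt with it.
    import torch
    from transformers import T5Tokenizer, T5EncoderModel

    repo = "DeepFloyd/IF-I-XL-v1.0"  # assumed model id
    tokenizer = T5Tokenizer.from_pretrained(repo, subfolder="tokenizer")
    text_encoder = T5EncoderModel.from_pretrained(repo, subfolder="text_encoder")

    tokens = tokenizer("a raven with a key in its beak", return_tensors="pt")
    with torch.no_grad():
        # one embedding per token, not a single pooled vector as with CLIP
        embeddings = text_encoder(**tokens).last_hidden_state
    print(embeddings.shape)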
The full release will be soon!<p><a href="https://twitter.com/EMostaque/status/1651328161148174337" rel="nofollow">https://twitter.com/EMostaque/status/1651328161148174337</a>
It looks like the model on Hugging Face either hasn't been published yet or was withdrawn. I got this error in their Colab notebook:<p>OSError: DeepFloyd/IF-I-IF-v1.0 is not a local folder and is not a valid model identifier listed on '<a href="https://huggingface.co/models" rel="nofollow">https://huggingface.co/models</a>'. If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
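In case it helps once the weights do show up: if the repo turns out to be gated behind a license acceptance, an access token is needed either way. A sketch of what the error is asking for (the model id is just the one from the notebook and may change at release):

    # Log in with a Hugging Face access token, then retry the gated download.
    from huggingface_hub import login
    from diffusers import DiffusionPipeline

    login()  # prompts for a token from https://huggingface.co/settings/tokens
    pipe = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-IF-v1.0",  # id from the notebook; may change at release
        use_auth_token=True,
    )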
I think this model will result in a massive new wave of meme culture. AI's already seen success in memes up to this point, but the ability for readable text to be incorporated into images totally changes the game. Going to be an interesting next few months on the interwebz, that's for sure. Exciting times!
"Hi! I'm B-19-7, but to everyperson I'm called Floyd."
-Planetfall (1983)<p>My first thought on seeing "Floyd" and "IF" together. It looks like a Pink Floyd reference from the About page on <a href="https://deepfloyd.ai/" rel="nofollow">https://deepfloyd.ai/</a> though.
This could be super cool for logos. I've tried using Stable Diffusion to generate logos and it does a pretty good job of helping brainstorm, but the text is always gibberish. You can use its ideas, but you have to add your own text, which basically means creating a logo from scratch with its designs as inspiration.
> Text<p>> Hands<p>good god it solves the two biggest meme issues with image models in one go. Will this be the new state of the art every other model is compared to?
> Gorbachev holding meatball pasta in both hands. 1980s synth futuristic max headroom aesthetic. Neon lights.<p>> Aristotle in ancient greek clothes. Toga. New york, rain, film noir, fog, art deco, neon lights, blade runner sci fi<p>Seems to hold up reasonably well with the first prompt. The second was only OK.
Interesting that there are different models: <a href="https://github.com/deep-floyd/IF#-model-zoo-">https://github.com/deep-floyd/IF#-model-zoo-</a><p>I'm also very happy about the release of the two upscalers; I can use them to upscale the results of my small 64x64 DDIM models (maybe with some finetuning).
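If it works like the other diffusers pipelines, the stage II upscaler takes a 64x64 image plus prompt embeddings, so in principle you can point it at output from any 64x64 model. A hedged sketch (model id and kwargs are assumptions, and as noted above some finetuning may be needed for out-of-distribution inputs):

    # Hedged sketch: run the IF stage II upscaler on an arbitrary 64x64 image.
    import torch
    from diffusers import DiffusionPipeline
    from PIL import Image

    stage_2 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-II-L-v1.0", variant="fp16", torch_dtype=torch.float16
    )
    stage_2.enable_model_cpu_offload()

    low_res = Image.open("ddim_sample_64x64.png")  # e.g. from a small DDIM model
    prompt_embeds, negative_embeds = stage_2.encode_prompt("a landscape photo")

    upscaled = stage_2(
        image=low_res,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
    ).images[0]  # 256x256
    upscaled.save("upscaled_256.png")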
I would be more interested in image-to-text models. Does anyone know of a decent one? I saw the GPT-4 demo, and they showed image-to-text... but that turned out to be fake (i.e., the model was interpreting the image filename).
Looks like music generation is on their roadmap. Fun!<p><a href="https://stability.ai/careers?gh_jid=4142190101" rel="nofollow">https://stability.ai/careers?gh_jid=4142190101</a>
Any web-based front ends yet? I put together a system that runs a variety of web-based open-source AI image generation and editing tools on Vultr GPU instances. It spins up instances on demand, mounts an NFS filesystem with local caching and a COW layer, spawns the services, proxies the requests, and then spins down idle instances when I'm done. Would love to add this; I suppose I could whip something up if none exists.
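For anyone curious what "on demand + spin down when idle" looks like in practice, here's a hypothetical sketch of the control loop; create_gpu_instance(), destroy_instance(), and last_request_time() are placeholder stubs standing in for the real cloud-API and proxy plumbing, not actual Vultr SDK calls:

    # Hypothetical sketch of the lazy spin-up / idle spin-down loop. The helper
    # functions are placeholders, not real Vultr SDK calls.
    import time

    IDLE_TIMEOUT = 15 * 60   # assumed: tear down after 15 idle minutes
    _instance = None

    def create_gpu_instance():
        # placeholder: call cloud API, mount the NFS cache + COW layer, start services
        return {"id": "gpu-1"}

    def destroy_instance(instance):
        # placeholder: call cloud API to delete the instance
        pass

    def last_request_time():
        # placeholder: updated by the reverse proxy on every request it forwards
        return time.time()

    def ensure_instance():
        """Lazily start the backend the first time a request is proxied."""
        global _instance
        if _instance is None:
            _instance = create_gpu_instance()
        return _instance

    def reap_if_idle():
        """Tear the backend down once it has been idle past the timeout."""
        global _instance
        if _instance is not None and time.time() - last_request_time() > IDLE_TIMEOUT:
            destroy_instance(_instance)
            _instance = None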
Here are some play-money markets on Manifold Markets tracking its release: <a href="https://manifold.markets/markets?s=relevance&f=all&q=deepfloyd" rel="nofollow">https://manifold.markets/markets?s=relevance&f=all&q=deepflo...</a><p>35% on a full release by end of month, though that may not have adjusted yet.
<a href="https://www.kaggle.com/code/anivana/deepfloyd-if-playground/" rel="nofollow">https://www.kaggle.com/code/anivana/deepfloyd-if-playground/</a><p>I played with some ready-made prompts here.
Seeing a lot of text-to-image out there recently. Does anyone know what the current state of the art is on image-to-text? Thinking something similar to Midjourney's /describe command that they added in v5
There's a discord with tons of sample images, where we've been waiting patiently for the release, coming SOON, for 3 months now. <a href="https://discord.gg/pxewcvSvNx" rel="nofollow">https://discord.gg/pxewcvSvNx</a>
Website design main page. Bright vibrant neon colors of the rainbow slimes, slime business, kid attention grabbing, splashes of bright neon colors. Professional looking Website page, high quality resolution 8k
What are the official and unofficial discords?<p>I found only this one on their subreddit: <a href="https://discord.gg/GvsvNrVkk5" rel="nofollow">https://discord.gg/GvsvNrVkk5</a>
Website design for slime. Professional looking, high-quality, 8k, brightest neon colors of the rainbow slimes, splashes of neon colors in background, kid attention grabbing, eye catching
Tried it just now, and it's way better than Stable Diffusion (be it 1.5, 2.1, or SDXL).<p>But it's harder to get a good picture. This, fine-tuned with good RLHF, will be amazing.