DeepFloyd IF: open-source text-to-image model

259 points by ea016, about 2 years ago

37 comments

dang, about 2 years ago

Related: https://stability.ai/blog/deepfloyd-if-text-to-image-model (via https://news.ycombinator.com/item?id=35743727, but we've merged that thread into this earlier one)

minimaxir, about 2 years ago

GitHub: https://github.com/deep-floyd/IF

Colab Notebook for running the model based on the diffusers library: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb

Hugging Face Space for testing the model: https://huggingface.co/spaces/DeepFloyd/IF

Note that the model is substantially more compute-intensive than Stable Diffusion, so it may be slower even though that Space is running on an A100.

epivosism, about 2 years ago

Example of how much better it can do compared to Midjourney on a complex prompt: https://twitter.com/eb_french/status/1623823175170805760

It is able to put people on the left/right and put the correct t-shirts and facial expressions on each one. Compare that to MJ, which just mixes together a soup of every word you use and plops it out into the image. Huge MJ fan of course, it's amazing, but having compositional power is another step up.

lalaithion, about 2 years ago

Has anyone tried the Scott Alexander AI bet prompts?

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth
2. An oil painting of a man in a factory looking at a cat wearing a top hat
3. A digital art picture of a child riding a llama with a bell on its tail through a desert
4. A 3D render of an astronaut in space holding a fox wearing lipstick
5. Pixel art of a farmer in a cathedral holding a red basketball

hunkins, about 2 years ago

New restriction in their License suggests the software can't be modified.

"2. All persons obtaining a copy or substantial portion of the Software, a modified version of the Software (or substantial portion thereof), or a derivative work based upon this Software (or substantial portion thereof) must not delete, remove, disable, diminish, or circumvent any inference filters or inference filter mechanisms in the Software, or any portion of the Software that implements any such filters or filter mechanisms."

orra, about 2 years ago

Neither the source code nor the weights are open source... This is actually worse than Stability AI's previous offering, in that regard.

Taek, about 2 years ago

For anyone who doesn't know, DeepFloyd is a Stable Diffusion-style image model that more or less replaced CLIP with a full LLM (11b params). The result is that it is much better at responding to more complex prompts.

In theory, it is also smarter at learning from its training data.

connerruhl, about 2 years ago

The full release will be soon! https://twitter.com/EMostaque/status/1651328161148174337

simonw, about 2 years ago

It looks like the model on Hugging Face either hasn't been published yet or was withdrawn. I got this error in their Colab notebook:

OSError: DeepFloyd/IF-I-IF-v1.0 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

srajabi, about 2 years ago

Wow, this does so well on text! The original model struggled a lot; it's impressive to see how far they've come.

mkaic, about 2 years ago

I think this model will result in a massive new wave of meme culture. AI's already seen success in memes up to this point, but the ability for readable text to be incorporated into images totally changes the game. Going to be an interesting next few months on the interwebz, that's for sure. Exciting times!

Thoreandan, about 2 years ago

"Hi! I'm B-19-7, but to everyperson I'm called Floyd." -Planetfall (1983)

My first thought on seeing "Floyd" and "IF" together. It looks like a Pink Floyd reference from the About page on https://deepfloyd.ai/ though.

itslennysfault, about 2 years ago

This could be super cool for logos. I've tried using Stable Diffusion to generate logos, and it does pretty well at helping brainstorm, but the text is always gibberish. You can use its idea, but you have to add your own text, which basically means creating a logo from scratch using its designs as inspiration.

kingcharles, about 2 years ago

The examples on the README are extremely compelling; the state of the art has been raised yet again.

alex_sf, about 2 years ago

The current license makes this largely unusable for nearly any purpose. Really disappointing release from SAI.

atleastoptimal, about 2 years ago

> Text

> Hands

Good god, it solves the two biggest meme issues with image models in one go. Will this be the new state of the art every other model is compared to?

bicepjai, about 2 years ago

I have two decade-old NVIDIA 1080 Ti cards. Can we run inference and train IF on them?

zimpenfish, about 2 years ago

16GB VRAM minimum is a bit steep. Sadly excludes my 3080, which is annoying because I'd like something better than Stable Diffusion locally.
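
For a rough feel of where VRAM figures like this come from: the memory needed just to hold a model's weights is approximately the parameter count times the bytes per parameter. The sketch below is only that arithmetic. The function name and the parameter counts are hypothetical placeholders, not IF's actual numbers, and it ignores activations, the separate diffusion stages, and any CPU offloading.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Placeholder parameter counts for illustration only (not IF's real sizes).
EXAMPLE_PARAM_COUNTS = {"1B-parameter model": 1e9, "5B-parameter model": 5e9}

# Common storage precisions and their size in bytes per parameter.
PRECISIONS = {"fp32": 4, "fp16": 2, "int8": 1}

for name, params in EXAMPLE_PARAM_COUNTS.items():
    for precision, nbytes in PRECISIONS.items():
        gib = weight_memory_gib(params, nbytes)
        print(f"{name} at {precision}: {gib:.2f} GiB of weights")
```

Halving the precision halves the weight footprint, which is why lower-precision loading is the usual first thing to try on a smaller card.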

marginalia_nu, about 2 years ago

> Gorbachev holding meatball pasta in both hands. 1980s synth futuristic max headroom aesthetic. Neon lights.

> Aristotle in ancient greek clothes. Toga. New york, rain, film noir, fog, art deco, neon lights, blade runner sci fi

Seems to be holding up reasonably well with the first prompt. Second was only OK.

55555, about 2 years ago

So this one can create perfect text in images? If true, that's insane.

GaggiX, about 2 years ago

Interesting, there are different models: https://github.com/deep-floyd/IF#-model-zoo-

I'm also very happy about the release of the two upscalers; I can use them to upscale the results of my small 64x64 DDIM models (maybe with some finetuning).

danwee, about 2 years ago

I would be more interested in image-to-text models. Does anyone know of a decent one? I saw the GPT-4 demo, and they showed that they do image-to-text... but then that was actually a fake (i.e., the model was interpreting the image filename).

causality0, about 2 years ago

Is this intended to replace Stable Diffusion? Somebody want to give the ELI5?

dr_kiszonka, about 2 years ago

Looks like music generation is on their roadmap. Fun! https://stability.ai/careers?gh_jid=4142190101

jacob019, about 2 years ago

Any web-based front ends yet? I put together a system that runs a variety of web-based open source AI image generation and editing tools on Vultr GPU instances. It spins up instances on demand, mounts an NFS filesystem with local caching and a COW layer, spawns the services, proxies the requests, and then spins down idle instances when I'm done. Would love to add this; I suppose I could whip something up if none exists.

epivosism, about 2 years ago

Here are some play markets on Manifold Markets tracking its release: https://manifold.markets/markets?s=relevance&f=all&q=deepfloyd

35% to full release by end of month, although it may not have adjusted.

anirbanc88, about 2 years ago

I played with some ready prompts here: https://www.kaggle.com/code/anivana/deepfloyd-if-playground/

marvinkennis, about 2 years ago

Seeing a lot of text-to-image out there recently. Does anyone know what the current state of the art is on image-to-text? Thinking something similar to Midjourney's /describe command that they added in v5.

epivosism, about 2 years ago

There's a discord with tons of sample images, where we've been waiting patiently for the release, coming SOON, for 3 months now. https://discord.gg/pxewcvSvNx

jlsreleaf, about 2 years ago

Website design main page. Bright vibrant neon colors of the rainbow slimes, slime business, kid attention grabbing, splashes of bright neon colors. Professional looking Website page, high quality resolution 8k

bulbosaur123, about 2 years ago

What are the official and unofficial discords?

I found only this one on their subreddit: https://discord.gg/GvsvNrVkk5

youssefabdelm, about 2 years ago

Meh, results feel like a hodgepodge, as if a bunch of models were stitched together.

jlsreleaf, about 2 years ago

Website design for slime. Professional looking, high-quality, 8k, brightest neon colors of the rainbow slimes, splashes of neon colors in background, kid attention grabbing, eye catching

vitorgrs, about 2 years ago

Tried using it right now, and it's way better than Stable Diffusion (be it 1.5, 2.1 or SDXL).

But it is harder to get a good picture. This, fine-tuned with good RLHF, will be amazing.

etaioinshrdlu, about 2 years ago

Does paying Hugging Face to run it on the GPU count as commercial use?

TheBlapse, about 2 years ago

Currently down on Hugging Face.

TheBlapse, about 2 years ago

"Imagen free"
