Related: <a href="https://stability.ai/blog/deepfloyd-if-text-to-image-model" rel="nofollow">https://stability.ai/blog/deepfloyd-if-text-to-image-model</a><p>(via <a href="https://news.ycombinator.com/item?id=35743727" rel="nofollow">https://news.ycombinator.com/item?id=35743727</a>, but we've merged that thread into this earlier one)
GitHub: <a href="https://github.com/deep-floyd/IF">https://github.com/deep-floyd/IF</a><p>Colab Notebook for running the model based on the diffusers library: <a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb" rel="nofollow">https://colab.research.google.com/github/huggingface/noteboo...</a><p>Hugging Face Space for testing the model: <a href="https://huggingface.co/spaces/DeepFloyd/IF" rel="nofollow">https://huggingface.co/spaces/DeepFloyd/IF</a><p>Note that the model is substantially more compute-intensive than Stable Diffusion, so it may be slower even though that space is running on an A100.
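For anyone who just wants to see what the diffusers code looks like rather than opening the Colab, here's a minimal sketch of stage I (the model id, fp16 variant, and prompt text are my assumptions; the notebook above is the authoritative version, and you need to have accepted the license and logged in to Hugging Face first):

    # Minimal sketch of IF stage I via diffusers (assumed model id; needs
    # diffusers, transformers, accelerate, and a license-accepted HF login).
    import torch
    from diffusers import DiffusionPipeline

    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
    )
    stage_1.enable_model_cpu_offload()  # keeps VRAM usage manageable

    prompt = "a photo of a corgi wearing a red bow tie"
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

    # Stage I only produces a 64x64 image; stages II/III upscale it further.
    image = stage_1(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        output_type="pil",
    ).images[0]
    image.save("if_stage_1.png")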
Example of how much better it can do compared to Midjourney on a complex prompt: <a href="https://twitter.com/eb_french/status/1623823175170805760" rel="nofollow">https://twitter.com/eb_french/status/1623823175170805760</a><p>It is able to put people on the left/right and put the correct t-shirts and facial expressions on each one. Compare that to MJ, which just mixes every word you use into a soup and plops it out into the image. Huge MJ fan of course, it's amazing, but having compositional power is another step up.
Has anyone tried the Scott Alexander AI bet prompts?<p>1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth<p>2. An oil painting of a man in a factory looking at a cat wearing a top hat<p>3. A digital art picture of a child riding a llama with a bell on its tail through a desert<p>4. A 3D render of an astronaut in space holding a fox wearing lipstick<p>5. Pixel art of a farmer in a cathedral holding a red basketball
New restriction in their license suggests the inference filters can't be removed, even from modified versions:<p>"2. All persons obtaining a copy or substantial portion of the Software, a modified version of the Software (or substantial portion thereof), or a derivative work based upon this Software (or substantial portion thereof) must not delete, remove, disable, diminish, or circumvent any inference filters or inference filter mechanisms in the Software, or any portion of the Software that implements any such filters or filter mechanisms."
For anyone who doesn't know, DeepFloyd is a Stable Diffusion-style image model that more or less replaces CLIP with a full LLM (11B params) as the text encoder. The result is that it is much better at responding to more complex prompts.<p>In theory, it is also smarter at learning from its training data.
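Concretely, the prompt is run through the big frozen T5-style encoder and the per-token embeddings are what condition the UNet, rather than a single pooled CLIP vector. A hedged sketch of just that piece (assuming the repo follows the usual diffusers layout with tokenizer/text_encoder subfolders):

    # Hedged sketch: load only IF's text encoder (assumed to be T5-XXL, ~11B
    # params, so this needs a lot of RAM) and embed a prompt with it.
    import torch
    from transformers import T5Tokenizer, T5EncoderModel

    repo = "DeepFloyd/IF-I-XL-v1.0"  # assumed model id
    tokenizer = T5Tokenizer.from_pretrained(repo, subfolder="tokenizer")
    text_encoder = T5EncoderModel.from_pretrained(repo, subfolder="text_encoder")

    tokens = tokenizer("a raven with a key in its beak", return_tensors="pt")
    with torch.no_grad():
        # one embedding per token, not a single pooled vector as with CLIP
        embeddings = text_encoder(**tokens).last_hidden_state
    print(embeddings.shape)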
The full release will be soon!<p><a href="https://twitter.com/EMostaque/status/1651328161148174337" rel="nofollow">https://twitter.com/EMostaque/status/1651328161148174337</a>
It looks like the model on Hugging Face either hasn't been published yet or was withdrawn. I got this error in their Colab notebook:<p>OSError: DeepFloyd/IF-I-IF-v1.0 is not a local folder and is not a valid model identifier listed on '<a href="https://huggingface.co/models" rel="nofollow">https://huggingface.co/models</a>'. If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
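In case it helps once the weights do show up: if the repo turns out to be gated behind a license acceptance, an access token is needed either way. A sketch of what the error is asking for (the model id is just the one from the notebook and may change at release):

    # Log in with a Hugging Face access token, then retry the gated download.
    from huggingface_hub import login
    from diffusers import DiffusionPipeline

    login()  # prompts for a token from https://huggingface.co/settings/tokens
    pipe = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-IF-v1.0",  # id from the notebook; may change at release
        use_auth_token=True,
    )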
I think this model will result in a massive new wave of meme culture. AI's already seen success in memes up to this point, but the ability for readable text to be incorporated into images totally changes the game. Going to be an interesting next few months on the interwebz, that's for sure. Exciting times!
"Hi! I'm B-19-7, but to everyperson I'm called Floyd."
-Planetfall (1983)<p>My first thought on seeing "Floyd" and "IF" together. It looks like a Pink Floyd reference from the About page on <a href="https://deepfloyd.ai/" rel="nofollow">https://deepfloyd.ai/</a> though.
This could be super cool for logos. I've tried using Stable Diffusion to generate logos and it does a pretty good job of helping brainstorm, but the text is always gibberish. You can use its ideas, but you have to add your own text, which basically means creating a logo from scratch with its designs as inspiration.
> Text<p>> Hands<p>good god it solves the two biggest meme issues with image models in one go. Will this be the new state of the art every other model is compared to?
> Gorbachev holding meatball pasta in both hands. 1980s synth futuristic max headroom aesthetic. Neon lights.<p>> Aristotle in ancient greek clothes. Toga. New york, rain, film noir, fog, art deco, neon lights, blade runner sci fi<p>Seems to hold up reasonably well with the first prompt. The second was only OK.
Interesting that there are different models: <a href="https://github.com/deep-floyd/IF#-model-zoo-">https://github.com/deep-floyd/IF#-model-zoo-</a><p>I'm also very happy about the release of the two upscalers; I can use them to upscale the results of my small 64x64 DDIM models (maybe with some finetuning).
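If it works like the other diffusers pipelines, the stage II upscaler takes a 64x64 image plus prompt embeddings, so in principle you can point it at output from any 64x64 model. A hedged sketch (model id and kwargs are assumptions, and as noted above some finetuning may be needed for out-of-distribution inputs):

    # Hedged sketch: run the IF stage II upscaler on an arbitrary 64x64 image.
    import torch
    from diffusers import DiffusionPipeline
    from PIL import Image

    stage_2 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-II-L-v1.0", variant="fp16", torch_dtype=torch.float16
    )
    stage_2.enable_model_cpu_offload()

    low_res = Image.open("ddim_sample_64x64.png")  # e.g. from a small DDIM model
    prompt_embeds, negative_embeds = stage_2.encode_prompt("a landscape photo")

    upscaled = stage_2(
        image=low_res,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
    ).images[0]  # 256x256
    upscaled.save("upscaled_256.png")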
I would be more interested in image-to-text models. Does anyone know of a decent one? I saw the GPT-4 demo, and they showed image-to-text... but that turned out to be fake (i.e., the model was interpreting the image filename).
Looks like music generation is on their roadmap. Fun!<p><a href="https://stability.ai/careers?gh_jid=4142190101" rel="nofollow">https://stability.ai/careers?gh_jid=4142190101</a>
Any web-based front ends yet? I put together a system that runs a variety of web-based open-source AI image generation and editing tools on Vultr GPU instances. It spins up instances on demand, mounts an NFS filesystem with local caching and a COW layer, spawns the services, proxies the requests, and then spins down idle instances when I'm done. Would love to add this; I suppose I could whip something up if none exists.
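For anyone curious what "on demand + spin down when idle" looks like in practice, here's a hypothetical sketch of the control loop; create_gpu_instance(), destroy_instance(), and last_request_time() are placeholder stubs standing in for the real cloud-API and proxy plumbing, not actual Vultr SDK calls:

    # Hypothetical sketch of the lazy spin-up / idle spin-down loop. The helper
    # functions are placeholders, not real Vultr SDK calls.
    import time

    IDLE_TIMEOUT = 15 * 60   # assumed: tear down after 15 idle minutes
    _instance = None

    def create_gpu_instance():
        # placeholder: call cloud API, mount the NFS cache + COW layer, start services
        return {"id": "gpu-1"}

    def destroy_instance(instance):
        # placeholder: call cloud API to delete the instance
        pass

    def last_request_time():
        # placeholder: updated by the reverse proxy on every request it forwards
        return time.time()

    def ensure_instance():
        """Lazily start the backend the first time a request is proxied."""
        global _instance
        if _instance is None:
            _instance = create_gpu_instance()
        return _instance

    def reap_if_idle():
        """Tear the backend down once it has been idle past the timeout."""
        global _instance
        if _instance is not None and time.time() - last_request_time() > IDLE_TIMEOUT:
            destroy_instance(_instance)
            _instance = None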
Here are some play-money markets on Manifold Markets tracking its release: <a href="https://manifold.markets/markets?s=relevance&f=all&q=deepfloyd" rel="nofollow">https://manifold.markets/markets?s=relevance&f=all&q=deepflo...</a><p>35% on a full release by end of month, though that may not have adjusted yet.
<a href="https://www.kaggle.com/code/anivana/deepfloyd-if-playground/" rel="nofollow">https://www.kaggle.com/code/anivana/deepfloyd-if-playground/</a><p>I played with some ready-made prompts here.
Seeing a lot of text-to-image out there recently. Does anyone know what the current state of the art is on image-to-text? Thinking something similar to Midjourney's /describe command that they added in v5
There's a discord with tons of sample images, where we've been waiting patiently for the release, coming SOON, for 3 months now. <a href="https://discord.gg/pxewcvSvNx" rel="nofollow">https://discord.gg/pxewcvSvNx</a>
Website design main page. Bright vibrant neon colors of the rainbow slimes, slime business, kid attention grabbing, splashes of bright neon colors. Professional looking Website page, high quality resolution 8k
What are the official and unofficial discords?<p>I found only this one on their subreddit: <a href="https://discord.gg/GvsvNrVkk5" rel="nofollow">https://discord.gg/GvsvNrVkk5</a>
Website design for slime. Professional looking, high-quality, 8k, brightest neon colors of the rainbow slimes, splashes of neon colors in background, kid attention grabbing, eye catching
Tried it just now, and it's way better than Stable Diffusion (be it 1.5, 2.1, or SDXL).<p>But it's harder to get a good picture. This, fine-tuned with good RLHF, will be amazing.