TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

DeepFloyd IF: open-source text-to-image model

259 points by ea016 about 2 years ago

37 comments

dang about 2 years ago
Related: https://stability.ai/blog/deepfloyd-if-text-to-image-model
(via https://news.ycombinator.com/item?id=35743727, but we've merged that thread into this earlier one)
minimaxir about 2 years ago
GitHub: https://github.com/deep-floyd/IF
Colab notebook for running the model based on the diffusers library: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb
Hugging Face Space for testing the model: https://huggingface.co/spaces/DeepFloyd/IF
Note that the model is substantially more compute-intensive than Stable Diffusion, so it may be slower even though that Space is running on an A100.
epivosism about 2 years ago
Example of how much better it can do compared to Midjourney on a complex prompt: https://twitter.com/eb_french/status/1623823175170805760
It is able to put people on the left/right and put the correct t-shirts and facial expressions on each one. By comparison, MJ just mixes together a soup of every word you use and plops it out into the image. Huge MJ fan of course, it's amazing, but having compositional power is another step up.
lalaithion about 2 years ago
Has anyone tried the Scott Alexander AI bet prompts?
1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth
2. An oil painting of a man in a factory looking at a cat wearing a top hat
3. A digital art picture of a child riding a llama with a bell on its tail through a desert
4. A 3D render of an astronaut in space holding a fox wearing lipstick
5. Pixel art of a farmer in a cathedral holding a red basketball
hunkins about 2 years ago
A new restriction in their license suggests the software can't be modified:
"2. All persons obtaining a copy or substantial portion of the Software, a modified version of the Software (or substantial portion thereof), or a derivative work based upon this Software (or substantial portion thereof) must not delete, remove, disable, diminish, or circumvent any inference filters or inference filter mechanisms in the Software, or any portion of the Software that implements any such filters or filter mechanisms."
orra about 2 years ago
Neither the source code nor the weights are open source... This is actually worse than Stability AI's previous offering, in that regard.
Taek about 2 years ago
For anyone who doesn't know, DeepFloyd is a Stable Diffusion-style image model that more or less replaced CLIP with a full LLM (11B params). The result is that it is much better at responding to complex prompts.
In theory, it is also smarter at learning from its training data.
connerruhl about 2 years ago
The full release will be soon!
https://twitter.com/EMostaque/status/1651328161148174337
simonw about 2 years ago
It looks like the model on Hugging Face either hasn't been published yet or was withdrawn. I got this error in their Colab notebook:
OSError: DeepFloyd/IF-I-IF-v1.0 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
srajabi about 2 years ago
Wow, this does so well on text! The original model struggled a lot; it's impressive to see how far they've come.
mkaic about 2 years ago
I think this model will result in a massive new wave of meme culture. AI's already seen success in memes up to this point, but the ability to incorporate readable text into images totally changes the game. Going to be an interesting next few months on the interwebz, that's for sure. Exciting times!
Thoreandan about 2 years ago
"Hi! I'm B-19-7, but to everyperson I'm called Floyd." -Planetfall (1983)
That was my first thought on seeing "Floyd" and "IF" together. Judging from the About page on https://deepfloyd.ai/, though, it looks like a Pink Floyd reference.
itslennysfault about 2 years ago
This could be super cool for logos. I've tried using Stable Diffusion to generate logos, and it does pretty well at helping brainstorm, but the text is always gibberish. You can use its ideas, but you have to add your own text, which basically means creating a logo from scratch using its designs as inspiration.
kingcharles about 2 years ago
The examples on the README are extremely compelling; the state of the art has been raised yet again.
alex_sf about 2 years ago
The current license makes this largely unusable for nearly any purpose. Really disappointing release from SAI.
atleastoptimal about 2 years ago
> Text
> Hands
Good god, it solves the two biggest meme issues with image models in one go. Will this be the new state of the art every other model is compared to?
bicepjai about 2 years ago
I understand. I have two decade-old NVIDIA 1080 Ti cards; can we run inference and train IF on them?
zimpenfish about 2 years ago
A 16GB VRAM minimum is a bit steep. It sadly excludes my 3080, which is annoying because I'd like something better than Stable Diffusion locally.
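As a rough back-of-envelope check on VRAM requirements like the one above, the memory needed just to hold the weights is the parameter count times the bytes per parameter. This is a minimal sketch, assuming Taek's 11B-parameter figure for the text encoder taken at face value; it ignores activations, the diffusion models themselves, framework overhead, and any CPU offloading or 8-bit loading the real tooling may do, so it is not an estimate of IF's actual footprint.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Ignores activations, framework overhead, and CPU offloading.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,  # bf16 is also 2 bytes per parameter
    "int8": 1,
}

GIB = 1024 ** 3


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the raw weights in GiB at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


if __name__ == "__main__":
    # Assumption: 11B parameters, the figure quoted in the thread for the text encoder.
    params = 11e9
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

The same arithmetic shows why precision matters so much for whether a model fits on a given card: halving the bytes per parameter halves the weight memory.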
marginalia_nu about 2 years ago
> Gorbachev holding meatball pasta in both hands. 1980s synth futuristic max headroom aesthetic. Neon lights.
> Aristotle in ancient greek clothes. Toga. New york, rain, film noir, fog, art deco, neon lights, blade runner sci fi
It seems to hold up reasonably well with the first prompt. The second was only OK.
55555 about 2 years ago
So this one can create perfect text in images? If true, that’s insane
GaggiX about 2 years ago
Interesting that there are different models: https://github.com/deep-floyd/IF#-model-zoo-
I'm also very happy about the release of the two upscalers. I can use them to upscale the results of my small 64x64 DDIM models (maybe with some fine-tuning).
danwee about 2 years ago
I would be more interested in image-to-text models. Does anyone know of a decent model? I saw the GPT-4 demo, and they showed image-to-text... but that was actually a fake (i.e., the model was interpreting the image filename).
causality0 about 2 years ago
Is this intended to replace Stable Diffusion? Somebody want to give the ELI5?
dr_kiszonka about 2 years ago
Looks like music generation is on their roadmap. Fun!
https://stability.ai/careers?gh_jid=4142190101
jacob019 about 2 years ago
Any web-based front ends yet? I put together a system that runs a variety of web-based open source AI image generation and editing tools on Vultr GPU instances. It spins up instances on demand, mounts an NFS filesystem with local caching and a COW layer, spawns the services, proxies the requests, and then spins down idle instances when I'm done. Would love to add this; I suppose I could whip something up if none exists.
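The "spin down idle instances" step jacob019 describes comes down to a simple policy decision. A minimal sketch, assuming the proxy records each instance's last request time and in-flight request count; the `Instance` type and `idle_instances` function are hypothetical and do not call any Vultr API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """Hypothetical record the proxy keeps for each running GPU instance."""
    instance_id: str
    last_request_at: float  # Unix timestamp of the most recent proxied request
    active_requests: int = 0  # requests currently in flight


def idle_instances(instances: List[Instance], now: float, idle_timeout_s: float) -> List[str]:
    """Return IDs of instances with no in-flight work that have been idle for at
    least idle_timeout_s seconds, i.e. candidates for shutdown."""
    return [
        inst.instance_id
        for inst in instances
        if inst.active_requests == 0 and now - inst.last_request_at >= idle_timeout_s
    ]


if __name__ == "__main__":
    fleet = [
        Instance("gpu-a", last_request_at=1_000.0),
        Instance("gpu-b", last_request_at=1_500.0),
        Instance("gpu-c", last_request_at=900.0, active_requests=2),
    ]
    print(idle_instances(fleet, now=1_700.0, idle_timeout_s=600.0))
```

Checking the in-flight count as well as the timestamp avoids tearing down an instance in the middle of a long generation.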
epivosism about 2 years ago
Here are some play-money markets on Manifold Markets tracking its release: https://manifold.markets/markets?s=relevance&f=all&q=deepfloyd
35% to full release by end of month, although it may not have adjusted.
anirbanc88 about 2 years ago
https://www.kaggle.com/code/anivana/deepfloyd-if-playground/
I played with some ready-made prompts here.
marvinkennis about 2 years ago
Seeing a lot of text-to-image out there recently. Does anyone know what the current state of the art is on image-to-text? Thinking of something similar to Midjourney's /describe command that they added in v5.
epivosism about 2 years ago
There's a Discord with tons of sample images, where we've been waiting patiently for the release, coming SOON, for 3 months now. https://discord.gg/pxewcvSvNx
jlsreleaf about 2 years ago
Website design main page. Bright vibrant neon colors of the rainbow slimes, slime business, kid attention grabbing, splashes of bright neon colors. Professional looking Website page, high quality resolution 8k
bulbosaur123 about 2 years ago
What are the official and unofficial Discords?
I found only this one on their subreddit: https://discord.gg/GvsvNrVkk5
youssefabdelm about 2 years ago
Meh, the results feel hodgepodge, like a bunch of models were stitched together.
jlsreleaf about 2 years ago
Website design for slime. Professional looking, high-quality, 8k, brightest neon colors of the rainbow slimes, splashes of neon colors in background, kid attention grabbing, eye catching
vitorgrs about 2 years ago
Tried using it just now, and it's way better than Stable Diffusion (be it 1.5, 2.1, or SDXL).
But it's harder to get a good picture. Fine-tuned with good RLHF, this will be amazing.
etaioinshrdlu about 2 years ago
Does paying Hugging Face to run it on the GPU count as commercial use?
TheBlapse about 2 years ago
Currently down on hugging face
TheBlapse about 2 years ago
"Imagen free"