Stable Diffusion 3: Research Paper

503 points by ed about 1 year ago

9 comments

edshiro about 1 year ago
This is really exciting to see. I applaud Stability AI's commitment to open source and hope they can operate for as long as possible.

There was one thing I was curious about... I skimmed through the executive summary of the paper but couldn't find it. Does Stable Diffusion 3 still use CLIP from Open AI for tokenization and text embeddings? I would naively assume that they would try to improve on this part of the model's architecture to improve adherence to text and image prompts.
whywhywhywhy about 1 year ago
It's impressive that it spells words correctly and lays them out, but the issue I have is that the text always has this distinctively overly fried look to it. The color of the text is always ramped up to a single value, which, when placed into a high-fidelity image, gives the impression of just slapping some text on top with Photoshop afterwards in quite an amateurish fashion, rather than text properly integrated into an image.
finnjohnsen2 about 1 year ago
Question is, will SD3 be downloadable? I downloaded and ran the early SD locally and it is really great.

Or did we lose Stable Diffusion to SaaS also? Like we did with many of the LLMs, which started off so promising as far as self-hosting goes.
TheAceOfHearts about 1 year ago
It's very exciting to see that image generators are finally figuring out spelling. When DALL-E 3 (?) came out they hyped up spelling capabilities but when I tried it with Bing it was incredibly inconsistent.

I'd love to read a less technical writeup explaining the challenges faced and why it took so long to figure out spelling. Scrolling through the paper is a bit overwhelming and it goes beyond my current understanding of the topic.

Does anyone know if it would be possible to eventually take older generated images with garbled up text + their prompt and have SD3 clean it up or fix the text issues?
edwcross about 1 year ago
Nice improvements in text rendering, but it seems generating hands and fingers is still difficult for SD3. None of the pictures in the example contain human hands, except for the pixelized wizard; and the monkey hands seem a bit odd.
vessenes about 1 year ago
This looks great, very exciting. The paper is not a lot more detailed than the blog. The main thing about the paper is that they have an architecture that can include more expressive text encoders (t5-xxl here), they show this helps with complex scenes, and it seems clear they haven't maxed out this stack in terms of training. So, expect sd3.1 to be better than this, and expect 4 to be able to work with video through adding even more front end encoding. Exciting!
liuliu about 1 year ago
This arch seems to be flexible enough to extend to video easily. Hopefully what we have here will be another "foundation" block, like the transformer blocks in LLaMA.

Why:

It looks generic enough to incorporate text encoding / timestep conditioning into the block in all the imaginable ways (rather than in limited ways, as in SDXL / SD v1 or Stable Cascade). I don't think there is much left to be done there other than to play with positional encoding (2D RoPE?).

Great job! Now let's just scale up the transformers and focus on quantization / optimizations to run this stack properly everywhere :)
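For readers unfamiliar with the "2D RoPE" idea floated above, here is a minimal sketch of what a 2D rotary positional encoding could look like. It assumes an axial variant (the first half of the features is rotated by the row index, the second half by the column index) and the usual base of 10000. The function names `rope_1d` and `rope_2d` are made up for illustration, and none of this is taken from the SD3 paper.

```python
import math

def rope_1d(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by angle pos * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (assumed variant): first half encodes the row, second half the column."""
    half = len(vec) // 2
    assert half * 2 == len(vec) and half % 2 == 0, "dimension must be a multiple of 4"
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary example query and key vectors for two image patches.
q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9]
k = [1.0, 0.3, -0.6, 0.8, -1.2, 0.4, 0.7, -0.2]

# In both cases the query sits 2 rows below and 1 column left of the key,
# but at different absolute positions in the grid.
score_a = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 6))
score_b = dot(rope_2d(q, 10, 2), rope_2d(k, 8, 3))
```

The two scores agree up to floating-point error, which is the point of the rotary scheme: the attention score depends only on the relative row and column offset between patches, not on where they sit in the image.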
WiSaGaN about 1 year ago
More and more companies that were once devoted to being 'open', or were previously open, are now becoming increasingly closed. I appreciate that Stability AI releases these research papers.
nojvek about 1 year ago
Heh! In contrast to Stability AI, Open AI is the least open AI lab. Even Deep Mind publishes more papers.

I wonder if anyone in Open AI openly says it: "We're in it for the money!"

The recent letter by SamA regarding Elon's trial had as much truth as Putin saying they are invading Ukraine for de-nazification.