Stable Diffusion 3

983 points by reqo, about 1 year ago

52 comments

JonathanFly, about 1 year ago

From: https://twitter.com/EMostaque/status/1760660709308846135

Some notes:

- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.
- This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs..
- Will be released open, the preview is to improve its quality & safety just like og stable diffusion
- It will launch with full ecosystem of tools
- It's a new base taking advantage of latest hardware & comes in all sizes
- Enables video, 3D & more..
- Need moar GPUs..
- More technical details soon

> Can we create videos similar like sora

Given enough GPUs and good data yes.

> How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?

Its in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.

(adding some later replies)

> awesome. I assume these aren't heavily cherry picked seeds?

No this is all one generation. With DPO, refinement, further improvement should get better.

> Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?

yeah see @Scenario_gg's great work with IP adapters for example. Our team builds ComfyUI so you can expect some really great stuff around this...

> Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.

Imagine the new version will. DALLE and MJ are also pipelines, you can pretty much do anything accurately with pipelines now.

> Nice. Is it an open-source / open-parameters / open-data model?

Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.

> Cool!!! What do you mean by good data? Can it directly output videos?

If we trained it on video yes, it is very much like the arch of sora.

subzel0, about 1 year ago

“Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”

https://pbs.twimg.com/media/GG8mm5va4AA_5PJ?format=jpg&name=large

keiferski, about 1 year ago

The obsession with safety in this announcement feels like a missed marketing opportunity, considering the recent Gemini debacle. Isn’t SD’s primary use case the fact that you can install it on your own computer and make what you want to make?

wtcactus, about 1 year ago

I notice they are avoiding images of people in the announcement.

I wonder if they are afraid of the same debacle as Google AI, and whether what they mean by "safety" is actually heavy bias against white people and their culture, like what happened with Gemini.

hizanberg, about 1 year ago

IMO the "safety" in Stable Diffusion is becoming more overzealous: most of my images are coming back blurred, and I no longer want to waste my time writing a prompt only for it to return mostly blurred images. Prompts that worked in previous versions, like portraits, are coming back mostly blurred in SDXL.

If this next version is just as bad, I'm going to stop using Stability APIs. Are there any other text-to-image services that offer similar value and quality to Stable Diffusion without the overzealous blurring?

Edit: Example prompts like "Matte portrait of Yennefer" return 8/9 blurred images [1]

[1] https://imgur.com/a/nIx8GBR

robertwt7, about 1 year ago

It’ll be interesting to see what “safety” means in this case given the censorship in diffuser models nowadays. Look what’s happening with Gemini, it’s quite scary really how different companies have different censorship values.

I’ve had my fair share of frustration with DALL-E as well when trying to generate weapon images for game assets. I had to tweak my prompt a lot.

alexb_, about 1 year ago

> We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.

What exactly does this mean? Will we be able to see all of the "safeguards" and access all of the technology's power without someone else's restrictions on them?

miohtama, about 1 year ago

No model. Half of the announcement text is “we are really, really responsible and safe, believe us.”

Kind of a dud for an announcement.

PcChip, about 1 year ago

The text/spelling part is a huge step forward

londons_explore, about 1 year ago

All the demo images are 'artwork'.

Will the model also be able to produce good photographs, technical drawings, and other graphical media?

haolez, about 1 year ago

Rewriting the "safety" part, but replacing the AI tool with an imaginary knife called Big Knife:

"We believe in safe, responsible knife practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Big Knife by bad actors."

bsaul, about 1 year ago

Does anyone know which AI could be used to generate UI design elements (such as "generate a real estate app widget list"), as well as the kind of prompts one would use to obtain good results?

I'm only now investigating using AI to increase velocity in my projects, and the field is moving so fast that I'm a bit outdated.

willsmith72, about 1 year ago

at this point perfect text would be a gamechanger if it can be solved

midjourney 6 can be completely photorealistic and include valid text, but also sometimes adds bad text. it's not much, but having to use an image editor for that is still annoying. for creating marketing material, getting perfect text every time and never getting bad text would be amazing

amelius, about 1 year ago

Does anyone know of a good tutorial on how diffusion models work?

SubiculumCode, about 1 year ago

It is interesting to me that these diffusion image models are so much smaller than the LLMs.

btbuildem, about 1 year ago

That's nice, but could we please have an unsafe alternative? I would like to footgun both my legs off, thank you.

101008, about 1 year ago

What's the best way to use SD (3 or 2) online? I can't run it on my PC and I want to do some experiments to generate assets for a POC videogame I'm working on. I pay for Midjourney and I wouldn't mind paying something like 5 or 10 dollars per month to experiment with SD, but I can't find anything.

lreeves, about 1 year ago

People in this discussion seem to be hand-wringing about Stability's "safety" comments, but every model they've released has been fine-tuned for porn in like 24 hours.

redder23, about 1 year ago

Horrible website, hijacks scrolling. I have my scrolling speed up with Chromium Wheel Smooth Scroller. This website's scrolling is extremely slow, so the extension is not working because they are "doing it wrong" TM and somehow hijack native scrolling and do something with it.

treesciencebot, about 1 year ago

Quite nice to see diffusion transformers [0] becoming the next dominant architecture in generative media.

[0]: https://twitter.com/EMostaque/status/1760660709308846135

kbumsik, about 1 year ago

So there is no license information yet?

ssalka, about 1 year ago

I wonder if this will actually be adopted by the community, unlike SD 2.0. Many are still developing around SD 1.5 due to its uncensored nature. SDXL has done better than 2.0, but has greater hardware requirements so still can't be used by everyone running 1.5.

the_duke, about 1 year ago

So, they just announced StableCascade.

Wouldn't this v3 supersede the StableCascade work?

Did they announce it because a team had been working on it and they wanted to push it out to not just lose it as an internal project, or are there architectural differences that make both worthwhile?

ummonk, about 1 year ago

It's going to have a restrictive license like Stable Cascade no doubt.

pama, about 1 year ago

I wish they put out the report already. Has anyone else published a preprint combining ideas similar to diffusion transformers and flow matching?

pqdbr, about 1 year ago

The sample images are absolutely stunning.

Also, I was blown away by the "Stable Diffusion" written on the side of the bus.

FloatArtifact, about 1 year ago

I'm curious to know whether their safeguards are eliminated when users fine-tune the model.

iterateAutomate, about 1 year ago

What is with these names haha, Stable Diffusion XL 1.0 and now to Stable Diffusion 3??

glimshe, about 1 year ago

This reinforces my impression that Google is at least one year behind. Stunning images, 3D, video while Gemini had to be partially halted this morning.

gat1, about 1 year ago

I guess we do not know anything about the training dataset?

londons_explore, about 1 year ago

I really wonder what harm would come to the company if they didn't talk about safety?

Would investors stop giving them money? Would users sue that they now had PTSD after looking at all the 'unsafe' outputs? Would regulators step in and make laws banning this 'unsafe' AI?

What is it specifically that company management is worried about?

caycep, about 1 year ago

are all the model/back ends to Stability products basically available OSS via Ludwig Maximilian University, more or less?

spywaregorilla, about 1 year ago

Impressive text in the images.

aussieguy1234, about 1 year ago

How does it go with fingers?

panzi, about 1 year ago

503 Service Unavailable welp

coldcode, about 1 year ago

No details in the announcement, is it still pixel size in = pixel size out?

declan_roberts, about 1 year ago

Can it generate an image of people without injecting insufferable diversity quotas into each image? If so then it’s the most advanced model on the internet right now!

satisfice, about 1 year ago

Can it make a picture of a woman chasing a bear?

The old one can't.

animex, about 1 year ago

Ugh, another startup(?) requiring Discord to use their product. :(

sjm, about 1 year ago

The example images look so bad. Absolutely zero artistic value.

inference-lord, about 1 year ago

Cool but it's hard to keep getting "blown away" at this stage. The "incredible" is routine now.

poulpy123, about 1 year ago

Didn't they release another model a few days ago?

4bpp, about 1 year ago

I guess we should count our blessings and be grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was. Going forward, it's hard to imagine how any new technology that is unconditionally intellectually empowering to the individual will be tolerated; after all, just think of the harms someone thus empowered could be enabled to perpetrate.

Perhaps eventually, once every forum has been assigned a trust-and-safety team and every word processor has been aligned and most normal people have no need for communication outside the Metaverse (TM) in their daily lives, we will also come around to reviewing the necessity of teaching kids to write, considering the epidemic of hateful graffiti and children being caught with handwritten sexualised depictions of their classmates.

ametrau, about 1 year ago

“Safety” = safe to our reputation. It’s insulting how they imply safety from “harm”.

patates, about 1 year ago

Half of the announcement talks about safety. The next step will be these control mechanisms being built into all sorts of software, I suppose.

It's "safe" for them, not for the users; at least they should make that clear.

13of40, about 1 year ago

"we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors"

It's kind of a testament to our times that the person who chooses to look at synthetic porn instead of supporting a real-life human trafficking industry is the bad actor.

123yawaworht456, about 1 year ago

> This preview phase, as with previous models, is crucial for gathering insights to improve its performance and safety ahead of an open release.

oh, for fuck's sake.

AuryGlenz, about 1 year ago

It's really unfortunate that Silicon Valley ended up in an area that's so far left - and to be clear, it'd be just as bad if it was in a far right area too. Purple would have been nice, to keep people in check. 'Safety' seems to be actively making AI advances worse.

deepsdev, about 1 year ago

Can we use it to create Sora-like videos?

k__, about 1 year ago

So, they block all bad actors, but themselves?

cuckatoo, about 1 year ago

NSFW fine tune when? Or will "safety" win this time?

GenericPoster, about 1 year ago

The talk of "safety" and harm in every image or language model release is getting quite boring and repetitive. The reasons why it's there are obvious and there are known workarounds, yet the majority of conversations seem to be dominated by it. There's very little discussion regarding the actual technology and I'm aware of the irony of mentioning this. Really wish I could filter out these sorts of posts.

Hopefully it dies down soon but I doubt it. At least we don't have to hear garbage about "WHy doEs opEn ai hAve oPEn iN thE namE iF ThEY aReN'T oPEN SoURCe"
