From: <a href="https://twitter.com/EMostaque/status/1760660709308846135" rel="nofollow">https://twitter.com/EMostaque/status/1760660709308846135</a><p>Some notes:<p>- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.<p>- This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs.<p>- Will be released open, the preview is to improve its quality & safety just like og stable diffusion<p>- It will launch with full ecosystem of tools<p>- It's a new base taking advantage of latest hardware & comes in all sizes<p>- Enables video, 3D & more..<p>- Need moar GPUs..<p>- More technical details soon<p>>Can we create videos similar to Sora<p>Given enough GPUs and good data yes.<p>>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it?<p>It's in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.<p>(adding some later replies)<p>>awesome. I assume these aren't heavily cherry picked seeds?<p>No this is all one generation. With DPO, refinement, further improvement should get better.<p>>Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?<p>yeah see
@Scenario_gg's great work with IP adapters for example. Our team builds ComfyUI so you can expect some really great stuff around this...<p>>DALL-E often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.<p>Imagine the new version will. DALL-E and MJ are also pipelines, you can pretty much do anything accurately with pipelines now.<p>>Nice. Is it an open-source / open-parameters / open-data model?<p>Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.<p>>Cool!!! What do you mean by good data? Can it directly output videos?<p>If we trained it on video yes, it is very much like the arch of Sora.