Muse: Text-to-Image Generation via Masked Generative Transformers

79 points by jasondavies over 2 years ago

8 comments

andybak over 2 years ago
Sigh. At this point - if I can't try it out, then I don't really care. It's just a tease.
CuriouslyC over 2 years ago
Fidelity on the output isn't great, but the coherence (assuming the examples weren't massively cherry-picked) seems very good. Given the number of parameters this should be able to run on end-user machines, and in theory this could be fine-tuned to produce better looking output than Stable Diffusion/etc.

What this model does more than anything else is demonstrate we're still in the early stages of generative models, and we can expect a lot of progress from architectural improvements over the next decade (in addition to the progress in compute and data that we're already counting on).
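
As a rough illustration of the "should be able to run on end-user machines" point: the memory needed just to hold a model's weights is approximately the parameter count times the bytes per parameter. The thread does not state Muse's parameter count, so the 3-billion figure below is purely a hypothetical input, and the function name is made up for this sketch. Activations, the text encoder, and the image tokenizer are ignored.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory (GiB) needed to store model weights only."""
    return num_params * bytes_per_param / 2**30


# Hypothetical example: 3 billion parameters (an assumed figure, not from
# the thread) stored in 16-bit floats (2 bytes each).
example_gib = weight_memory_gib(3_000_000_000, 2)
print(f"{example_gib:.2f} GiB")
```

Whether that fits on a given consumer GPU depends on the real parameter count and on everything this estimate leaves out.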
mikemoka over 2 years ago
Here is an available implementation:

https://github.com/lucidrains/muse-maskgit-pytorch
Garlef over 2 years ago
It'd be interesting to see some results where the training set has higher artistic quality (and how this model influences the "house style"). The output does not look great when compared to what other (trained) models deliver.

But the promise of a big efficiency gain will be an incentive for companies like Midjourney to give it a go with their data.
seydor over 2 years ago
More amazement. I wonder where this field will end up. Cute animal and nature images are nice but have limited real-life use (I mean, we have to accept that visual media ends after everyone can be an artist). I wonder when we'll start interfacing language models with robotics to do some real-life work.
deepsquirrelnet over 2 years ago
> Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations;

Am I wrong or is that the same architecture as DALL-E 1?
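
One way to picture the "fewer sampling iterations" part of the quote: DALL-E 1 also worked on discrete image tokens, but it generated them autoregressively, one token per step. Masked generative transformers in the MaskGIT family instead start from a fully masked grid and commit many tokens per step. The toy sketch below shows only that decoding loop. `toy_predict` is a hypothetical random stand-in for a trained model, and the cosine schedule and the token, vocabulary, and step counts are illustrative assumptions, not Muse's actual configuration.

```python
import math
import random

MASK = -1  # sentinel id for a position that has not been decoded yet


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained transformer.

    Returns a (token_id, confidence) guess for every position.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(num_tokens, vocab_size, steps, seed=0):
    """Fill a fully masked token sequence in a fixed number of steps."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        # Cosine schedule: how many positions remain masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        guesses = toy_predict(tokens, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident guesses; the rest stay masked for later.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        n_commit = max(0, len(masked) - keep_masked)
        for i in masked[:n_commit]:
            tokens[i] = guesses[i][0]
    return tokens


# 256 tokens decoded in 8 passes; a token-by-token loop would need 256 passes.
decoded = parallel_decode(num_tokens=256, vocab_size=8192, steps=8)
print(sum(t != MASK for t in decoded), "tokens decoded")
```

The number of model passes is fixed by `steps` rather than by the sequence length, which is where the efficiency claim in the quote would come from.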
pr337h4m over 2 years ago
Would stuff like DreamBooth and textual inversion be usable with transformer models like this one?

https://dreambooth.github.io/ https://textual-inversion.github.io/
kleiba over 2 years ago
Please stop teasing and post the link to your free trial web interface. Please?