Meissonic, High-Resolution Text-to-Image Synthesis on consumer graphics cards

65 points by jinqueeny 7 months ago

4 comments

fngjdflmdflg 7 months ago
> Meissonic, with just 1B parameters, offers comparable or superior 1024×1024 high-resolution, aesthetically pleasing images while being able to run on consumer-grade GPUs with only 8GB VRAM without the need for any additional model optimizations. Moreover, Meissonic effortlessly generates images with solid-color backgrounds, a feature that usually demands model fine-tuning or noise offset adjustments in diffusion models.

This looks really cool. Also nice to see another architecture being used for image generation besides diffusion. It seems like every NLP problem can be solved with transformers now: text generation/understanding, image generation/understanding, translation, OCR. Perhaps Llama 4/5 will have image generation as well. Edit: Llama 3.2 already has image editing, they probably just don't want to release an image generator for other reasons.
mysteria 7 months ago
Interesting how pretty much all the example images look like renders/paintings as opposed to photographs. Maybe that's what it's trained on?
littlestymaar 7 months ago
> It's crucial to highlight the resource efficiency of our training process. Our training is considerably more resource-efficient compared to Stable Diffusion (Podell et al., 2023). Meissonic is trained in approximately 48 H100 GPU days

From-scratch training of an image synthesis model for the price of a graphics card isn't something I expected anytime soon!
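(For a rough sense of scale, assuming an illustrative cloud rate of about $2-4 per H100 GPU-hour, which is an assumption and not a figure from the thread or the paper: 48 GPU-days ≈ 1,152 GPU-hours, or roughly $2,300-$4,600 of compute, i.e. in the same ballpark as a high-end consumer graphics card.)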
jensenbox 7 months ago
The images in the PDF are amazing.