Meissonic, High-Resolution Text-to-Image Synthesis on consumer graphics cards

65 points by jinqueeny 7 months ago

4 comments

fngjdflmdflg 7 months ago
> Meissonic, with just 1B parameters, offers comparable or superior 1024×1024 high-resolution, aesthetically pleasing images while being able to run on consumer-grade GPUs with only 8GB VRAM without the need for any additional model optimizations. Moreover, Meissonic effortlessly generates images with solid-color backgrounds, a feature that usually demands model fine-tuning or noise offset adjustments in diffusion models.

This looks really cool. Also nice to see another architecture being used for image generation besides diffusion. It seems like every NLP problem can be solved with transformers now: text generation/understanding, image generation/understanding, translation, OCR. Perhaps Llama 4/5 will have image generation as well. Edit: Llama 3.2 already has image editing, they probably just don't want to release an image generator for other reasons.
mysteria 7 months ago
Interesting how pretty much all the example images look like renders/paintings as opposed to photographs. Maybe that's what it's trained on?
littlestymaar 7 months ago
> It's crucial to highlight the resource efficiency of our training process. Our training is considerably more resource-efficient compared to Stable Diffusion (Podell et al., 2023). Meissonic is trained in approximately 48 H100 GPU days

From-scratch training of an image synthesis model for the price of a graphics card isn't something I expected anytime soon!
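(For a rough sense of scale, assuming an illustrative cloud rate of about $2-4 per H100 GPU-hour, which is an assumption and not a figure from the thread or the paper: 48 GPU-days ≈ 1,152 GPU-hours, or roughly $2,300-$4,600 of compute, i.e. in the same ballpark as a high-end consumer graphics card.)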
jensenbox 7 months ago
The images in the PDF are amazing.