> Meissonic, with just 1B parameters, offers comparable or superior 1024×1024 high-resolution, aesthetically pleasing images while being able to run on consumer-grade GPUs with only 8GB VRAM without the need for any additional model optimizations. Moreover, Meissonic effortlessly generates images with solid-color backgrounds, a feature that usually demands model fine-tuning or noise offset adjustments in diffusion models.

This looks really cool. It's also nice to see an architecture other than diffusion being used for image generation. It seems like transformers can handle just about everything now: text generation/understanding, image generation/understanding, translation, OCR. Perhaps Llama 4/5 will have image generation as well. Edit: Llama 3.2 already has image editing; they probably just don't want to release an image generator for other reasons.
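For anyone curious about the non-diffusion architecture: Meissonic generates images by iteratively filling in masked discrete image tokens with a transformer (MaskGIT/MUSE-style masked image modeling) rather than denoising continuous latents. A minimal sketch of that decoding loop is below; `transformer`, `vq_decoder`, and all hyperparameters are illustrative placeholders, not Meissonic's actual components or API, and text conditioning is omitted.

```python
# Sketch of MaskGIT-style iterative masked-token decoding (the non-diffusion
# generation scheme Meissonic builds on). Hypothetical stand-ins:
#   transformer(tokens) -> logits of shape (1, num_tokens, vocab_size)
#   vq_decoder(tokens)  -> decoded image
import math
import torch

def generate(transformer, vq_decoder, num_tokens=1024, mask_id=8192,
             steps=16, device="cpu"):
    # Start with every image token masked.
    tokens = torch.full((1, num_tokens), mask_id, dtype=torch.long, device=device)
    for step in range(steps):
        logits = transformer(tokens)
        probs = logits.softmax(dim=-1)
        confidence, sampled = probs.max(dim=-1)  # greedy sampling for simplicity
        still_masked = tokens == mask_id
        # Already-committed positions should never be re-masked.
        confidence = torch.where(still_masked, confidence,
                                 torch.full_like(confidence, float("inf")))
        # Commit predictions everywhere that was masked...
        tokens = torch.where(still_masked, sampled, tokens)
        # ...then re-mask the least confident ones, following a cosine schedule
        # that keeps fewer tokens masked at every step.
        num_to_remask = int(num_tokens * math.cos(math.pi / 2 * (step + 1) / steps))
        if num_to_remask > 0:
            lowest = confidence[0].topk(num_to_remask, largest=False).indices
            tokens[0, lowest] = mask_id
    return vq_decoder(tokens)  # map discrete tokens back to pixels
```

Unlike a diffusion sampler, the whole token grid is predicted in parallel at each of the ~16 steps, which is part of why this family of models can be fast at inference.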
Interesting how pretty much all the example images look like renders/paintings rather than photographs. Maybe that reflects what it was trained on?
> It’s crucial to highlight the resource efficiency of our training process. Our training is considerably more resource-efficient compared to Stable Diffusion (Podell et al., 2023). Meissonic is trained in approximately 48 H100 GPU days

Training an image synthesis model from scratch for roughly the price of a graphics card isn't something I expected to see anytime soon!
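Back-of-envelope check (assuming on-demand H100 rental somewhere around $2–5/hour, which is my assumption, not a figure from the paper): 48 GPU days × 24 h ≈ 1,150 H100-hours, i.e. roughly $2,300–$5,800 in rented compute, which really is in the same ballpark as a single high-end GPU.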