Somewhat tangential, but I hadn't heard about the Emu model, which was apparently released (the paper [1] at least) in September. I was curious about the details and read the Emu paper and ... I feel like I'm taking crazy pills reading it.<p>> To the best of our knowledge, this is the first work highlighting fine-tuning for generically promoting aesthetic alignment for a wide range of visual domains.<p>... unlike Stable Diffusion, which did aesthetic fine-tuning when it was released? Or the thousands of aesthetic fine-tunes released since?<p>> We show that the original 4-channel autoencoder design [27] is unable to reconstruct fine details. Increasing channel size leads to much better reconstructions.<p>Isn't it expected that decreasing the compression ratio would lead to better reconstructions? The whole point of the latent diffusion architecture is to make a trade-off here. They're more than welcome to do pixel diffusion, or use an upscaling architecture, if they want better quality.<p>And then the rest of the paper is a long write-up that can be summed up as "we used industry-standard filtering and then human filtering to build an aesthetic dataset, which we fine-tuned a model on". Which, again, has been done a thousand times already.<p>I really, really don't mean to knock the researchers' work here. I'm just very confused as to why the work is being presented as new or groundbreaking. Contrast that with OAI, which documents using a diffusion-based latent decoder. That's interesting, different, and worth publishing. Scaling up your latent space to get better results is just ... obvious? (As obvious as anything in ML is, anyway.) Facebook's research isn't usually this off the mark. E.g.
the Emu Edit paper is very interesting and contributes many new methods to the field.<p>[1] <a href="https://scontent-lax3-1.xx.fbcdn.net/v/t39.2365-6/10000000_1099397624548149_16002132581482810_n.pdf?_nc_cat=110&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=mxfter4gnLgAX_0FnFD&_nc_ht=scontent-lax3-1.xx&oh=00_AfByfMkAByxJGLImPcGtMiBQtMsLU0e1ksDLvyqJW7yaPA&oe=655B1F8F" rel="nofollow noreferrer">https://scontent-lax3-1.xx.fbcdn.net/v/t39.2365-6/10000000_1...</a>
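<p>To put rough numbers on the compression-ratio point about latent channels: a back-of-the-envelope sketch, assuming an RGB input and the 8x-per-side spatial downsampling used by the Stable Diffusion autoencoder. `compression_ratio` is a hypothetical helper, and the 8- and 16-channel rows are illustrative, not figures taken from the Emu paper.

```python
def compression_ratio(height, width, latent_channels, downsample=8, image_channels=3):
    """Number of pixel values divided by number of latent values.

    Illustrative assumptions: RGB input (image_channels=3) and a spatial
    downsampling factor of 8 per side, as in the Stable Diffusion autoencoder.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    pixel_values = image_channels * height * width
    latent_values = latent_channels * (height // downsample) * (width // downsample)
    return pixel_values / latent_values


# Compare the 4-channel design with wider (hypothetical) latents at 512x512.
for channels in (4, 8, 16):
    ratio = compression_ratio(512, 512, channels)
    print(f"{channels:>2} latent channels -> {ratio:.0f}x fewer values than pixels")
```

<p>With these assumptions the ratio works out to 3 * 8 * 8 / C = 192 / C, independent of resolution: 48x for 4 channels and 12x for 16. Quadrupling the channels gives the decoder four times as many numbers to reconstruct from, so better fine detail is the expected outcome of that trade-off rather than a surprise.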