I’ve been playing with diffusion a ton for the past few months, writing a new sampler that implements an iterative blending technique described in a recent paper. The latent space is rich in semantic information, so it can be a great place to apply various transformations rather than operating on the image directly. Yet it still has a significant spatial component, so things you do in one spatial area will affect that same area of the image.

Stable Diffusion 1.5 may be quite old now, but it is an incredibly rich model that still yields shockingly good results. SDXL is newer and more capable, but it’s not a revolutionary improvement, and it can be less malleable than the older model, making it harder to steer toward a given result.
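To make the spatial-locality point concrete, here’s a minimal sketch of blending two latents in only one region, assuming the Hugging Face diffusers library and SD 1.5 (the sampler from the paper is more involved; this is just the basic idea):

    # Minimal sketch, assuming diffusers and SD 1.5; not the paper's sampler.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # SD 1.5 latents are (1, 4, 64, 64); each latent cell maps to an
    # 8x8 pixel patch of the 512x512 output, so edits stay spatially local.
    g = torch.Generator("cuda").manual_seed(0)
    shape = (1, 4, 64, 64)
    latents_a = torch.randn(shape, generator=g, device="cuda", dtype=torch.float16)
    latents_b = torch.randn(shape, generator=g, device="cuda", dtype=torch.float16)

    # Blend only the left half; the right half of the image keeps latents_a.
    blended = latents_a.clone()
    blended[..., :32] = 0.5 * latents_a[..., :32] + 0.5 * latents_b[..., :32]

    image = pipe("a watercolor landscape", latents=blended).images[0]
    image.save("blended.png")

(A spherical interpolation would preserve the Gaussian noise statistics better than the plain lerp here; the lerp just keeps the sketch short.)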
Just a terminology comment here. "Latent space" means a lot of different things in different models. For a GAN, for example, it means the "top concept" space, where moving around changes the entire concept of the image, and navigating it in a controlled way is notoriously difficult. For SD/SDXL it refers to the bottommost layer just above pixel space: the diffusion model works on a compressed representation, and a VAE decoder expands it from 64x64 latents to a 512x512 image in the case of SD 1.5.

This allows the rest of the network to be smaller while still generating a usable output resolution, so it's a performance "hack".

It's a really good idea to explore it and hack into it like in the article, to "remaster" the image, so to speak!
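You can see the expansion directly with the SD 1.5 VAE; a minimal sketch, assuming the diffusers AutoencoderKL:

    # Minimal sketch, assuming diffusers; decodes a 64x64 latent to 512x512.
    import torch
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="vae"
    )

    latents = torch.randn(1, 4, 64, 64)  # the 4-channel 64x64 latent grid
    with torch.no_grad():
        # scaling_factor (0.18215 for SD 1.5) undoes the latent normalization
        image = vae.decode(latents / vae.config.scaling_factor).sample

    print(image.shape)  # torch.Size([1, 3, 512, 512]) -- 8x spatial expansion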
Anyone know if the work shown here has been implemented in Automatic1111 or ComfyUI as an extension? If not, then that might be my first project to add, since these techniques are (relatively speaking) quite simple to implement in code.
That's very cool. I had no idea the latent space was that accessible and so obviously manipulable.

Also interesting is how the way SDXL structures latents affects how it thinks about images.
I don't think it's as simple as this naive approach suggests, but it's a good preliminary analysis, and a good lesson: being absolutely correct might be quite difficult, but diving in and having a go can get you further than you'd think.
That was an excellently written article.

I for sure thought a discussion about latent spaces would instantly be over my head. It was, but it took a few paragraphs.