Wondering what they are doing better than Stable Diffusion to generate images. Short prompts seem to behave better on Midjourney; I assume that's because they have some kind of prompt expander in the backend?
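For what it's worth, a "prompt expander" could be as simple as padding a terse prompt with aesthetic modifiers before it reaches the model. This is a purely hypothetical sketch of the idea; the function and the modifier list are made up, not anything Midjourney has confirmed:

```python
# Hypothetical sketch of a backend "prompt expander": take a short
# user prompt and append generic style/quality descriptors before
# it is passed to the diffusion model. All names here are invented.

def expand_prompt(user_prompt: str) -> str:
    """Append generic aesthetic modifiers to a terse prompt."""
    modifiers = [
        "highly detailed",
        "dramatic lighting",
        "cohesive color palette",
        "sharp focus",
    ]
    return f"{user_prompt}, " + ", ".join(modifiers)

print(expand_prompt("a cat on a roof"))
```

A fancier version would use an LLM to rewrite the prompt instead of a fixed list, which would explain why short prompts still land on a consistent "house style".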
No one knows what happens in the proprietary pipeline, but the model is probably just bigger and likely not conditioned on T5 embeddings (since the model really can't render text).