I thought OpenAI had been sleeping a bit and had given up on the image generation race. However, with the recent release of the oddly named 4o Image Generation model (why not continue with the DALL-E naming scheme? I personally love the pun), it looks like they cooked up one hell of a model.<p>If you are into visual GenAI, you have probably already seen many examples of quite incredible outputs from the new model. However, we wanted to run a large-scale evaluation, based on 200k human responses across 13k image pairings.<p>Unfortunately, that also meant we had to generate a large number of new images, and since OpenAI has not yet opened up API access, we had to do it manually through the UI :(.<p>The benchmark tests the model on coherence, prompt alignment, and overall aesthetic preference. On the first two especially, OpenAI's new model is very far ahead of the competition.<p>Check out the detailed results and the collected data, which is openly available on Hugging Face!<p>Let me know if you have questions or feedback!