Recently, the most popular modality for text-to-image annotations has been preference data, where annotators choose between two images to indicate their favorite. While this works for fine-tuning models, it carries no information about what might be wrong with an image, e.g. which part of the image is misaligned relative to the prompt.
Google Research proposed a modality for more information-rich annotations that captures exactly this (https://arxiv.org/abs/2312.10240).
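To make the difference concrete, here is a rough sketch in Python of what the two kinds of records might look like. The field names and value ranges are illustrative assumptions, not the actual schema of the dataset or the paper:

    # Illustrative only: hypothetical record structures, not the dataset's actual schema.

    # A plain preference annotation: one bit of signal per comparison.
    preference_record = {
        "prompt": "a red cube on top of a blue sphere",
        "image_a": "img_001.png",
        "image_b": "img_002.png",
        "preferred": "image_a",
    }

    # A richer annotation in the spirit of the paper above: the annotator also
    # marks *where* the image is wrong and *which* prompt words are not depicted.
    rich_record = {
        "prompt": "a red cube on top of a blue sphere",
        "image": "img_001.png",
        "misaligned_words": ["blue", "sphere"],  # prompt tokens not reflected in the image
        "problem_regions": [  # normalized boxes marking misaligned image regions
            {"x": 0.42, "y": 0.61, "w": 0.20, "h": 0.25},
        ],
        "scores": {"alignment": 2, "artifacts": 4, "overall": 3},  # e.g. 1-5 scales
    }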
Based on this, we produced this dataset of ~13k images, collecting ~1.5 million annotations in total from 150k annotators using our annotation API. If you are interested, you can learn more about the API at https://docs.rapidata.ai/

Let me know if you have any questions about the dataset or Rapidata in general!
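If you want to poke at the data, here is a minimal loading sketch using the Hugging Face datasets library. The repo id and split name are placeholders I made up, so check the actual dataset page for the real ones:

    # Minimal sketch, assuming the dataset is hosted on the Hugging Face Hub.
    from datasets import load_dataset

    ds = load_dataset("Rapidata/rich-human-feedback")  # hypothetical repo id
    print(ds["train"][0])  # inspect one annotated record, if a "train" split exists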