I haven't read much about how these systems work yet, so this is probably a novice question, but I'd be interested to hear more about how the algorithm handles text input and feeds it into the generator. Does the training process include a ton of tagged images or something, and then the model learns to be able to generate stuff that corresponds reasonably to those tags?
How is that different from DALL-E?
<a href="https://openai.com/blog/dall-e/" rel="nofollow">https://openai.com/blog/dall-e/</a>