Great job. I recently had a similar idea while reading the sci-fi series <i>The Expanse</i>. I read it in English, and since it's not my first language, I'm often confused by descriptions of spaceships, planets, structures, etc., and have trouble visualising them.<p>My idea was to use RAG to extract all relevant descriptions of a given object and compile them into one detailed description, which would then be fed to a text-to-image model.<p>Have you considered something similar? It would be harder to implement, but the results could be more precise, and it could cover books GPT-4 is not familiar with.
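A rough sketch of the retrieval-and-compile step described above, in case it helps make the idea concrete. It uses plain keyword matching as a stand-in for a real embedding-based retriever, and it stops at building the prompt rather than calling any text-to-image model. The function names, the prompt wording, and the sample text are all made up for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def retrieve_descriptions(text, entity, max_passages=5):
    """Return up to max_passages sentences mentioning the entity, in book order.

    Keyword matching only; a real RAG setup would rank passages by
    embedding similarity instead.
    """
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    hits = [s for s in split_sentences(text) if pattern.search(s)]
    return hits[:max_passages]


def build_image_prompt(entity, passages):
    """Compile the retrieved passages into one prompt string."""
    joined = " ".join(passages)
    return f"Detailed illustration of {entity}. Source descriptions: {joined}"


# Invented sample text, not a quotation from any book.
sample = (
    "The Heron was a squat grey freighter. "
    "Mira checked the airlock twice. "
    "Along its hull, the Heron carried six cargo pods."
)

passages = retrieve_descriptions(sample, "Heron")
prompt = build_image_prompt("Heron", passages)
print(prompt)
```

In a fuller version, the joined passages would probably go through an LLM first to merge and deduplicate them before the result is handed to the image model.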
Well done. What are your thoughts on a tool that generalizes the idea of chaining multiple models? In this case, you linked GPT-4 -> Midjourney. A generalized example would be Model-A -> Model-B -> Model-C, etc.
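To illustrate what a generalized Model-A -> Model-B -> Model-C chain might look like, here is a minimal sketch that treats each model as a plain function and composes them left to right. The three stage functions are hypothetical placeholders that only wrap strings; no real model API is called.

```python
from functools import reduce


def chain(*stages):
    """Compose stages left to right: the output of one is the input of the next."""
    def run(value):
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


# Hypothetical stand-ins for real models.
def model_a(text):
    return f"description({text})"


def model_b(description):
    return f"image({description})"


def model_c(image):
    return f"caption({image})"


pipeline = chain(model_a, model_b, model_c)
result = pipeline("book title")
print(result)
```

Keeping every stage to a single input and a single output is what makes the stages interchangeable; a real tool would also need to handle differing data types between stages and failures partway through the chain.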