I like to explore hidden capabilities of large language models (LLMs), and lately I've been exploring how good text generation models are at media generation.

In this case, the experiment was: can we hack together a text-to-image/animation app from plain text generation models (not image generation/diffusion models)?

It turns out the Claude API is pretty useful for generating images and animations. My theory is that Claude is really good with XML (the underlying format for SVG), which is what enables this particular use case. I haven't tried many other LLMs yet, but I didn't have much luck with Gemini so far.

The code is open source, and there are demos in my repo: https://github.com/notnotrishi/vignette/

Feel free to try it for yourself or leave any notes. Thanks!
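For anyone curious what the basic idea looks like in code, here's a rough sketch using the Anthropic Python SDK: prompt Claude for raw SVG markup and save whatever comes back. This is just an illustration of the approach, not the repo's actual code, and the model name and prompt are placeholders.

    # Minimal sketch: ask Claude for an SVG and write it to a file.
    # Assumes ANTHROPIC_API_KEY is set in the environment.
    import re
    import anthropic

    client = anthropic.Anthropic()

    prompt = (
        "Draw a minimalist sunset over the ocean as a single SVG. "
        "Reply with only the <svg>...</svg> markup."
    )

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )

    text = message.content[0].text

    # Extract the SVG markup in case the model adds prose around it.
    match = re.search(r"<svg.*?</svg>", text, re.DOTALL)
    if match:
        with open("sunset.svg", "w") as f:
            f.write(match.group(0))

For animations, the same trick works by asking for SMIL or CSS animations inside the SVG, since it's all still just XML text as far as the model is concerned.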