Marvin has quite a few multi-modal examples:<p><a href="https://news.ycombinator.com/item?id=39036670">https://news.ycombinator.com/item?id=39036670</a><p>PydanticAI can do multi-modal, too:<p><a href="https://dev.to/stephenc222/how-to-use-pydanticai-for-structured-outputs-with-multimodal-llms-3j3a" rel="nofollow">https://dev.to/stephenc222/how-to-use-pydanticai-for-structu...</a>