This looks so cool, and from reading the Hugging Face model card it should be easy enough to run. I do almost all of my work with text (NLP, IR, etc.), and I have wanted to try multi-modal models. I just bookmarked the model card page.<p>I am also getting more and more excited by the explosion of work on open models. I still haven’t adjusted to how good Mistral-7B is, and it runs on my Mac without breaking a sweat.