NExT-GPT is the first end-to-end MM-LLM that perceives input and generates output in arbitrary combinations (any-to-any) of text, image, video, and audio and beyond.
I gave it a photo of my face, and it said that I look approachable and friendly. I asked it to improve my hair, and it replaced me with a completely different person. Oh well! Still very impressive tech.