Tech Echo

We're clearly heading towards a "Generalist Multimodal Large Language Model" that autonomously selects the appropriate specialized LLM for any given task, rather than requiring us to switch between multiple LLMs. The combination of a mixture-of-experts approach and multimodality appears to be the way forward. Very excited for the future.

Say I'm between my sophomore and junior years: what's the best way to bootstrap from calc, linalg, and stats to being able to competently configure these components and piece them together into something like this? Any good lecturers or courses?

What's with the dummy GitHub page? Anyway, this feels like the right step forward. Like OpenAI, I have near-religious faith in the Transformer architecture. The question is how these modalities can work together better.

I feel like this type of capability or architecture might be the future of interactive agents. The quality of the voices leaves a little to be desired, but otherwise it seems very powerful.

The key takeaway for me:

Whether data is continuous or discrete, no matter its modality (text, video, music, etc.), we now have an array of proven methods for representing it with *discrete* tokens, enabling us to use existing sequence modeling architectures (Transformers, linear RNNs).

We live in interesting times!
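
To make the "discrete tokens" point concrete, here's a toy sketch of one common way to do it: vector quantization, i.e. mapping each continuous feature vector to the index of its nearest codebook entry (VQ-VAE style). Everything here is an assumption for illustration: the 4-entry 2-D codebook, the made-up "frames", and the hypothetical TEXT_VOCAB_SIZE offset used to place modality tokens in a shared vocabulary. Real tokenizers learn their codebooks and use much higher-dimensional features, and this isn't claimed to be how AnyGPT's own tokenizers work.

```python
# Hypothetical toy codebook; real codebooks are learned and far larger.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical text-vocabulary size: modality token ids are shifted past it
# so text and non-text tokens can share one embedding table.
TEXT_VOCAB_SIZE = 32000


def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry
    (squared Euclidean distance)."""
    tokens = []
    for v in vectors:
        best_idx = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])),
        )
        tokens.append(best_idx)
    return tokens


def dequantize(tokens, codebook):
    """Approximately reconstruct vectors by looking each token id up in the codebook."""
    return [codebook[t] for t in tokens]


# Made-up continuous "frames" (e.g. pretend audio features).
frames = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.95), (0.05, 0.7)]

tokens = quantize(frames, CODEBOOK)               # discrete ids in [0, 4)
shared = [t + TEXT_VOCAB_SIZE for t in tokens]    # ids in the shared vocabulary

print(tokens)  # [0, 1, 3, 2]
print(shared)  # [32000, 32001, 32003, 32002]
```

Once everything is a sequence of integer ids like `shared`, an ordinary next-token sequence model can consume it alongside text; the lossy part lives entirely in the quantize/dequantize step.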

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

5 comments