AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

96 points · by tkgally · over 1 year ago

5 comments

mdrzn · over 1 year ago
We're clearly heading towards a "Generalist Multimodal Large Language Model" that autonomously selects the appropriate specialized LLM for any given task, rather than requiring us to switch between multiple LLMs. The combination of a mixture of experts approach and multimodality appears to be the way forward. Very excited for the future.
rustcleaner · over 1 year ago
Say I'm between sophomore and junior year: what's the best way to bootstrap from calc + linalg + stats to being able to competently configure and piece these components together into something like this? Any good lecturers or courses?
justanotherjoe · over 1 year ago
What's with the dummy GitHub page? Anyway, this feels like the right step forward. Just like OpenAI, I have near-religious faith in the transformer architecture. The question is how these modalities can work together better.
ilaksh · over 1 year ago
I feel like this type of capability or architecture might be the future of interactive agents. The quality of the voices leaves a little to be desired, but otherwise it seems very powerful.
cs702 · over 1 year ago
The key takeaway for me:

Whether data is continuous or discrete, no matter its modality (text, video, music, etc.), we now have an array of proven methods for representing it with *discrete* tokens, enabling us to use existing sequence modeling architectures (Transformers, linear RNNs).

We live in interesting times!
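A minimal sketch of the idea cs702 describes, using only NumPy: a continuous signal is framed and quantized against a learned codebook, yielding an integer token sequence with the same interface text tokens present to a Transformer. Plain k-means stands in here for a learned VQ-VAE-style encoder; this is an illustrative toy, not AnyGPT's actual tokenizer.

```python
# Toy "any modality -> discrete tokens" pipeline. K-means is a stand-in
# for a learned vector-quantization codebook (hypothetical, not AnyGPT's).
import numpy as np

rng = np.random.default_rng(0)

# "Audio": a continuous 1-D signal, chopped into fixed-size frames.
signal = np.sin(np.linspace(0, 50, 4096)) + 0.1 * rng.standard_normal(4096)
frames = signal.reshape(-1, 16)            # (256 frames, 16 samples each)

def kmeans(x, k, iters=20):
    """Fit k centroids; return (codebook, token id per frame)."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Squared distance from every frame to every centroid.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers, assign

codebook, tokens = kmeans(frames, k=64)

# `tokens` is now an ordinary integer sequence -- exactly what a
# Transformer or linear RNN consumes, regardless of source modality.
print(tokens[:20])

# Decoding is a codebook lookup (lossy reconstruction).
recon = codebook[tokens].reshape(-1)
print("reconstruction MSE:", ((recon - signal) ** 2).mean())
```

The same recipe (frame, quantize, model the token stream, look up to decode) applies to images, audio, or video once a suitable codebook is trained, which is what makes a single sequence-modeling stack serviceable across modalities.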