Janus: Decoupling visual encoding for multimodal understanding and generation

36 points by jinqueeny, 7 months ago

3 comments

josh-sematic, 7 months ago
Interesting! It seems to me that there would be a tradeoff between specialist subsystems (which let you excel at the specialized tasks but can't handle things outside the specialization well) and generalized subsystems (which let you integrate information across multiple specializations but may not be great at any of them). Ultimately you likely need a mix of both, but it's not obvious to me how you would identify when it is beneficial to "hard code" separations between subsystems (as is done here for image generation and encoding) versus when the model should be left to "figure it out" during training and implicitly develop the appropriate subsystems within the network.
wiz21c, 7 months ago
The online demo returns "Error" :-( My prompt was a picture, and the question was "what is written on that screenshot?"
jadbox, 7 months ago
Does anyone know how Janus compares with the rhymes-ai Aria model?