TechEcho
Janus: Decoupling visual encoding for multimodal understanding and generation

36 points by jinqueeny 7 months ago

3 comments

josh-sematic 7 months ago
Interesting! It seems to me that there would be a tradeoff between specialist subsystems (which let you excel at the specialized tasks, but which can't handle things outside the specialization well) and generalized subsystems (which let you integrate information across multiple specializations, but which may not be great at any of them). Ultimately you likely need a mix of both. It's not obvious to me, though, how you would identify when it will be beneficial to "hard code" separations for different subsystems (as is done here for image generation & encoding) vs. when the model should be left to "figure it out" during training and implicitly develop the appropriate subsystems within the network.
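To make the "hard coded" option concrete, here is a toy sketch of what a fixed separation looks like: two task-specific encoders feeding one shared backbone. This is not Janus's actual code; every name below (`understanding_encoder`, `generation_encoder`, `shared_backbone`, `decoupled_forward`) is hypothetical, and the "encoders" are trivial arithmetic stand-ins chosen only to show the routing.

```python
from typing import Callable, Dict, List

Vector = List[float]


def understanding_encoder(pixels: Vector) -> Vector:
    # Hypothetical stand-in for a semantic encoder: keeps a coarse summary (the mean).
    return [sum(pixels) / len(pixels)]


def generation_encoder(pixels: Vector) -> Vector:
    # Hypothetical stand-in for a detail-preserving encoder: keeps one value per pixel.
    return [round(p, 1) for p in pixels]


def shared_backbone(features: Vector) -> float:
    # Hypothetical shared model: consumes whichever features it is handed.
    return sum(features)


# The separation is fixed by the designer here, not learned during training.
ENCODERS: Dict[str, Callable[[Vector], Vector]] = {
    "understanding": understanding_encoder,
    "generation": generation_encoder,
}


def decoupled_forward(task: str, pixels: Vector) -> float:
    # Route the input to the encoder assigned to this task, then reuse one backbone.
    encoder = ENCODERS[task]  # raises KeyError for a task with no assigned encoder
    return shared_backbone(encoder(pixels))


if __name__ == "__main__":
    image = [1.0, 3.0]
    print(decoupled_forward("understanding", image))
    print(decoupled_forward("generation", image))
```

The alternative would be a single encoder for both tasks, leaving any specialization to emerge inside the network during training.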
wiz21c 7 months ago
The online demo returns "Error" :-( My prompt was a picture and the question was "what is written on that screenshot?"
jadbox 7 months ago
Does anyone know how Janus compares with the rhymes-ai Aria model?