科技回声

3 comments

Interesting! It seems to me that there is a tradeoff between specialist subsystems (which let you excel at the specialized tasks, but which can't handle things outside the specialization well) and generalized subsystems (which let you integrate information across multiple specializations, but which may not be great at any of them). Ultimately you likely need a mix of both, but it's not obvious to me how you would identify when it is beneficial to "hard code" separations between subsystems (as is done here for image generation and encoding) versus when the model should be left to "figure it out" during training and implicitly develop the appropriate subsystems within the network.

wiz21c, 7 months ago

The online demo returns "Error" :-( My prompt was a picture, and the question was "what is written on that screenshot?"

jadbox, 7 months ago

Does anyone know how Janus compares with the rhymes-ai Aria model?

Janus: Decoupling visual encoding for multimodal understanding and generation
