Has anyone tried running (multimodal) semantic search with ONNX, or with an alternative that runs on mobile/JS?
Do you think multimodal models will keep growing in size over time (making them even less likely to run on edge devices)?
For example, https://arxiv.org/abs/2112.05253 seems to go in the direction of bigger models.
Why does language seem to require large models, while image models are usually smaller?