Extremely curious that PaLM-E, PaLI, and GPT-4 were trained to be multimodal (accepting non-text inputs such as images), yet the released APIs are text-only. In GCP's case here, they've released PaLM-2, which is not multimodal like PaLM-E and PaLI, so it can't be used for visual reasoning [0].

I'm just wondering why multiple parties seem reluctant to let the public use this capability.

0: https://visualqa.org