The safety filters of GPT-4 Vision and Claude 3 Opus are critical for user-facing products, but for internal tools (e.g. an image moderation LLM) they hurt performance considerably. We ran into this problem internally and had to build our own vision-enabled Llama 3 8B, based on the LLaVA architecture. We are now releasing it as an API for other developers facing the same issue with overly censored multimodal LLM APIs.