Given that LLMs are adept at handling a variety of traditional NLP tasks like sentiment analysis, named entity recognition, and text classification, it’s interesting to consider whether multimodal (language and vision) LLMs could supplant traditional image classification methods as well. I worked on this project over the long weekend to explore that question and have documented the effort and my findings at the link provided. I'm not a scientist, I just like learning, so don't laugh too hard if this is bogus.