I wanted to document a particular genAI antipattern which I've seen a few times now.

LLMs are theoretically pretty fungible, because you send English and get English back--but in practice you still need to do some amount of technical due diligence before swapping models. These things are benchmarked on tasks which rarely resemble your specific use case. Blindly swap models at your own risk!

Something that has become very clear since the advent of GPT-3.5 is that LLMs are far from magic, and using them does not remove the need for good engineering fundamentals. It's important to have a solid eval suite so you can quickly benchmark your system against different LLMs, because the APIs we're all building on are constant moving targets.
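To make the "eval suite" point concrete, here's a minimal sketch of what that can look like: the same hand-written cases run against two candidate models, with a crude pass-rate comparison before you commit to a swap. It assumes an OpenAI-compatible SDK and API key; the cases, checks, and model names are placeholders for whatever your system actually does.

    # Minimal sketch: run the same eval cases against candidate models and
    # compare pass rates before swapping. Assumes the OpenAI Python SDK and
    # an API key in the environment; cases and checks are stand-ins.
    from openai import OpenAI

    client = OpenAI()

    # Each case pairs a prompt with a simple programmatic check of the output.
    EVAL_CASES = [
        {"prompt": "Extract the invoice total from: 'Total due: $1,234.56'",
         "check": lambda out: "1,234.56" in out or "1234.56" in out},
        {"prompt": "Answer yes or no: is 17 a prime number?",
         "check": lambda out: "yes" in out.lower()},
    ]

    def run_eval(model: str) -> float:
        """Return the fraction of eval cases the model passes."""
        passed = 0
        for case in EVAL_CASES:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": case["prompt"]}],
                temperature=0,  # more deterministic output makes comparisons fairer
            )
            output = response.choices[0].message.content or ""
            if case["check"](output):
                passed += 1
        return passed / len(EVAL_CASES)

    if __name__ == "__main__":
        # Model names are placeholders -- substitute whatever you're comparing.
        for model in ["gpt-4o-mini", "gpt-4.1-mini"]:
            print(f"{model}: {run_eval(model):.0%} of cases passed")

Even something this crude catches regressions that generic benchmarks won't, because the cases come from your own use case rather than someone else's leaderboard.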