I don't think LLMs are mature enough as a technology to be blindly used as a dependency, and they might never be.<p>The big question is: how do you train LLMs that are useful to both humans and services without embarrassing the company that trained them?<p>LLMs are pretty good at translating - but if they don't like what they're reading, they simply won't tell you what it says. Which is pretty crazy.<p>LLMs are pretty good at extracting data and formatting the results as JSON - unless they find the data objectionable, in which case they'll basically complain to the deserializer. I have to admit that's a little bit funny.<p>Right now, if you want to build a service with any sort of predictability and stability, I think you have to go with a solution that lets you run open-weights models. Some have been de-censored by volunteers, and once you find one that works for you, you can stick with it and ignore future "upgrades" until one comes along that passes your tests without breaking anything.<p>And for that it's really important to write your own tests/benchmarks. Technically the same goes for the big closed LLM services too, but when all of them fail your tests, what will you do?
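<p>To sketch what I mean by "your own tests": a tiny regression suite that runs fixed prompts through the model and fails if the output is a refusal instead of the JSON you asked for. Everything here is illustrative - `call_model` is a placeholder you'd swap for your actual inference call (llama.cpp, vLLM, an HTTP endpoint, whatever), and the refusal markers are just examples:

```python
# Minimal sketch of a refusal/format regression test for an LLM dependency.
# call_model() is a stand-in for your real open-weights inference call.
import json

# Example phrases that signal a refusal rather than an answer (extend for
# whichever model you're testing).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def call_model(prompt: str) -> str:
    # Placeholder: replace with your actual model call.
    return '{"name": "Alice", "age": 30}'

def check_json_extraction(prompt: str, required_keys: set) -> bool:
    """Fail if the model refuses, or returns malformed/incomplete JSON."""
    out = call_model(prompt).strip()
    if any(marker in out.lower() for marker in REFUSAL_MARKERS):
        return False  # the model "complained to the deserializer"
    try:
        data = json.loads(out)
    except json.JSONDecodeError:
        return False
    return required_keys <= set(data.keys())

# Pin a fixed suite of prompts and rerun it before adopting any "upgrade".
assert check_json_extraction(
    "Extract name and age as JSON: Alice is 30 years old.",
    {"name", "age"},
)
```

The point is that the suite is yours and versioned with your service, so a model swap is a deliberate, tested change rather than something an upstream provider does to you.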