> And BOOM! 100%(!) accuracy against our test suite with just 2 prompt tries. ... OK, so I'm super happy with the accuracy and almost ready to ship it. ... Wawaweewah! ... letting me actually deploy this in production ...

This feels like extreme overconfidence in the LLM, sort of like how I felt the first time I used one.

How many times did they run the test suite? How thorough is the test suite? How much does accuracy matter here, anyway? (It seems like it does matter, or they wouldn't advertise 100% accuracy and point out edge cases.)

In my experience, LLMs will hallucinate not only the correctness and consistency of their answers but also the format of their response, whether it's JSON or "Yes/No". If LLMs didn't hallucinate JSON, there'd be no need for posts like "Show HN: LLMs can generate valid JSON 100% of the time" [1].

If this really gave 100% correctness on every test case, every time, I'd have to throw out everything I know about LLMs, which says they're totally unfit for this sort of purpose: not only on accuracy, but on speed, cost, external API dependency, and the other issues mentioned in other comments.

Suggesting that problems involving edge cases and text manipulation are good candidates for LLMs seems dangerous. Now your code is nondeterministic, even with temperature set to 0 (see the sketch below).

[1]: https://news.ycombinator.com/item?id=37125118
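
To make the nondeterminism point concrete, here's a minimal sketch of the kind of check I'd want before trusting a 100% figure. It assumes the OpenAI Python client; the model name, prompt, and classify() helper are all illustrative, not taken from the article:

    # Minimal sketch, not the article's code: call the same prompt N times
    # at temperature 0 and see whether the outputs actually agree.
    # Assumes the OpenAI Python client; model and prompt are illustrative.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def classify(text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical model choice
            temperature=0,
            messages=[
                {"role": "system", "content": "Answer strictly Yes or No."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content

    # Distinct outputs across 10 identical calls. temperature=0 narrows
    # the sampling, but identical outputs are not guaranteed, so this set
    # can end up with more than one element.
    outputs = {classify("Does this sentence contain an emoji?") for _ in range(10)}
    print(outputs)

If that set ever holds more than one element, your "deterministic" pipeline isn't.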