Saw this today https://news.ycombinator.com/item?id=42575951 and thought that there might be more such cautionary tales. Please share your LLM horror stories for all of us to learn from.
Guys, it's a major 21st century skill to learn how to use LLMs. In fact, it's probably the biggest skill anyone can develop today. So please be a responsible driver and learn how to use LLMs.

Here's one way to get the most mileage out of them:

1) Track the best and brightest LLMs via leaderboards (e.g. https://lmarena.ai/, https://livebench.ai/#/ ...). Don't use any s**t LLMs.

2) Make it a habit to feed in whole documents and ask questions about them, rather than asking the model to retrieve from memory.

3) Ask the same question to the top ~3 LLMs in parallel (e.g. top-of-the-line Gemini, OpenAI, and Claude models). A rough sketch of this fan-out step is below.

4) Compare the results and pick the best. Iterate on the prompt, question, and inputs as required.

5) Validate any key factual information via Google or another search engine before accepting it as fact.

I'm literally paying for all three top AIs. It's been working great for my computing and information needs. Even if one hallucinates, it's rare that all three hallucinate the same thing at the same time. The quality has been fantastic, and the intelligence multiplication is supreme.
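Not the commenter's actual setup, just a minimal sketch of step 3, assuming the official openai, anthropic, and google-generativeai Python packages, API keys in environment variables, and placeholder model names; swap in whatever the leaderboards currently rank on top.

    # Minimal sketch: ask the same question to three LLMs in parallel and print
    # the answers side by side for manual comparison. Model names are assumptions.
    import os
    from concurrent.futures import ThreadPoolExecutor

    from openai import OpenAI
    import anthropic
    import google.generativeai as genai

    PROMPT = "Summarize the attached document and list any factual claims I should verify."

    def ask_openai(prompt: str) -> str:
        client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder: use whatever currently tops the leaderboards
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def ask_anthropic(prompt: str) -> str:
        client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        msg = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def ask_gemini(prompt: str) -> str:
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name
        return model.generate_content(prompt).text

    if __name__ == "__main__":
        askers = {"openai": ask_openai, "anthropic": ask_anthropic, "gemini": ask_gemini}
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = {name: pool.submit(fn, PROMPT) for name, fn in askers.items()}
        # Compare the three answers by hand; agreement is only a weak sanity check,
        # so still verify key facts via search before relying on them.
        for name, fut in futures.items():
            print(f"--- {name} ---\n{fut.result()}\n")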
Perhaps the story that doesn't get told more often is how LLMs are changing how humans operate en masse.

When ChatGPT came out, I was increasingly outsourcing my thinking to LLMs. It took me a few months to figure out that it was actually harming me - I'd lost a little of my ability to think through things.

The same is true for coding assistants; sometimes I disable the in-editor coding suggestions when I find that my coding has atrophied.

I don't think this is necessarily a bad thing, as long as LLMs are ubiquitous, proliferate throughout society, and are extremely reliable and accessible. But they are not there today.
The linked post is more a story of someone not understanding what they're deploying. If they had found a random blog post about spot instances, they likely would have made the same mistake.

In this case, the LLM suggested a potentially reasonable approach and the author screwed themselves by not looking into what they were trading off for lower costs.
I'm surprised at how even some of the smartest people in my life take the output of LLMs at face value. LLMs are great for "plan a 5 year old's birthday party, dinosaur theme", "design a work-out routine to give me a big butt", or even rubber-ducking through a problem.

But for anything where the numbers, dates, and facts matter, why even bother?
Not me, but Craig Wright aka Faketoshi referenced court cases hallucinated by an LLM in his appeal.

https://cointelegraph.com/news/court-rejects-craig-wright-appeal-bitcoin-creator-case
ChatGPT claims our service has a feature which we don't have (for example, tracking people by their phone number). Users register a free account, then complain to us. The first email is often a vague "It doesn't work" with no details. Slightly worse are the users who go ahead and make a purchase, then complain, then demand a refund. We had to add a warning on the account registration page.
I'm currently shopping for a new car, and while I was asking questions at a dealer (not Tesla), they revealed that the sales guys use ChatGPT to look up information about the car because it's quicker than trying to find things in their own database.

I did not buy that car.
Not mine but a client of mine. Consultants sold them a tool that didn't exist because the LLM hallucinated and told their salesperson it did. Not sure that's really the LLM's fault, but pretty funny.
I've tried LLMs for a few exploratory programming projects.
It kinda feels magical the first time you import a dependency you don't know and the LLM outputs what you want to do before you've even had time to think about it.
However, I also think that for every minute I've gained with it, I've lost at least one to hallucinated solutions.

Even for fairly popular things (Terraform+AWS) I continuously got plausible-looking answers. After carefully reading the docs, the use case was not supported at all, so I just went with the 30-second (inefficient) solution I had thought of from the start. But I lost more than an hour.

Same story with the Ren'Py framework. The issue is that the docs are far from covering everything, and Google sucks, sometimes giving you a decade-old answer to a problem that has a fairly good answer in more recent versions. So it's really difficult to decide how to most efficiently look for an answer, between search and an LLM. Both can be a stupid waste of time.
I find it interesting how LLM errors can be so subtle. The next-token prediction method rewards superficial plausibility, so mistakes can be hard to catch.
Give me a real example of something in computer science they can't do. I'm interested, since ChatGPT is better than any professor I've had at any level of my educational career.
LLMs = ad-free version of Google.

That's why people adopted them. Google got worse and worse, and now the gap is filled with LLMs.

LLMs have replaced Google, and that's awesome. LLMs won't cook lunch or fold our laundry, and until a better technology comes around that can actually do that, all promises around "AI" should be seen as grifting.