
AI Is Like a Crappy Consultant

95 points, by gpi, 1 day ago

26 comments

psadauskas, 1 day ago

I think of AI like a Junior Engineer.

If I understand the problem well enough, and have a really good description of what I want, like I'm explaining it to a junior engineer, then they do an OK job at it.

At my last job, we had a coding "challenge" as part of the interview process, and there was a really good readme that described the problem, the task, and the goal, which we gave the candidate at the start of the session. I copy/pasted that readme into Copilot, and it did as good a job as any candidate we'd ever interviewed, and it only took a few minutes.

But whenever there are any unknowns or vagaries in the task, or I'm exploring a new concept, I find the AIs to be more of a hindrance. They can sometimes get something working, but not very well, or the code they generate is misleading or takes me down a blind path.

The thing for me, though, is that I find writing a task for a junior engineer to understand to be harder than just doing the task myself. That's not the point of that exercise, though, since my goal is to onboard and teach the engineer how to do it, so they can accomplish it with less hand-holding in the future, and eventually become a productive member of the team. That temporary increase in my work is worth it for the future.

With the AI, though, it's not going to learn to be better; I'm not teaching it anything. Every time I want to leverage it, I have to go through the harder task of clearly defining the problem and the goal for it.
phillipcarter, 1 day ago

Some of the phenomena described in this post are felt a lot when using AI.

My own anecdote with a codebase I'm familiar with is that indeed, as the article mentions, it's a terrible architect. The problem I was solving ultimately called for a different data structure, but it never had that realization, instead trying to fit the problem shape into an existing, suboptimal way to represent the data.

When I mentioned that this part of the code was memory-sensitive, it indeed wrote good code! ...for the bad data structure. It even included some nice tests that I decided to keep, including memory benchmarks. But the code was ultimately really bad for the problem.

This is related to the sycophancy problem. AI coding assistants bias towards assuming the code they're working with is correct, and that the person using them is also correct. But often neither is ideal! And you can absolutely have a model second-guess your own code and assumptions, but it takes a lot of persistent work because these damn things just want to be "helpful" all the time.

I say all of this as a believer in this paradigm and one who uses these tools every day.
realbenpope, 1 day ago

In six months AI has gone from an idiot savant intern to a crappy consultant. I'd call that progress.
biophysboy, 1 day ago
I use LLMs regularly, but like a crappy consultant, their solutions are often not incisive enough. The answer I get is frequently 10x longer than I actually want. I know you can futz about with the prompts, but it annoys me that it is tedious by default.
dgb23, 1 day ago

There are tricks one can use to mitigate some of the pitfalls when using either a conversational LLM or a code assistant.

They emerge from the simple assumptions that:

- LLMs fundamentally pattern match bytes. It's stored bytes + user query = generated bytes.

- We have common biases and instinctively use heuristics. And we are aware of some of them. Like confirmation bias or anthropomorphism.

Some tricks:

1. Ask for alternate solutions or let them reword their answers. Make them generate lists of options.

2. When getting an answer that seems right, query for a counterexample or ask it to make the opposite case. This can sometimes help one remember that we're really just dealing with clever text generation. In other cases it can create tension (I need to research this more deeply or ask an actual expert). Sometimes it will solidify one of the two answers.

3. Write in a consistent and simple style when using code assistants. They are the most productive and reliable when used as super-auto-complete. They only see the bytes; they can't reason about what you're trying to achieve, and they certainly can't read your mind.

4. Let them summarize previous conversations or a code module from time to time. Correct them and add direction whenever they are "off", either with prompts or by adding comments. They simply needed more bytes to look at to produce the right ones at the end.

5. Try to get wrong solutions. Make them fail from time to time, or ask too much of them. This develops an intuition for when these tools work well and when they don't.

6. This is the most important and reflected in the article: Never ask them to make decisions, for the simple fact that they can't do it. They are fundamentally about _generating information_. Prompt them to provide information in the form of text and code so you can make the decisions. Always use them with this mindset.
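Tricks 1, 2, and 6 can be sketched as a small prompt-templating helper. Everything here is a hypothetical illustration (the function name and the wording of the templates are made up, and no particular LLM API is assumed); the point is only to show how a direct question gets expanded into information-generating prompts so the human keeps the decision:

```python
def decision_support_prompts(question: str) -> list[str]:
    """Expand a direct question into prompts that make a model generate
    options and counterarguments instead of a single confident answer,
    leaving the final decision to the human."""
    return [
        # Trick 1: ask for alternatives instead of one answer.
        f"List three distinct approaches to: {question}",
        # Trick 2: force the opposite case to surface weak spots.
        f"Make the strongest case AGAINST your preferred approach to: {question}",
        # Trick 6: request information, not a decision.
        f"Summarize the trade-offs of each approach to '{question}' "
        f"so I can decide myself.",
    ]

# Example: feed each of these to the model in turn, then decide yourself.
prompts = decision_support_prompts("caching layer for our API")
```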
pizzafeelsright, 1 day ago

As a once crappy consultant I would say no.

Instant answers, correct or not.

Cheaper per answer by magnitudes.

Solutions provided with extensive documentation.
bingemaker, 1 day ago

At the moment, I use Windsurf to explain to me how a feature is written and how to do 3rd-party integrations. I ask for the approach and I write the code myself. Letting AI write the code has become very unproductive over time.

I'm still learning though.
cafard, 1 day ago

Not that long ago, I noticed https://ploum.net/2024-12-23-julius-en.html on HN.
_fat_santa, 1 day ago

I think there's something poetic about the fact that you can go on some AI prompt subreddits and see folks there make posts about turning ChatGPT into a "super business consultant", and then come over here to read about how it's actually pretty bad at that.

But back on point, I found AI works best when given a full set of guardrails around what it should do. The other day I put it to work generating copy for my website. Typically it will go off the deep end if you try to make it generate entire paragraphs, but for small pieces of text (I'd say up to 3 sentences) it does surprisingly well, and because it's outputting such small amounts of text you can quickly make edits to remove places where it made a bad word choice or didn't describe something quite right.

But I would say I only got ChatGPT to do this after uploading 3-4 large documents that outline my product in excruciating detail.

As for coding tasks, again it works great when given max guardrails. I had several pages that had strings from an object, and I wanted those strings to be put back in the code and taken out of the object. This object has ~500 lines in it, so it would have taken all day, but I ended up doing it in about an hour by having AI do most of the work and just going in after the fact and verifying. This worked really well, but I would caution folks that this was a very, very specific use case. I've tried vibe coding once for shits and giggles and I got annoyed and stopped after about 10 minutes. IMHO, if you're a developer at the "Senior" level, dealing with AI output is more cumbersome than just writing the damn code yourself.
esafak, 1 day ago

I find that if you talk about architecture it can give excellent advice. It can also refactor in accordance with your existing architecture. If you do not bring up architecture, I suppose it could use a bad one, though I have not had that issue since I always mention the architecture when I ask it to implement a new feature, which is not "vibe coding". But then why should I vibe code?

Another conclusion is that we could benefit from benchmarks for architectural quality.
yannyu, 1 day ago

Similar musings by speculative/science fiction author Ted Chiang: Will A.I. Become the New McKinsey? – https://www.newyorker.com/science/annals-of-artificial-intelligence/will-ai-become-the-new-mckinsey
benoau, 1 day ago

AI is like a crappy consultant who doesn't care how many times you reject their code and will get it right if you feed them enough information.

The amount of time I save just by not having to write tests or jsdocs anymore is amazing. Refactoring is amazing.

And that's just the code - I also use AI for video production, 3d model production, generating art and more.
dfxm12, 1 day ago

> But… then I thought for a bit. And I realized, duh, that's probably just because I'm not good enough yet to recognize the dumb stuff it's doing.

It's important to have this self-awareness. Don't let AI trick you into thinking it can build anything good. When starting a project like in the article, your time is probably better spent taking a step back, learning the finer points of the new language (like, from a book or proper training course) and going from there. Otherwise, you're going to be spending even more time debugging code you don't understand.

It's the same thing with a crappy consultant. It seems great to have someone build something for you, but you need to make preparations for when something breaks after their contract is terminated.

Overall, it makes you think: what is the point? We can usually find useful crowd-sourced code snippets online, on Stack Exchange, etc. We have to treat them the same way, but it's basically free compared to AI, and keeping the crowd-sourced aspect alive makes sure there's always documentation for future devs.
lreeves, 1 day ago

Using Aider with o3 in architect mode, with Gemini or with Sonnet (in that order) is light years ahead of any of the IDE AI integrations. I highly recommend anyone who's interested in AI coding to use Aider with paid models. It is a night and day difference.
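For reference, an invocation along the lines this comment describes might look like the sketch below. The flags and model names are illustrative only - check `aider --help` and your provider's model list for what your installed version actually supports:

```shell
# Architect mode: one model plans the change, a second "editor" model
# applies the edits. Model names here are examples, not recommendations.
aider --architect --model o3 --editor-model sonnet src/main.py
```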
bob1029, 1 day ago

> I didn't even know how to think about the changes I needed, because I didn't understand enough

When you hire a team of consultants, it is typically the case that you are doing so because you have an incomplete view of the problem and are expecting them to fill in the gaps for you.

The problem arises due to the fact that the human consultants can be made to suffer certain penalties if they don't provide reasonable advice. A transformer model run in-house cannot experience this. You cannot sue yourself for fucking up your own codebase.
stephc_int13, 1 day ago

I treat an LLM-based coding assistant as a fast and obedient intern: I give it easy but tedious work like unit tests, documentation, or duplicating simple stuff, but I double-check everything.

I also use it as a cognitive assistant. I've always found that talking about a design with a colleague helped me think more clearly and better organize my ideas, even with very little insight from the other side. In this case the assistant is often a bit lacking on the skepticism side, but it does not matter too much.
abadar, 1 day ago

Crappy consultant? That's redundant ;)

Seriously, though, within the context of software development, these are all issues I've encountered as well, and I don't know how to program: sweeping solutions, inability to resolve errors, breaking down all components to base levels to isolate problems.

But, again, I don't know how to program. For me, any consultant is better than no consultant. And like the author, I've learned a ton about how to ask for what I want out of Cursor.
andy99, 1 day ago

A lot of that is down to the training; the "voice" is that of a junior Deloitte consultant writing a report. But this was intentional, as in it was trained to have this voice by virtue of the datasets used in the SFT and the RLHF goal function.

It would be interesting to see an LLM trained in a completely different way. There's got to be some tradeoff between how coherent the generations are and how interesting they are.
scottfalconer, 1 day ago
A good manager can make a less-than-ideal contributor highly effective with the right guidance and feedback. Applies to AI as well.
ryandvm, 1 day ago

The comparison isn't unfair, but the flip side is that a crappy consultant with the right confidence and good soft skills can make several hundred thousand dollars a year in this industry. I'd say AI is on a pretty good career track for itself.

Just this morning my CTO was crowing about how he was able to use Claude to modify the UI of one of our internal dev tools. He absolutely cannot wait to start replacing devs with AI.

Nobody wanted to hear it back when software development was easy street, but maybe we should have unionized after all...
kelsey978126, 1 day ago

Funny. AI is a reflection of the self. This tells me the author is themselves at the same skill level as a crappy consultant in their use of AI. The people getting the most out of AI are the ones who had the highest workload before automation arrived and now find themselves fantastically productive. People like this seem to have too much time on their hands. If you are "just now trying this vibe coding thing" in 2025, that tells me more about you than anything else.
rvz, 1 day ago

This is what it has gotten to.

More reports of 'vibe coding' causing chaos because one trusted what the LLM did and it 'checked' that the code was correct. [0] As always with vibe coding:

Zero tests whatsoever. It's no wonder you see LLMs not being able to understand their own code that they wrote! (LLMs cannot reason.)

Vibe coding is not software engineering.

[0] https://twitter.com/levelsio/status/1921974501257912563
benhurmarcel, 1 day ago

Honestly, I've already had to work with crappier consultants.

Also, there's a lot of value already in a crappy but fast and cheap consultant.
micromacrofoot, 1 day ago

I've seen consultants get paid six figures to provide some of the worst advice I've ever heard, so I guess this is progress.
taneq, 1 day ago

It's getting better, though, rapidly.
somewhereoutth, 1 day ago
AI is a [bad] tool. Do not anthropomorphize it, anymore than you would a [particularly ineffective and dangerous] hammer.