When AI promises speed but delivers debugging hell

202 points by nsavage 4 months ago

34 comments

namaria 4 months ago

Coding is trying to order bytes into doing arbitrary stuff that is useful because of some transient conjunction of factors in the real world.

We have developed programming languages because coding in machine language is horrible, and over the decades we've refined them into tools people can use fluently, thinking directly in code when they have to make a computer system behave in a certain way.

Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.

fcatalan 4 months ago

This has been my experience with a recent attempt to guide the LLM to a complete implementation of a small internal tool. I had in an hour what would have taken me 4 or 5 to write. But after that, it was an endless loop of the LLM adding logging code to find some bug and failing to fix it, only to add more logging code and ineffectual changes, and so on. The problem is that even after it's lost at sea, it's still answering in a completely confident and self-assured tone, so by the time you decide to take matters into your own hands you might be too far gone from sanity, with an unfixable mess on your hands. I guess I can go back to where it strayed and retake it from there, but by now the experiment seems to be a failure.

alwinaugustin 4 months ago

The current state of so-called AI does not provide much meaningful assistance in software development beyond basic tasks such as explaining workflows, breaking down thought processes, and performing simple conversions. I believe that generative AI, in its current form, is not true artificial intelligence. Rather, it is a sophisticated prediction engine that lacks genuine reasoning or understanding.

True AI should be capable of comprehending problems and devising its own solutions, rather than merely generating statistically likely outputs. Until AI reaches that level of cognitive ability, its applications in the real world remain limited, and much of what we see today is largely hype.

Tokenization and embeddings merely help models predict the most probable next token, a process that is executed at scale using vast computational resources. This is not intelligence but large-scale probabilistic prediction. The terminology used in computer science, especially in recent years, can often be misleading.

CharlieDigital 4 months ago

I have a non-technical friend who in the last two months has bootstrapped a SaaS startup using nothing but AI. He's got just over a handful of paying customers at this point on a monthly subscription[0].

I asked him to show me his process[1] after trying my hand (20 years, principal) and noticed a big difference in how we used AI: I instruct the AI how to code, he asks the AI to fix problems. In other words, I have a tendency to look at the code and ask the AI to fix it in the more specific and direct ways that I want it fixed. On the other hand, if something doesn't work, my friend will copy/paste the error to the AI directly out of the dev tools console and ask the AI to fix the error. The two approaches are totally different.

My lesson here is that you're not meant to debug AI generated code; hand the error off to the AI and let it fix itself. I think if you're debugging AI generated code, you're doing AI generated code wrong. If you're an experienced dev picking up AI coding, I think you need to shift your mindset entirely. Ideally, someone out there will just create a closed loop where the AI can fix itself when it finds an error (integrate some browser and autonomous test loop into Cursor, for example, and let it fix its own errors).

Conclusion: if you're going to use AI to code, commit to it and use AI to fix the errors as well. Use AI for every aspect of it.

[0] Yes, I'm sure there are security holes and code issues galore, but those can always be fixed later when he's proven the business model.

[1] Yes, I have told him that he should create a YT channel or stream on Twitch, because the content itself (how well he's been able to use AI) is super interesting.
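
For readers wondering what such a closed loop might look like, here is a rough Python sketch. It is not how Cursor or any real tool works: `run_checks` and `propose_fix` are hypothetical stand-ins for a test runner and a model call, and the attempt cap is an assumption added so the loop cannot spin forever.

```python
from typing import Callable, Optional, Tuple


def repair_loop(
    source: str,
    run_checks: Callable[[str], Optional[str]],
    propose_fix: Callable[[str, str], str],
    max_attempts: int = 5,
) -> Tuple[str, bool]:
    """Feed each error back to the fixer until checks pass or attempts run out.

    run_checks returns None on success, or an error message on failure.
    propose_fix takes (source, error) and returns a revised source.
    Both are hypothetical placeholders for a test runner and an LLM call.
    """
    for _ in range(max_attempts):
        error = run_checks(source)
        if error is None:
            return source, True
        source = propose_fix(source, error)
    # Out of attempts: report whether the last revision happens to pass.
    return source, run_checks(source) is None


# Toy stand-ins so the sketch runs on its own.
def toy_checks(source: str) -> Optional[str]:
    return "found a BUG marker" if "BUG" in source else None


def toy_fix(source: str, error: str) -> str:
    return source.replace("BUG", "ok")


fixed, passed = repair_loop("value = 'BUG'", toy_checks, toy_fix)
stuck, stuck_passed = repair_loop(
    "x", lambda s: "always fails", lambda s, e: s, max_attempts=2
)
```

The second call shows the failure mode other commenters describe: a fixer that never converges simply burns its attempts, so the loop has to report failure rather than assume success.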

zug_zug 4 months ago

This roughly mirrors my experience so far. Mind you, I'm an extremely qualified engineer who has worked at FAANG.

Except I'd add that as one gets experience working with the AI, I can only assume they'd get much better at making it go smoothly. For example, I wouldn't manually rewrite localhost, I'd tell the AI "Why is localhost everywhere? Will this work if I deploy to a droplet?" and it will fix it for you.

Also, I just paste error messages directly into the AI and it usually knows how to fix them.

Sometimes it's net positive, sometimes it's net negative due to creating a mess that's really hard to get out of or debug. But I imagine it's only a matter of time until the scopes in which it's cost-effective go up.

I don't like that AI is a threat of huge monopolistic and job-reducing potential, but I don't think downplaying it is a long-term strategy to combat that.

HL33tibCe7 4 months ago

This. Great, AI can produce code. But it produces code without inducing understanding of the code in the person who wrote (or rather supervised the production of) it, which is half the point.

At some point AI will probably be good enough that this won’t matter. But it feels like we’re still a long way off that.

stuaxo 4 months ago

Oh look, a load of future work to fix these.

Why is this just like the last cost-cutting exercise, where the cheapest people in India produced a lot of "interesting" code?

42lux 4 months ago

Can anyone explain why everyone is so hyper-focused on speed? 500 images per second, 100 minutes of video in 30 minutes, a thousand lines of code per hour. Who is going to consume all that?

owenthejumper 4 months ago

I have had a great experience with Claude for coding, but you really need to be a programmer yourself to be able to divide the problems into manageable chunks.

cduzz 4 months ago

Programs are communication between 2 loosely coupled audiences -- the humans who have to maintain / modify the code and the computer that gets to run the code.

Human language, used to convey ideas to other humans, is imprecise. It's fine that it's imprecise because the media (humans) have both good error correction and a reasonable set of global defaults.

Computer languages require enormous precision because they're mechanically translated into machine code for some runtime.

Perhaps you can train an LLM on lots of code, and it'll find semantic relationships between some clever code it's been trained on and your specific request. Perhaps not, and it'll just give a dumb answer or an incorrect answer (ideally some code copilot will actually try running the candidate answer code against your specific ask?) -- but once the answer gets complex you run into the "it's much harder to debug code than write it, so don't write code that's almost too complex for you to understand" problem.

At work, I constantly have to remind people "don't use math data structures for identities" "but int is smaller" "Are you ever going to want the 95th percentile customerID?" "no that's silly" "then it isn't a number". Or I get to constantly remind people "a string with lots of curly braces and quotes isn't necessarily json; if you're not using a serialized API and just sending bytes to stdout someone else has to parse it" "but I'm using a logging library" "does anything else ever send stuff to stdout while your logging library is running?" "oh yes, we're going to open a ticket to debug that." So I'm not optimistic that running code written by a machine is long-term viable.

That said -- there are situations where machine generated code works -- I think it's been a long time since anyone manually drew masks for etching dies when making CPUs.
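
The "then it isn't a number" exchange can be made concrete with a small Python sketch. `CustomerId` is a hypothetical name chosen for illustration; the idea is only that an identifier wrapped in its own type supports equality and hashing, but not arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerId:
    """An opaque identifier: comparable and hashable, but not a number."""

    value: str


a = CustomerId("1001")
b = CustomerId("1001")

# Equality is meaningful for identities.
same = a == b

# Frozen dataclasses are hashable, so IDs work as dictionary keys.
lookup = {a: "Alice"}[b]

# Arithmetic on an identity is rejected instead of silently "working".
try:
    a + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
```

With a bare `int`, `customer_id + 1` and a 95th percentile would both compute without complaint; the wrapper turns the first into an error and gives the second nothing to operate on.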

jbirer 4 months ago

Anyone who has ever worked with VCs or shareholders before knows that, if you tell them the reality and limitations of something, they will either fire you or ignore what you say. They have been desperate to remove the leverage programmers have due to their skill and replace us with AI that they don't have to pay salaries to. All we can do at this point is just take VC money promising them exactly what they want to hear, that they will be able to replace us with an NLP model. Sometimes you just can't save people but you can profit from their voluntary fall from the cliff?

keyle 4 months ago

If you don't know why it works when it works, you won't know why it doesn't work when it doesn't work.

The key issue here was staying on top of the AI's help.

Use AI wisely: as an assistant, not as a drunken lead developer.

rchowe 4 months ago

I played with OpenHands for a few days (using gpt-4o since I already had an OpenAI account). I found it to be decent at writing new code, but then it had a hard time making changes when there was a lot of repetitive code (in a TypeScript / React project that I had it create with vite).

One of the interesting things about OpenHands is that you can see what the AI is doing in the terminal window where you launched it. Since it can't really load the whole codebase into its context window, it does a lot of grepping files, showing 10 lines on either side of the match, and then doing a search and replace based on this. This is pretty similar to what a human might do: attempt to identify the relevant function and change it.

I think I might have better luck with a simpler project, e.g. a Sinatra or Flask app where each route is relatively self-contained. I might give it or Cursor another try in the future when the tech has progressed a bit.
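
The search step described here (find a match, show some lines on either side) can be approximated in a few lines of Python. This is a guess at the general idea, not OpenHands' actual implementation; `grep_with_context` and its parameters are made-up names.

```python
import re
from typing import List, Tuple


def grep_with_context(
    lines: List[str], pattern: str, context: int = 10
) -> List[Tuple[int, List[str]]]:
    """Return (line_number, surrounding_lines) for every matching line.

    line_number is 1-based; surrounding_lines holds up to `context` lines
    on either side of the match, clipped at the start and end of the file.
    """
    regex = re.compile(pattern)
    hits = []
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


source = [
    "import x",
    "",
    "def handler():",
    "    return 1",
    "",
    "print(handler())",
]
hits = grep_with_context(source, r"def handler", context=1)
```

The window is all the tool gets to see, which may be part of why repetitive code is hard for it: several near-identical windows give little to distinguish the right edit site.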

morsecodist 4 months ago

I appreciate posts that are about practical usage of AI and its strengths/weaknesses, and the kind of conversation they generate. Conversations about AI are tough for me to navigate because there are camps of people that seem very invested in AI being either omniscient or completely useless. I regularly see people saying that AI is at the level that it can replace engineers or build whole apps. When I try this with state of the art models, I am seeing results that are nowhere close. That said, I still use AI every day during my development and I have a flow I think makes me way more productive. I want more conversations like this about the mechanics of using AI as it currently is, honestly evaluating its strengths and weaknesses without getting into hypothetical debates about the future or whether or not the AI "understands".

sega_sai 4 months ago

It seems there is a battle of two opposite viewpoints. One is that LLMs are just dumb autocompletes with no ability to understand anything. Another is that LLMs can already, right now, be substitutes for programmers. I personally think it is neither, but for experts who know what they are doing it is a massive time saver. I.e. in cases where you know what code you want to write, but it's tedious, LLMs can do it for you. Also, LLMs are great in cases where you are less familiar with a new API or language, but have a generally good understanding of programming.

Despite my broadly positive view on the usefulness of LLMs, I do not think they are good enough (yet) to build a full system from scratch without an expert supervisor. This should not IMO be used as a 'proof' they are dumb autocompletes.

cushychicken 4 months ago

One thing I think would have helped the author: write a spec first.

Seriously. It seems stupid. But AI works a lot better with a written spec.

The incredible thing is that the AI can actually be an excellent resource for writing the spec. And it will actually produce better code when you feed the spec back into said AI!

The current generation of AI seems to have fooled a lot of people into thinking that somehow you can jump straight to coding. (Well, you can, and it will probably work if you want to make something small or limited in scope.) Not so!

But, on the bright side, it’s just as good at design as code if you ask the right questions!

I say this having used 4 and 4o extensively in this manner. Just started using Sonnet 3.5 in this way in the last month or so, and it is amazing at this.

n_ary 4 months ago

The issue with AI is that it generates what it is trained on. Most publicly available coding content/examples are just docs or blogspam (geeksforgeeks/javapoint/whatever) where mostly surface-level code is peddled. Even many small-scale OSS projects do not have best practices or a good code base, just enough to get done whatever needs to be done. Now when you train AI on such data, it’ll excel at (statistically) reproducing the same kind of code.

Once the quality of training data improves (somehow getting access to high-quality codebases behind corporate walls by promoting these assistants and ingesting the codebase), the output improves.

There was a popular saying: garbage in, garbage out.

siva7 4 months ago

It delivers debugging hell if you don't know what you're doing, which is usually the case for inexperienced developers. It assists experienced developers very well, since they can sort out which parts of the AI's output are useful and which are not.

ibloomt 4 months ago

Heh, after decades of functional programmers being the "well, actually..." crowd at every conference, turns out they were right all along. Just for the wrong reasons!

The pitch:

AI generates tons of plausible-looking garbage
Static types catch garbage at compile time
OCaml/F#/Haskell fans quietly sipping tea in the corner

The irony? We spent years debating static vs dynamic typing for human developers. But the killer use case may end up being catching AI hallucinations.

Finally, a business case for monads that doesn't require a PhD!

Time to dust off those Haskell books. Who knew safety could be so profitable? Plot twist: Category theory becomes a required interview question by 2025.

peterkelly 4 months ago

I dream of a world in which more investment is put into creating better programming languages and runtime environments than trying to use LLMs as a way of coping with the complexities of current systems.

andix 4 months ago

My social feeds are full of tech bros who keep telling people AI codes everything for them. AI obviously has some impressive coding skills, but for me it never really worked well.

So is this just an illusion they create, or is it really possible to build software with AI, at least at a mediocre level?

I'm looking for open source projects that were built mostly with AI, but so far I couldn't find any bigger projects that were built with AI coding tools.

selimnairb 4 months ago

I was recently experimenting with local-only LLM coding assistants in JetBrains products. They did speed things up a bit, but I quickly realized that they were essentially automating the creation of copy-paste errors, resulting in time lost to debugging errors I never would have introduced myself, so I stopped using them.

iamflimflam1 4 months ago

It’s not mentioned anywhere in the post. But would be good to hear what the total time was including all the problems.

SunlitCat 4 months ago

Well, still trying to get into nvrhi, I went on to ask ChatGPT to write me an example program using it.

To make it short, it got better when I made a project, uploaded the headers and docs of it as project files and moved my chat into that project as well.

That said, AI can help you but needs a lot of support from you to do things somewhat right.

emporas 4 months ago

> LLMs are useless if you don’t understand the context

> AI can be worse than useless when you don't understand the underlying technologies

I made a saying about this some weeks ago: "A.I. can make the road for you, but you have to know where you are going". In Greek it sounds a little bit better.

Also, code is the truth, but it is not the only truth. The underlying computer, the network infrastructure and other things have an effect on the code. So, there could be a saying in addition to the first: "A.I. can make the road for you, but you have to test the road".

senko 4 months ago

Content marketing for a new text editor thinly disguised as AI rage-bait.

HN fell for it hard - 156 points, 180 comments (as of this writing).

Well done Nick! :) And congrats on launching Codescribble! Hope to see a "how my post on AI grew my userbase" followup in a few weeks!

3ptow 4 months ago

If you drop all pretenses and use a photocopier to steal code directly instead of performing an elaborate laundering step, you will not have these issues.

yodsanklai 4 months ago

Maybe AI will shine when working with strongly typed languages. Most errors can be caught at compile time avoiding debugging hell.

wlindley 4 months ago

Garbage in, garbage out. Code spewed by a random generator that has not the slightest understanding of what it is doing, whacked at by a hammer until it seems to be working.

What is this supposed to produce other than a mass of bugs and vulnerabilities? "A.I." is utter garbage and always will be; it is foolish to think otherwise.

nowittyusername 4 months ago

AI allows more people to be more productive and therefore code more and produce more lines of code. That alone means more debugging needs to be done. When more people are doing anything, there is naturally more liability within that realm of action, simply because of larger participation in those actions.

joshstrange 4 months ago

AI tools are just that, tools. I’ve said this since the very beginning of LLMs. I’ve yet to see anything change my mind. Aider/Devin/Copilot/Cursor/etc, all the different flavors of LLM tools are great, but if you don’t know what you’re doing they are going to get stuck in a loop/corner/bad-path. Sometimes it takes 2-6+ exchanges before you realize it’s lost the thread, which is why I love Aider’s “auto git commit” feature (defaults to on). You can always jump back X steps if you realize the LLM is lost.

You also have to get a good feel for when it’s best if you make a change vs the LLM. Aider doesn’t handle new files and moving around massive chunks super well. It can do it, but if I want to rename something everywhere or break out components/types/etc into different files then I know I should be doing that in my IDE myself. Same for little syntax errors when a diff the LLM makes isn’t quite right.

I spent a few nights last week using LLMs to help build a chrome extension to match my Amazon transactions with my YNAB transactions, for the purpose of updating the memo field in YNAB with the item names I bought from Amazon to speed up my categorization and serve as history of what I bought (previously I did this whole process manually). I think it really helped and made the whole process go much faster.

It really excels (for me) in UI. I’d like to think I’m pretty competent at writing code/logic but I’m not great at UI. In many projects I get bogged down when it comes to UI. If I get stuck coming up with a UI or I don’t like how something looks I can lose motivation to continue forward on it. With Aider I can ask for UI and while it might be abhorrent to a designer I think it looks pretty damn good (better than what I could do) and lets me focus on the logic. Aider also lets me try radical changes knowing I can easily reset back a few steps if it doesn’t work out.

I’ve said many times at work that a huge power of LLMs is taking something that would take 30-60min down to <5min, specifically around things like little scripts to investigate a problem or get more details. For example, I might have a log that I can see there is data in that I want to extract. I know I can write a chained/piped command of sed/awk/grep/cut/sort/uniq/etc but it’s going to take some trial and error as well as time. With an LLM I can bang out the full command in 1-3 exchanges.

Same deal with visualizing some piece of data in the logs (note: yes, we use Prometheus/Grafana but not everything can go in there, and for new bugs/issues in the field I’m normally dealing with something we haven’t seen before and thus haven’t set up monitoring/alerting on). I’ve had LLMs churn out simple HTML/JS/CSS files that I can feed data into: “graph all instances of this happening if X > Y and time is between A and B, etc”.

Again, I can write this stuff from scratch but often don’t do it in practice because the ROI isn’t guaranteed. In the middle of a production issue do I want to waste 10-30+ min writing the script to see if I can prove a theory? No, it’s not worth it if it doesn’t pan out, but if I’m using an LLM and it takes me less than five minutes then I can throw a lot more stuff at the wall to see if it sticks.
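
As an illustration of the kind of throwaway investigation script meant here, a grep/cut/sort/uniq-style pipeline might look like this in Python. The log format and the `error_code=` field are invented for the example, not taken from the commenter's systems.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def count_error_codes(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count occurrences of each error code, most frequent first.

    Roughly: grep ERROR | extract the field | sort | uniq -c | sort -rn
    """
    pattern = re.compile(r"error_code=(\w+)")
    counts: Counter = Counter()
    for line in log_lines:
        if "ERROR" not in line:
            continue  # keep only error lines, like `grep ERROR`
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


# Made-up sample lines standing in for a real log file.
sample = [
    "2025-01-01T10:00:00 INFO started",
    "2025-01-01T10:00:05 ERROR error_code=E42 upstream timeout",
    "2025-01-01T10:00:09 ERROR error_code=E7 bad payload",
    "2025-01-01T10:01:12 ERROR error_code=E42 upstream timeout",
]
top = count_error_codes(sample)
```

Nothing here is hard to write by hand; the point in the comment is that the five-minute version gets written at all, while the thirty-minute version often does not.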

yapyap 4 months ago

Never believe the snake oil sellers

macNchz 4 months ago

I’ve built and iterated a bunch of web applications with Claude in the past year. I think the author’s experience here was similar to some of my first tries, where I nearly just decided not to bother any further, but I’ve since come to see it as a massive accelerant as I’ve gotten used to the strengths and weaknesses. Quick thoughts on that:

1. It’s fun to use it to try unfamiliar languages and frameworks, but that exponentially increases the chance you get firmly stuck in a corner like OP’s deployment issue, where the AI can no longer figure it out and you find yourself needing to learn everything on the fly. I use a Django/Vue/Docker template repo that I’ve deployed many production apps from and know like the back of my hand, and I’m deeply familiar with each of the components of the stack.

2. Work in smaller chunks and keep it on a short leash. Agentic editors like Windsurf have a lot of promise but have the potential to make big sweeping messes in one go. I find the manual file context management of Aider to work pretty well. I think through the project structure I want and I ask it to implement it chunk by chunk, one or two moving pieces at a time. I work through it like I would pair programming with someone else at the keyboard: we take it step by step rather than giving a big upfront ask. This is still extremely fast because it’s less prone to big screwups. “Slow is smooth and smooth is fast.”

3. Don’t be afraid to undo everything it just did and re-prompt.

4. Use guidelines. I have had great success getting the AI to follow my desired patterns, e.g. how and where to make XHRs, by stubbing them in somewhere as an example or explicitly detailing them in a file.

5. Suggest the data structures and algorithms you want it to use. Design the software intentionally yourself. Tell it to make a module that does X with three classes that do A, B and C.

6. Let the AI do some gold plating: sometimes you gotta get in there and write the code yourself, but having an LLM assistant can help make it much more robust than I’d bother to in a PoC type project: thorough and friendly error handling, nice UI around data validation, extensive tests I’m less worried about maintaining, etc. There are lots of areas where I find myself able to do more and make better quality-oriented things even when I’m coding the core functionality myself.

7. Use frameworks and libraries the AI “knows” about. If your goal is speed, using something sufficiently mainstream that it has been trained on lots of examples helps a lot. That said, if something you’re using has had a major API change, you might struggle with it writing 1.0-style code even though you’re using 2.0.

8. Mix in other models. I’ve often had Claude back itself into a corner, only to loop in o1 via Aider’s architect mode and have it figure out the issue and tell Claude how to fix it.

9. Get a feel for what it’s good at in your domain. Since I’m always ready to quickly roll back changes, I always go for the ambitious ask and see whether it can pull it off. Sometimes it’s truly amazing in one shot! Other times it’s a mess and I undo it. Either way, over time you get an intuition for when it will screw up. Just last week I was playing around with a project where I had a need to draw polygons over a photograph for debugging purposes. A nice-to-have on top of that was being able to add, delete, and drag to reshape them, but I never would have bothered coding it myself or pulling in a library just for that. I asked Claude for it, and got it in one shot.

fredgrott 4 months ago

The real revolution will be when an AI tool can just be powered by our laptop and use our own codebase as the input. Until then it's just nonsense pretending to be something else.
