42 comments
jit_hacker 3 months ago
I work at a popular Seattle tech company, and AI is being shoved down our throats by leadership, to the point that it was made known they're tracking how much devs use AI, and I've even been asked why I'm personally not using it more. I've long been a believer in using the right tool for the right job. Sometimes that's AI, but not super often.

I spent a lot of time trying to think about how we arrived here. Where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago, but who, if you asked them to build a little hack project, would have no idea where to start. AI has given them back something they'd lost, because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project, it won't accelerate someone who's an expert. I.e., AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.
mikeocool 3 months ago
What’s that old (and in my experience pretty accurate) adage? The last 10% of a software project takes 90% of the time?

In my experience, AI is helpful for that first 90% — when the codebase is pretty simple and all of the weird business-logic edge cases haven’t crept in. In the last 10% (as well as in most “legacy” codebases), it seems to have a lot of trouble understanding enough to generate helpful output at more than a basic level.

Furthermore, if you’re not deliberate with your AI usage, it gets you into “this code is too complicated for the AI to be much help with” territory a lot faster.

I’d imagine this is part of why we’re not seeing an explosion of software productivity.
photonthug 3 months ago
Like it says in TFA, it’s frustrating how we can never seem to move past anecdotes and “but did you try <insert flavor of the week>” and, if you’re lucky, benchmarks that may or may not be scams.

10x, 20x, etc. productivity boosts really should be easy to see. My favorite example of this is the idea of porting popular things like MediaWiki/WordPress to popular things like Django/Rails. A charitable challenge, right? There’s lots of history and examples, and it’s more translation than invention. What about porting large, well-known codebases from C to Rust, etc.? Clearly people are interested in such things.

There would be a really, really obvious uptick in interesting examples like this if impossible dreams were now suddenly weekend projects.

If you don’t have an example like this, well, another vibe-coding anecdote about another CRUD app or a bash script with tricky awk is just not really what TFA is asking about. That is just evidence that LLMs have finally fixed *search*, which is great, but not the subject that we’re all the most curious about.
KaiserPro 3 months ago
Disclaimer: I work at a FAANG with exceptionally good integration of LLMs into my IDE.

For me it's been an ever-so-slight net positive.

In terms of in-IDE productivity it has improved a little bit. Stuff that is mostly repetitive can be autocompleted by the LLM. It can, in some cases, provide function names from other files that traditional IntelliCode can't, because of codebase size.

However, it also hallucinates plausible shit, which significantly undermines the productivity gains above.

I suspect that if I asked it directly to create a function to do X, it might work better than expecting it to work like autocomplete (even though I comment my code much more than my peers).

Overall rating: for our codebase, it's not as good as C# IntelliCode/VS Code.

Where it is good is asking how to do some basic thing in $language that I have forgotten. Anything harder and it starts going into bullshit land.

I think if you have more comprehensive tests it works better.

I have not had much success with agentic workflows, mainly because I've not been using the larger models. (Our internal agentic workflow is limited-access.)
techpineapple 3 months ago
The one example I can think of of a real-world developer getting a seemingly 10x improvement is Pieter Levels coding a 3D multiplayer flight sim in a few days of vibe coding. I tried vibe coding with Cursor and mostly ran into simple roadblock after simple roadblock; I’m curious to watch some unedited videos of people working this way.
EliRivers 3 months ago
Every so often, it saves me a few hours on a task that's not very difficult, but that I just don't know how to do already. Generally something generic a lot of people have already done.

An example from today was using XAudio2 on Windows to output sound, where that sound was already being fetched as interleaved data from a network source. I could have read the docs, found some example code, and bashed it together in a few hours; but I asked one of the LLMs and it gave me some example code tuned to my request, giving me a head start.

I suspect I had to already know a lot of context to be able to ask it the right questions, and then to tune the result with a few follow-up questions.
avastmick 3 months ago
I’m a solo founder/developer (https://kayshun.co); my relationship with LLMs for codegen has been complicated.

At first I was all in with Copilot and various similar plugins for Neovim. It helped me get going, but it produced the worst code in the application. I also found (personal preference) that the autocomplete function actually slowed me down; it made me pause, or even prevented me from seeing what I was doing, rather than just typing out what I needed. I stopped using any codegen for about four months at the end of 2024; I felt it was not making me more productive.

This year it’s back on the table with avante [0] and Cursor (the latter back off the table due to the huge memory requirements). Then recently Claude Code dropped, and I currently feel like I have productivity superpowers. I’ve set it up in a pair-programming style (old XP coder) where I write careful specs (prompts) and tests (which I code); it writes code; I review, run the tests, and commit. I work with it. I do not just let it run, as I have found I waste more time unwinding its output than watching each step.

From being pretty disillusioned six months ago, I now see it as a powerful tool.

Can it replace devs? In my opinion, some. Like all things, it’s garbage in, garbage out. So the idea that a non-technical product manager can produce quality outputs seems unlikely to me.

[0]: https://github.com/yetone/avante.nvim
summarity 3 months ago
One of my teams at GitHub develops Copilot Autofix, which suggests fixes based on CodeQL alerts (another of my teams’ projects). Based on data from actual devs interacting with alerts and fixes, we see an average 3x speed-up in time to fix over no Autofix, and up to 12x for some bug types. There’s more we’re doing, but the theme I’m seeing is that lots of the friction points along the SDLC get accelerated.
malux85 3 months ago
One of the interesting things about LLM coding assistants is that the quality of the answer is significantly influenced by the communication skill of the programmer.

Some of the juniors I mentor cannot formulate their questions clearly and, as a result, get a poor answer. They don’t understand that an LLM will answer the question you ask, which might not be the globally best solution; it’s just answering your question, and if you ask the question poorly (or worse, ask the wrong question) you’re going to get bad results.

I have seen significant jumps in senior programmers’ capabilities, in some cases 20x, and when I see a junior or intermediate complaining about how useless LLM coding assistants are, it always makes me very suspicious of the person: I think the problem is almost certainly their poor communication skills causing them to ask the wrong things.
hirsin 3 months ago
> I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.

This makes bad assumptions about what higher productivity looks like.

Other alternatives include:

1. Companies require fewer engineers, so there are layoffs. Software products are cheaper than before because the cost to build and maintain them is reduced.

2. Companies require fewer engineers, so they lay them off and retain the spend, using it for stock buybacks or exec comp.

And certainly it feels like we've seen #2 out in the wild.

Assuming that the number of people working on the software you use remains constant is not a good assumption.

(Personally this has been my finding. I'm able to get a bit more done in my day by, e.g., writing a quick script to do something tedious. But not 5x more.)
barnabee 3 months ago
It’s extremely circumstantial for me.

Sometimes they give me maybe a 5–10% improvement (i.e. nice but not world-changing). Usually that’s when they’re working as an alternative to docs, solving the odd bug, helping write tests or occasional glue code, etc., for a bigger or more complex/important solution.

In other cases I’ve literally built a small functioning app/tool in 6–12 hours of elapsed time, where most of that is spent waiting (all but unattended, so I guess this counts as “vibe coding”) while the LLM does its thing. It’s probably required less than an hour of my time in those cases and would easily have taken me at least 1–2 days, if not more. So I’d say it’s at least sometimes comfortably 10x.

More to the point, in those cases I simply wouldn’t have tried to create the tool, knowing how long it’d take. It’s unclear what the cumulative incremental value of all these new tools and possibilities will be, but that’s also non-zero.
throwawa14223 3 months ago
I've had terrible luck getting LLMs to make me feel more productive.

Copilot is very good at breaking my flow, and all of the agent-based systems I have tried have been disappointing at following incredibly simple instructions.

Coding is much easier and faster than writing instructions in English, so it is hard to justify anything I have seen so far as a time-saver.
philjohn 3 months ago
The biggest boon I've found is writing tests — especially when you've got lots of mocks to set up, it takes away that boilerplate overhead and lets you focus on the meat of the test.

And when you name your test cases in a common pattern such as "MethodName_ExpectedBehavior_StateUnderTest", the LLM is able to figure it out about 80% of the time.

The other 20% of the time I'll make a couple of corrections, but it's definitely sped me up by a low-double-digit percentage ... when writing tests.

When writing code, it seems to get in the way more often than not, so I mostly don't use it — but then again, a lot of what I'm doing isn't boilerplate CRUD code.
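The naming pattern in that comment can be sketched in Python; the `Membership` class and its behavior are invented here purely for illustration.

```python
import unittest

# Hypothetical fitness-membership class, made up for this sketch.
class Membership:
    def __init__(self, active=True):
        self.active = active

    def renew(self):
        if not self.active:
            raise ValueError("cannot renew a cancelled membership")
        return "renewed"

# Test names follow MethodName_ExpectedBehavior_StateUnderTest, giving
# an LLM (and human readers) a predictable template to complete.
class MembershipTests(unittest.TestCase):
    def test_renew_ReturnsRenewed_WhenActive(self):
        self.assertEqual(Membership(active=True).renew(), "renewed")

    def test_renew_RaisesValueError_WhenCancelled(self):
        with self.assertRaises(ValueError):
            Membership(active=False).renew()
```

Given the first test as a seed, a completion model has both the production method and the naming convention to pattern-match against, which is plausibly why the hit rate is so high.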
havaloc 3 months ago
For work I write plain-Jane PHP/MySQL CRUD apps that people love, including a fitness-center membership system.

Writing a new view used to take 5-10 minutes, but now I can do it in 30 seconds. Since it's the most basic PHP/MySQL imaginable, it works very well — no frameworks to confuse the LLM or suck up the context window.

The point, I guess, is that I can do it the old-fashioned way because I know how, but I don't have to; I can tell ChatGPT exactly what I want, and how I want it.
lfsh 3 months ago
As search engines, LLMs are nice. But for code generation they are not. Every time they generate code there are small bugs that I don't notice directly but that will bite me later.

For example, a piece of code with a foreach loop that uses the collection name inside the loop instead of the item name. Or a very nice-looking piece of code, but with a method call that does not exist in the library used.

I think the weakness of AI/LLMs is that they output probabilities. If the code you request is very common, then it will probably generate good code. But that's about it. It cannot reason about code (it can maybe 'reason' about the probability of the generated answer).
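The foreach bug described there can be sketched in Python (variable names invented for illustration): the loop body references the collection, `prices`, where the loop variable, `price`, was intended, so the code still runs but computes the wrong thing.

```python
prices = [10.0, 20.0, 30.0]

# Buggy version of the kind an LLM might emit: `prices` (the collection)
# is used inside the loop where `price` (the item) was intended. It runs
# without error, which is exactly why the bug is easy to miss.
total_buggy = 0
for price in prices:
    total_buggy += len(prices)  # wrong name: adds 3 on every pass

# Corrected version using the loop variable:
total = 0
for price in prices:
    total += price

# total_buggy == 9, total == 60.0
```

Nothing crashes in the buggy version; only a test on the actual total would catch it, which matches the "bites me later" experience.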
dvh 3 months ago
I stopped using Stack Overflow altogether. It requires writing a very carefully worded question so as not to get removed; with an LLM I write one sentence, then a few more to narrow it down as needed. It can easily be 1 minute with the LLM vs. 20 minutes writing an SO post and 30 minutes waiting for a response. It also saves googling time, because a Google query often has to be more generic to be effective, so I then have to spend more time adjusting the found solution; an LLM often gives a specific answer.

The moment I realized LLMs were better was when I needed to do something with screen coordinates of point clouds in three.js and my searches led nowhere. Doing it myself would have taken me 1 or 2 hours; the LLM got correct working code on the first try.
floppiplopp 3 months ago
I use the JetBrains product line for my professional work. It now comes with an AI code-completion assistant, which will, 50-80% of the time depending on the type of project, suggest something wrong that I then have to spend energy evaluating and ignoring. The rare cases where it does suggest something useful don't make up for the time and energy wasted dealing with the completions. AI in this case is detrimental to productivity and attention to the code. It's more useless than useful.
patrick451 3 months ago
I never use them to generate code for languages that I know well. Google has started trying to answer my programming-related searches with LLM results. Half the time it points me in a useful direction; the other half it's just dead wrong.

I have found them pretty helpful for writing SQL. But I don't really know SQL very well, and I'd imagine that somebody who does could write what I need in far less time than it takes me with the LLM. While the LLM helps me finish my SQL task faster, the downside is that I'm not really learning it the way I would if I had to actually bang my head against the wall and understand the docs. In the long run, I'd be better off without it.
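As an illustration of the kind of SQL involved, here is a sketch using Python's built-in sqlite3; the `orders` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.0), ("alice", 20.0)],
)

# "Beyond the basic": total spend per customer, filtered with HAVING and
# sorted largest-first -- the sort of grouped aggregate an LLM drafts quickly
# but that takes a non-expert a while to piece together from the docs.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING total > 25
    ORDER BY total DESC
    """
).fetchall()
# rows == [('alice', 50.0)]
```

GROUP BY vs. HAVING vs. WHERE is exactly the kind of distinction that is quick for an LLM to apply and slow to relearn each time you touch SQL once a quarter.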
arjie 3 months ago
Well, the most concrete example of this is Pieter Levels and his new flying-around-in-cyberspace video game, which he's making $60k MRR on. It's a concrete thing that he wouldn't have been able to build on that cadence otherwise.
LunaSea 3 months ago
Negative productivity for me, as I have to review bad LLM code during PR reviews.
arthurofbabylon 3 months ago
Traction – such a useful term. What is the "traction" of LLMs in programming? When the rubber hits the road, what happens? Do the wheels spin in place, or does the car move forward?

The nice thing about traction is that you can see it. When you shovel away the snow in your driveway, you can move your car; that's nice. When you update your hot-water kettle and it boils in 30 seconds, that's traction. Traction is a freshly installed dishwasher in your kitchen.

I sincerely ask – not because I am skeptical but because I am curious – where is the traction with LLMs in software?
nurettin 3 months ago
I work as a programmer in the financial industry. (We do integrations with brokers, data aggregation, and near-realtime execution for mid-frequency trading.) I don't integrate LLMs with my text editor. Pretty much all LLM code is sub-par no matter how advanced the model (it likes to sprinkle hash maps everywhere a simple vector would be fine, and makes up functions even if you show it the headers), BUT sometimes it motivates me, like a clown, or a rubber duck. Or a therapy cat. So overall it feels like a 5% increase in productivity for me.
insane_dreamer 3 months ago
I find LLMs useful in the same vein that I find Stack Overflow or reading through documentation useful. It saves me time when I 1) run into edge cases (where I might search SO and get lucky, but just as often the LLM's suggestion won't work); 2) am working with a framework I'm not super familiar with (faster than looking things up in the documentation); 3) am writing code comments; 4) am doing tedious refactoring.

It boosts productivity the way a good IDE boosts productivity, but nothing like 5x or even 2x. Maybe 1.2–1.4x.
sumoboy 3 months ago
Like a new diet every week: not everyone is losing weight. Great for snippets, suggestions, and helping with errors. But there's a long way to go before it's more consistent and commonly used.
jmchuster 3 months ago
I see it as analogous to asking, "How much is access to the internet boosting real-world programmer productivity?" Are you really 5-10x more productive being able to google something? Couldn't you have just looked it up in the manual? Don't you have peers you can ask? That's such a small portion of the time you spend coding.

But we've now lived it so much that it sounds ridiculous to try to argue that the internet doesn't really make *that* much of a difference.
rqtwteye 3 months ago
For me it works pretty well for Python scripts that pull data from somewhere and do something with it. Even the code quality is often quite OK. But for larger, complex projects it doesn't work for me. It produces a lot of unmaintainable code.

But I can easily see a not-so-distant future where you don't even have to look at the code anymore and just let AI do its thing. Similar to us not checking the assembly instructions of compiled code.
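A minimal sketch of the kind of pull-and-transform script described there, using inline CSV in place of a real data source (the field names are made up):

```python
import csv
import io

# Stand-in for data pulled from an API or a file download.
raw = """date,visits
2025-01-01,120
2025-01-02,95
2025-01-03,143
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# "Do something with it": aggregate and find the peak day.
total = sum(int(r["visits"]) for r in rows)
busiest = max(rows, key=lambda r: int(r["visits"]))["date"]
# total == 358, busiest == "2025-01-03"
```

Scripts of this shape — fetch, parse, aggregate, report — are short, self-contained, and heavily represented in training data, which plausibly explains why LLMs do well on them and poorly on sprawling projects.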
jasonthorsness 3 months ago
It has certainly boosted my productivity for my blog. Not the articles, but the interactive features. Since my day job is mostly Go and C++, I'm less familiar with Next.js/TypeScript/Node.js, and the LLMs are tremendously helpful in knowing which libraries to use and how to use them correctly. It lets me get way more done in the constrained time I have to spend on it.
bhouston 3 months ago
I am experiencing a massive productivity boost with my home-grown open-source agentic coder, especially since I added a GitHub mode to it this week. I write about that here: https://benhouston3d.com/blog/github-mode-for-agentic-coding
furstenheim 3 months ago
I find them useful for finding esoteric APIs, like reflection, which normally has 100 methods, making it hard to find the right one.
mentalgear 3 months ago
I've found that AI might help with analysis, but the generation part severely lacks, and overall it's just too much effort to clean up and check the code afterwards compared with just writing it yourself.

At least for mid- to high-complexity projects.

Vibe coding might be fun, but it ultimately results in unmaintainable code.
yimby2001 3 months ago
You guys really have all the FFmpeg flags memorized? It’s absurd to say it doesn’t save time sometimes. Also, aren’t these the same people who were whingeing about how we need to turn it off before the world turns into paper clips three years ago?
herbst 3 months ago
I use AI all day; gardening, for example, is now a completely different story. But for coding I don't get the results I want (which is mostly that the code works).
asdf6969 3 months ago
I use it almost daily for asking questions. It’s 100x faster at reading documentation than I am, but it’s still pretty bad at actually writing code.
janwillemb 3 months ago
I was on a school team where we had to create documents describing a new curriculum. One of the members thought creating these docs using an LLM was a very easy and productive way of doing it. But it absolutely wasn't. It looked alright at the start, but on closer inspection it was just a load of BS. We had to change literally every single sentence to make the text make sense. I was also on a team that created tools we could use with our students. Again, one of the members used an LLM to create certain tools. At first glance they worked alright, until some change was needed. Refactoring wasn't possible; a total rewrite was the only thing that helped. I don't think LLMs boost productivity. At best they boost the *appearance* of productivity of one person, and then the others have to clean up after her.
bufordtwain 3 months ago
I'd guess maybe 10-20% improvement for me.
croes 3 months ago
Shouldn't the profits of companies that use AI increase massively?

That seems like the easiest measurement of any effect.
zellyn 3 months ago
At my workplace, Block, people are sinking a lot of effort into Goose: https://block.github.io/goose/

Most of the folks I've talked to about it have been trying it, but the majority of the stories are still ultimately failures.

There are exceptions, though: there's been some success porting things between, say, JUnit 4 and JUnit 5.

The successes do seem to be coming more frequently, as the models improve, as the tools improve, as people develop the intuition, and as we invest time and attention in building out LLM-tailored documentation. (My prediction here is that the task-focused, bite-sized documentation style that seems like a fit for LLMs will ultimately prove to be more useful to developers than a lot of the existing docs!)

On the models improving: I expect it's going to be a bit like the ChatGPT 3.5 to 4 transition. There are certain almost intangible thresholds that, when crossed, can suddenly make a qualitative difference in usability and success.

I definitely feel like my emotions regarding LLMs are a bit of a roller coaster. I'm turning 50 these days, and some days I feel like I would rather not completely upend my development practices! And the hype -- oh god, the hype -- is absolutely nauseating. Every CTO in existence told their teams to go rub some AI on everything. Every ad is telling you their AI is already working perfectly (it isn't).

But then I compete in our internal CTF and come in third even though the rest of my team bailed, because ChatGPT can write RISC-V assembly payloads and I don't have to learn or re-learn it for half an hour. Or I get Claude to write a JavaScript/SVG spline editor matching the diagramming-as-code system I'm using, in like 30 or 45 minutes. And it's miraculous. And things like Cursor, for just writing your code when you already know what you want: magical.

Here's the thing, though. Despite the nauseating hype, we have to keep trying and trying to use AI well. There's a there there. These things are obviously unbelievably powerful. At some point, we're going to figure out how to use them effectively, and they're going to get good enough to do most of our low- to medium-complexity coding (at the very least). We're going to have to re-architect our software around AI. (As an example, instead of kludgy multi-platform solutions like React Native or Kotlin Multiplatform or J2ObjC, etc., why not make all the tests textual, and have an LLM translate changes in your Kotlin Android codebase into Swift automatically?)

We do still need sanity. I'm sort of half tongue-in-cheek trying to promulgate a rule that nobody can invoke AI to discount the costs of expensive migrations until they've demonstrated an automated migration of that type on a medium-complexity codebase. We have to avoid waving our hands and saying, "Don't worry about that; AI will handle it," when it can't actually handle it yet.

But keep trying!
smusamashah 3 months ago
When I think about LLMs reaching their peak at writing code, I can't help but think they should be writing *hyper-optimized* code that squeezes every last bit of processing power available.

I use these tools to get help here and there with tiny code snippets. So far I have not been suggested anything finely optimized. I guess it's because the bulk of what they were trained on isn't optimized for performance.

Does anyone know if any current LLMs can generate super-optimized code (even assembly language)? I don't think so. It doesn't feel like we are going to have machines more intelligent than us in the future if they're full of slop.
dehrmann 3 months ago
> Abstract indicators to the tune of "this analysis shows Y% more code has been produced in the last quarter". (This can just indicate AI producing code slop/bloat).<p>I suspect the metrics you sometimes hear like "x% of new code was written by an LLM" are being oversold because they're reported by people interested in juicing the numbers, so they count boilerplate, lines IDE autocomplete would have figured out, and lines that had to be fixed.
semanticjudo 3 months ago
The author has implied a false dichotomy: positioning the article as “it does 10x or it does nothing” (my paraphrasing) is disingenuous and hyperbolic. My experience is that on several tasks professional devs, including myself, can get to an answer much faster than pre-LLM. For example, I’ve never had to use SQL frequently enough to become an expert. Prior to LLMs, creating queries beyond the basic would take an hour of Googling and keyboard head-banging (or finding an expert to help, who is invariably busy doing their own job). Now the same thing takes 6 minutes. Arguably 10x faster for this task. But since I don’t do this often, nor have 40 other examples like it, I’d never claim it makes me 10x more productive. But I DO run into 5 or 6 of this and similar examples a week, and several others of smaller magnitude. And that has a meaningful impact on my productivity. I could go on to describe in what ways I see this productivity improvement, but the primary point is that it is not all or nothing. An LLM might make me 20% more productive across my week, and that is still a big deal compared with not having it.
ilrwbwrkhv 3 months ago
For me the biggest boost has been for front-end code. I have never considered front-end development real development, so the whole madness of Next.js components that somebody wants to write or add, all of the CSS animations and designs, and some of the Tailwind class madness, I let AI handle.

That stuff is already a mess, so the AI slop that comes out is also messy, and that's fine as long as it looks good, performs well, and does what I want; it's also really trivial to change.

However, I'm not letting it come near any backend code or actual development.
simonswords82 3 months ago
Without overcomplicating it, using AI currently gives about a 4x productivity boost for mid-to-senior developers.