TFA is right to point out the bottleneck problem for reviewing content - there are a couple of things that compound to make this worse than it should be -<p>The first is that the LLM outputs are not consistently good or bad - the LLM can put out 9 good MRs before the 10th one has some critical bug or architecture mistake. This means you need to be hypervigilant about everything the LLM produces, and you need to review it all with the kind of care with which you review intern contributions.<p>The second is that LLMs don’t learn once they’re done training, which means I could spend the rest of my life tutoring Claude and it’ll still make the exact same mistakes, so I’ll never get a return on that time and hypervigilance like I would with an actual junior engineer.<p>That leads to the final problem, which is that you need a senior engineer to vet the LLM’s code, but you don’t get to be a senior engineer without being the kind of junior engineer that the LLMs are replacing - there’s no way up that ladder except to climb it yourself.<p>All of this may change in the next few years or the next iteration, but the systems as they are today are a tantalizing glimpse of an interesting future, not the actual present you can build on.
The intro sentence to this is quite funny.<p>> <i>Remember the first time an autocomplete suggestion nailed exactly what you meant to type?</i><p>I actually don't, because so far this has only happened with trivial phrases or text I had already typed in the past. I do remember, however, dozens of times when autocorrect wrongly "corrected" the last word I typed, changing an easy-to-spot typo into a much more subtle semantic error.
Excerpted from Tony Hoare's 1980 Turing Award speech, 'The Emperor's Old Clothes'...<p><pre><code> "At last, there breezed into my office the most senior manager of all, a general manager of our parent company, Andrew St. Johnston. I was surprised that he had even heard of me. "You know what went wrong?" he shouted--he always shouted-- "You let your programmers do things which you yourself do not understand." I stared in astonishment. He was obviously out of touch with present day realities. How could one person ever understand the whole of a modern software product like the Elliott 503 Mark II software system? I realized later that he was absolutely right; he had diagnosed the true cause of the problem and he had planted the seed of its later solution."
</code></pre>
My interpretation is that whether shifting from delegation to programmers, or to compilers, or to LLMs, the invariant is that we will always have to understand the consequences of our choices, or suffer the consequences.
> Remember the first time an autocomplete suggestion nailed exactly what you meant to type?<p>No.<p>> Multiply that by a thousand and aim it at every task you once called “work.”<p>If you mean "menial labor" then sure. The "work" I do is not at all aided by LLMs.<p>> but our decision-making tools and rituals remain stuck in the past.<p>That's because LLMs haven't eliminated or even significantly reduced risk. In fact they've created an entirely new category of risk in "hallucinations."<p>> we need to rethink the entire production-to-judgment pipeline.<p>Attempting to do this without accounting for risk or how capital is allocated into processes will lead you into folly.<p>> We must reimagine knowledge work as a high-velocity decision-making operation rather than a creative production process.<p>Then you will invent nothing new or novel and will be relegated to scraping by on the overpriced annotated databases of your direct competitors. The walled garden just raised the stakes. I can't believe people see a future in it.
My observation over the years as a software dev has been that velocity is overrated.<p>Mostly because all kinds of systems are made for humans - even when we as a dev team were able to pump out features, we got pushback. Exactly because users had to be trained, users would have to be migrated, and all kinds of things would have to be documented and accounted for that were tangential to the main goals.<p>So the bottleneck is a feature, not a bug. I can see how we could optimize away the documentation and the tangential stuff so it happens automatically, but not the main job, which needs more thought anyway.
A few articles like this have hit the front page, and something about them feels really superficial to me, and I'm trying to put my finger on why. Perhaps it's that they're so myopically focused on day 2 and not on day n. They extrapolate from ways AI can replace humans right now, but lack any calculus that might integrate the second- or third-order effects such economic changes will incur, and so give the illusion that next year will be business as usual, but with AI doing X and humans doing Y.
> AI is scaling the creation side of knowledge work at an exponential rate<p>Why do people keep saying things like this? "Exponential rate"? That's just not true. So far the benefits are marginal at best and limited to relatively simple tasks. It's a truism at this point, even among fans of AI, that the benefits of AI are much more pronounced at junior-level tasks. For complex work, I'm not convinced that AI has "scaled the creation side of knowledge work" at all. I don't think it's particularly useful for the kind of non-trivial tasks that actually take up our time.<p>Amdahl's Law comes into play. If using AI gives you 200% efficiency on trivial tasks, but trivial tasks only take 10% of your time, then you've realized a whopping 5.3% productivity boost. I do not actually spend much time on boilerplate. I spend time debugging half-baked code, i.e. the stuff that LLMs spit out.<p>I realize I'm complaining about the third sentence of the article, but I refuse to keep letting people make claims like this as if they're obviously true. The whole article is based on false premises.
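For anyone who wants to sanity-check the 5.3%: it's just Amdahl's Law with the round numbers from the parent (10% of time on trivial tasks, 2x speedup on them; those are assumptions, not measurements). A quick Python sketch:<p><pre><code>def amdahl_speedup(fraction_sped_up: float, speedup_factor: float) -> float:
    """Overall speedup when only a fraction of the work gets faster (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / speedup_factor)

# Assumptions from the comment above: trivial tasks are 10% of the time,
# and AI makes you twice as fast at them ("200% efficiency").
overall = amdahl_speedup(fraction_sped_up=0.10, speedup_factor=2.0)
print(f"Overall productivity boost: {overall - 1:.1%}")  # prints ~5.3%
</code></pre>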
And once the Orient and Decide part is augmented, we'll be limited by social networks (IRL ones). Every solo founder/small biz will have to compete more and more for marketing eyeballs, and the ones with access to bigger engines (companies) will get the juice they need, and we come back to humans being the bottleneck again.<p>That is, until we mutually decide to remove our agency from the loop entirely. And then what?
> What I see happening is us not being prepared for how AI transforms the nature of knowledge work and us having a very painful and slow transition into this new era.<p>I would've liked the author to be a bit more specific here. What exactly could this "very painful and slow transition" look like? Any commenters have any idea? I'm genuinely curious.
The article may not be consistent with what I'm hearing from doctors using ambient dictation, which admittedly fits a slightly different niche than the author's use case, but it points to their final prediction that the paths to adoption will be complicated.<p>A number of the docs I'm working with describe ambient dictation as a game changer. Using the OODA loop analogy of the author: they are tightening the full OODA loop by deferring documentation to the end of the day. Historically this was a disaster because they'd forget the first patient by the end of the day. Now, the first patient's automatically dictated note is perhaps wrong, but rich with details that spark sufficient remembrance.<p>Of course MBAs will use this to further crush physicians with additional workload, but for a time, it may help.
> This pile of tasks is how I understand what Vaughn Tan refers to as Meaningmaking: the uniquely human ability to make subjective decisions about the relative value of things.<p>Why is that a "uniquely human ability"? Machine learning systems are good at scoring things against some criterion. That's mostly how they work.
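To make that point concrete: "scoring against a criterion" can be as mundane as learning the criterion from past human decisions and ranking new items against it. A deliberately toy sketch (scikit-learn, hypothetical data, standing in for whatever scoring model you prefer):<p><pre><code>from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical past prioritization decisions; the "criterion" is whatever
# separates the items humans chose to prioritize from the ones they didn't.
past_items = [
    "fix data loss when saving drafts",
    "crash on login for all users",
    "rename a button label",
    "tweak padding on the settings page",
]
was_prioritized = [1, 1, 0, 0]

scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
scorer.fit(past_items, was_prioritized)

# New items get scored against that learned criterion.
for item in ["intermittent data corruption in export", "change the icon color"]:
    print(item, scorer.predict_proba([item])[0, 1])
</code></pre>
Whether a score like that counts as "meaningmaking" is exactly the dispute, but the scoring itself is routine.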
What's going to happen is that LLMs will eventually make fewer mistakes, and then people will just put up with more bugs in almost all situations, leading to everything being noticeably worse, and build everything with robustness in mind, not correctness. But it will all be <i>cheaper</i>, so there you go.
There are so, so many person-years spent studying. And it's not enough? Everyone wanting to work in "knowledge work" does 16 years of schooling, very often more. How inefficient are we that this still apparently isn't enough?
The method of producing the work can be more important (and easier to review) than the work output itself. At the simplest level, think of a global search-and-replace of a function name that alters 5000 lines (see the sketch below). At a more complex level, you can trust a team of humans to do something without micro-managing every aspect of their work. My hope is that the current crisis of reviewing too much AI-generated output will subside into that kind of trust, once the LLM has reached a high level of “judgement” and competence. But we’re definitely not there yet.<p>And contrary to the article, idea-generation with LLM support can be fun! They must have tested full replacement or something.
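Roughly what that search-and-replace "method" might look like; reviewing these few lines is easy, reviewing the 5000-line diff they produce is not (the paths and function names here are hypothetical):<p><pre><code>import pathlib
import re

# Hypothetical mechanical rename: every call to fetch_data(...) becomes fetch_user_data(...).
# The diff this produces may touch thousands of lines, but the method fits on one screen.
pattern = re.compile(r"\bfetch_data\(")

for path in pathlib.Path("src").rglob("*.py"):
    text = path.read_text()
    updated = pattern.sub("fetch_user_data(", text)
    if updated != text:
        path.write_text(updated)
</code></pre>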
I would like to challenge the fundamental premise of the article. Just because you can generate 50 PRs doesn’t mean you should. In fact the same bottleneck they’re describing is present if you have 50 coders on your team.<p>The problem therefore is not how to scale PR review but rather how to select meaningful work to perform, which brings me to the second point made by TFA: that humans making judgment calls on which PR should be prioritized is a uniquely defining human feature.<p>I beg to differ here as well. All the problems described in the article are high-context decisions; you need to take a lot into consideration (user request, product strategy, market dynamics, cost/benefit, rou..) to decide which feature should be prioritized in the next release. What prevents LLMs from being able to help with that is the sheer amount of information to ingest, which is still a limitation despite the long context windows we see nowadays.<p>tl;dr: this is a problem of prioritization and product strategy and nothing specific to AI. Scaling so-called judgment is a red herring; better focus and scope management should be aimed for instead.
The article rightly points out that people don't enjoy just being reviewers: we like to take an active role in playing, learning, and creating. They point out the need to find a solution to this, but then never follow up on that idea.<p>This is perhaps the most fundamental problem. In the past, tools took care of the laborious and tedious work so we could focus on creativity. Now we are letting AI do the creative work and asking humans to become managers and code reviewers. Maybe that's great for some people, but it's not what most problem solvers want to be doing. And the people who know how to judge such things are the same people who have years of experience doing these things. Without that experience you can't have good judgement.<p>Let the AI make it faster and easier for me to create; don't make it replace what I do best and leave me as a manager and code reviewer.<p>The parallels with grocery checkouts are worth considering. Humans are great at recognizing things, handling unexpected situations, and being friendly and personable. People working checkouts are experts at these things.<p>Now replace that with self-serve checkouts. Random customers are forced to do this all themselves. They are not experts at this. The checkouts are less efficient because they have to accommodate these non-experts. People have to pack their own bags. And they do all of this while punching buttons on a soulless machine instead of getting some social interaction in.<p>But worse off is the employee who manages these checkouts. Now, instead of being social, they are security guards and tech support. They are constantly having to troubleshoot computer issues and teach uninterested and frustrated beginners how to do something that should be so simple. The employee spends most of their time as a manager and watchdog, looking at a screen that shows the status of all the checkouts, looking for issues, like a prison security guard. This work is passive and unengaging, yet requires constant attention - something humans aren't good at. What little interaction they do have with others comes in situations where people are upset.<p>We didn't automate anything here, we just changed who does what. We made customers into the people doing checkouts, and we made front-line staff into managers of them, plus tech support.<p>This is what companies are trying to do with AI. They want to have fewer employees whose job it is to manage the AIs, directing them to produce. The human is left assigning tasks and checking the results - a manager of thankless and soulless machines. The credit for the creation goes to the machines while the employees are seen as low-skilled and replaceable.<p>And we end up back at the start: trying to find high-skilled people to perform low-skilled work based on experience they could only have gotten by doing high-skilled work to begin with. When everyone is just managing an AI, no one will know what it is supposed to do.
This really isn’t true in principle. The current LLM ecosystems can’t do “meaning tasks” but there are all kinds of “legacy” AI expert systems that do exactly what is required.<p>My experience is that middle manager gatekeepers are the most reluctant to participate in building knowledge systems that obsolete them though.
AI increases our ability to produce bullshit but doesn't do much to increase our ability to detect bullshit. One sentence of bullshit takes 1000 sentences of clear reasoning to dispel.
Is it just me, or is vibe coding only useful for greenfield projects that have minimal complexity? Seems like they collapse once enough complexity has built up.
> Ultimately, I don’t see AI completely replacing knowledge workers any time soon.<p>How was that conclusion reached? And what is meant by knowledge workers? Any work with knowledge is exactly the domain of LLMs. So, LLMs are indeed knowledge workers.
> He argues this type of value judgement is something AI fundamentally cannot do, as it can only pattern match against existing decisions, not create new frameworks for assigning worth.<p>Counterpoint: that decision has to be made only once (probably by some expert). AI can incorporate that training data into its reasoning and, voilà, it becomes available to everyone. A software framework is already a collection of good decisions, practices and tastes made by experts.<p>> An MIT study found materials scientists experienced a 44% drop in job satisfaction when AI automated 57% of their “idea-generation” tasks<p>Counterpoint: now consider making materials-science decisions that require materials to have not just 3 properties but 10 or 15.<p>> Redesigning for Decision Velocity<p>Suggestion: I think this section implies we must ask our experts to externalize all their tastes, preferences, and top-down thinking so that juniors can internalize them. So experts will be teaching the details (based on their internal model) to LLMs while teaching the model itself to humans.