TechEcho (科技回声), a tech news and discussion platform built with Next.js.

A bear case: My predictions regarding AI progress

204 points · by suryao · 2 months ago

32 comments

csomar · 2 months ago

> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age. Software agents break down once the codebase becomes complex enough, game-playing agents get stuck in loops out of which they break out only by accident, etc.

This has been my observation. I got into GitHub Copilot as soon as it launched, back when GPT-3 was the model. Even then (late 2021), Copilot could already write tests for my Rust functions and simple documentation. *This* was revolutionary. We haven't had another moment like it since.

The GitHub Copilot vim plugin is always on. As you keep typing, it keeps suggesting the rest of the context in faded text. Because it is always on, I can sort of read into the AI's "mind". The more I coded, the more I realized it's just search with structured results. The results got better with 3.5/4, but after that only slightly, and sometimes not at all (e.g. 4o or o1).

I don't care what anyone says; just yesterday I made a comment that truth has essentially died: https://news.ycombinator.com/item?id=43308513 If you have a revolutionary intelligence product, why is it not working for me?
stego-tech · 2 months ago

> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.

We're already seeing this with tech doing RIFs and not backfilling domestically for developer roles (the whole "we're not hiring devs in 202X" schtick), though the not-so-quiet secret is that a lot of those roles just got sent overseas to save on labor costs. The word from my developer friends is that they are *sick and tired* of having to force an (often junior/outsourced) colleague to explain their PR or code, only to be told "it works" and for management to overrule their concerns; this is embedding AI slopcode into products, which I'm sure won't have *any lasting consequences*.

My bet is that software devs who've been keeping up their skills will have another year or two of tough times, then it's back into a cushy Aeron chair with a sparkling new laptop to do what they do best: *write readable, functional, maintainable code*, albeit in more targeted ways, since (and I hate to be *that dinosaur*) LLMs produce passable code only when a competent human is there to smooth out the rougher edges and rewrite it to suit the codebase and style guidelines (if any).
mark_l_watson · 2 months ago

I have used neural networks for engineering problems since the 1980s. I say this as context for my opinion: I cringe at most applications of LLMs that attempt mostly autonomous behavior, but I love using LLMs as 'sidekicks' as I work. If I have a bug in my code, I will add a few printout statements where I think my misunderstanding of my code is, show an LLM my code and output, and explain the error: I very often get useful feedback.

I also like practical tools like NotebookLM, where I can pose some questions, upload PDFs, and get a summary based on my questions.

My point is: my brain and experience are often augmented in efficient ways by LLMs.

So far I have addressed practical aspects of LLMs. I am retired, so I can spend time on non-practical things: currently I am trying to learn how to effectively use code generated by Gemini 2.0 Flash at runtime; the Gemini SDK supports this fairly well, so I am just trying to understand what is possible (before this I spent two months experimenting with writing my own tools/functions in Common Lisp and Python).

I "wasted" close to two decades of my professional life on old-fashioned symbolic AI (but I was well paid for the work), yet I am interested in probabilistic approaches, such as in a book I bought yesterday, "Causal AI", that was just published.

Lastly, I think some of the recent open source implementations of new ideas from China are worth carefully studying.
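The "sidekick" debugging loop described here (code plus printout output plus a stated suspicion, handed to an LLM) is easy to script. This sketch only assembles the prompt; the function name and message layout are illustrative, and the actual API call is left to whatever chat-completion SDK you happen to use:

```python
def build_debug_prompt(code: str, output: str, suspicion: str) -> str:
    """Bundle source code, observed printout output, and a hypothesis
    about the bug into a single prompt for an LLM sidekick."""
    return (
        "I have a bug in my code. Here is the code:\n\n"
        f"{code}\n\n"
        f"Printout statements produced this output:\n\n{output}\n\n"
        f"My understanding of the error: {suspicion}\n"
        "What am I misunderstanding?"
    )

# Example: a deliberately buggy mean() plus what we observed when running it.
prompt = build_debug_prompt(
    code="def mean(xs):\n    return sum(xs) / len(xs) - 1",
    output="mean([2, 4]) -> 2.0",
    suspicion="I expected 3.0; the subtraction at the end looks wrong.",
)
# `prompt` can now be sent as the user message of any chat-completion call.
```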
cglace · 2 months ago

The thing I can't wrap my head around is that I work on extremely complex AI agents every day and I know how far they are from actually replacing anyone. But then I step away from my work and I'm constantly bombarded with "agents will replace us".

I wasted a few days trying to incorporate aider and other tools into my workflow. I had a simple screen I was working on for configuring an AI agent. I gave screenshots of the expected output. Gave a detailed description of how it should work. Hours later I was trying to tweak the code it came up with. I scrapped everything and did it all myself in an hour.

I just don't know what to believe.
colonCapitalDee · 2 months ago

Yeah, I'd buy it. I've been using Claude pretty intensively as a coding assistant for the last couple of months, and the limitations are obvious. When the path of least resistance happens to be a good solution, Claude excels. When the best solution is off the beaten track, Claude struggles. When all the good solutions lie off the beaten track, Claude falls flat on its face.

Talking with Claude about design feels like talking with that one coworker who's familiar with every trendy library and framework. Claude knows the general sentiment around each library and has gone through the quickstart, but when you start asking detailed technical questions, Claude just nods along. I wouldn't bet money on it, but my gut feeling is that LLMs aren't going to be a straight or even curved shot to AGI. We're going to see plenty more development in LLMs, but it'll be just that: better LLMs that remain LLMs. There will be areas where progress is fast and we'll be able to get very high intelligence in certain situations, but there will also be many areas where progress is slow, and the slow areas will cripple the ability of LLMs to reach AGI. I think there's something fundamentally missing, and finding what that "something" is is going to take us decades.
usaar333 · 2 months ago

The author also made a highly upvoted and controversial comment about o3 in the same vein that's worth reading: https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3?commentId=vTAnsqsok7HxtgHok

Of course LessWrong, being heavily populated by AI doomers, may be slightly biased against near-term AGI just from motivated reasoning.

Gotta love this part of the post no one has yet addressed:

> At some unknown point – probably in 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that
spaceman_2020 · 2 months ago

The impression I get from using all the cutting-edge AI tools:

1. Sonnet 3.7 is a mid-level web developer at least.

2. DeepResearch is about as good an analyst as an MBA from a school ranked 50+ nationally. Not lower than that. EY, not McKinsey.

3. Grok 3/GPT-4.5 are good enough as $0.05/word article writers.

It's not replacing the A-players, but it's good enough to replace B players and definitely better than C and D players.
a-dub · 2 months ago

> LLMs are not good in some domains and bad in others. Rather, they are incredibly good at some specific tasks and bad at other tasks. Even if both tasks are in the same domain, even if tasks A and B are very similar, even if any human that can do A will be able to do B.

i think this is true of ai/ml systems in general. we tend to anthropomorphise their capability curves to match the cumulative nature of human capabilities, where oftentimes the capability curve of the machine is discontinuous and has surprising gaps.
orangebread · 2 months ago

I think the author provides an interesting perspective on the AI hype; however, I think he is really downplaying the effectiveness of what you can do with the current models we have.

If you've been using LLMs effectively to build agents or AI-driven workflows, you understand the true power of what these models can do. So in some ways the author is being a little selective with his confirmation bias.

I promise you that if you do your due diligence in exploring the horizon of what LLMs can do, you will understand what I'm saying. If y'all want a more detailed post, I can get into the AI systems I have been building. Don't sleep on AI.
andsoitis · 2 months ago

This poetic statement by the author sums it up for me:

> "People are extending LLMs a hand, hoping to pull them up to our level. But there's nothing reaching back."
swazzy · 2 months ago

I see no reason to believe the extraordinary progress we've seen recently will stop or even slow down. Personally, I've benefited so much from AI that it feels almost alien to hear people downplaying it. Given the excitement in the field and the sheer number of talented individuals actively pushing it forward, I'm quite optimistic that progress will continue, if not accelerate.
dartharva · 2 months ago

> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.

(IMO) Apart from programmer assistance (which is already happening), AI agents will find the most use in secretarial, ghostwriting, and customer-support roles, which generally have a large labor surplus and won't immediately "crash and burn" companies even if there are failures. Perhaps if it's a new startup or a small, unstable business on shaky ground this could become a "last straw" kind of factor, but for traditional corporations with good leeway I don't think just a few mistakes about AI deployment can do too much harm. The potential benefits, on the other hand, far outmatch the risk taken.
tibbar · 2 months ago

LLMs make it very easy to cheat, both academically and professionally. In the workplace, this looks like a junior engineer not understanding their task or how to do it, but stuffing everything into the LLM until lint passes. This breaks the trust model: there are many requirements that are a little hard to verify and that an LLM might miss, and the junior engineer can now represent to you that they "did what you asked" without really certifying the work output. I believe this kind of professional cheating is just as widespread as academic cheating, which is an epidemic.

What we really need is people who can certify that a task was done correctly, using LLMs as an aid. LLMs simply cannot be responsible for complex requirements. There is no way to hold them accountable.
HenryBemis · 2 months ago

My predictions on the matter:

- LLMs are already super useful. They do all my coding and scripting for me at home, and most of the coding and scripting at the workplace.
- They create 'fairly good' checklists for work: not perfect, but they turn a 4-hour effort into 25 minutes. The "Pro" is still needed to make this or that checklist usable; I call this a win (you need both the tech AND the human).
- If/when you train an 'in-house' LLM, it can score some easy wins (mega-big companies with 100k staff can get quick answers on "which policy covers XYZ", "which department can I talk to about ABC", etc.).
- We won't have "AGI"/Skynet anytime soon, and when one does exist, the company (let's use OpenAI for example) will split in two. Half will sell LLMs to the masses at $100 per month; the "Skynet" will go to the DOD and we will never hear about it again, except as a rumor on the Joe Rogan podcast.
- LLMs are a great 'idea generator' (search engine and results aggregator): give me a list of 10 things I can do _that_ weekend in _city_I_will_be_traveling_to, so if/when I go to e.g. London: here are the cool concerts, theatrical performances, parks, blah blah blah.
JTbane · 2 months ago

Does anyone else feel like AI is a trap for developers? I feel like I'm alone in the opinion that it decreases competence. I guess I'm a mid-level dev (5 YOE at one company) and I tend to avoid it.
roenxi · 2 months ago

This seems to ignore the major force driving AI right now: hardware improvements. We've barely seen a new hardware generation since ChatGPT was released to the market; we'd certainly expect progress to plateau fairly quickly on fixed hardware. My personal experience of AI models is going to be a series of step changes every time the VRAM on my graphics card doubles. Big companies are probably going to see something similar each time a new, more powerful product hits the data centre. The algorithms here aren't all that impressive compared to the creeping FLOPS/$ metric.

Bear cases are always welcome. This wouldn't be the first time in computing history that progress just falls off the exponential curve suddenly, although I would bet money on there being a few years left and on AGI being achieved.
audessuscest · 2 months ago

> It seems to me that "vibe checks" for how smart a model feels are easily gameable by making it have a better personality.

I don't buy that at all. Most of my use cases don't involve the model's personality; if anything, I usually instruct it to skip any commentary and give only the expected result. I'm sure most people using AI models seriously would agree.

> My guess is that it's most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e.g. OpenAI's corporate drones.

I would actually guess it's mostly because it was good at code, which doesn't involve much personality.
Imnimo · 2 months ago

> Test-time compute/RL on LLMs: It will not meaningfully generalize beyond domains with easy verification.

To me, this is the biggest question mark. If you could get good generalized "thinking" from just training on math/code problems with verifiers, that would be a huge deal. So far, generalization seems to be limited. Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns? If the latter, is that fixable?
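The "easy verification" being debated here can be made concrete with a toy code verifier of the kind used in RL post-training: run a model-generated candidate solution against test cases and emit a binary reward. This is only an illustrative sketch (the function and task names are made up, not from the article):

```python
def verifier_reward(candidate_src: str, func_name: str, tests: list) -> float:
    """Execute model-generated code and score it against test cases.

    Returns 1.0 only if every test passes -- the cheap, unambiguous
    reward signal that code/math domains provide and open-ended
    domains lack.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)           # run the candidate solution
        fn = namespace[func_name]
        passed = all(fn(*args) == expected for args, expected in tests)
        return 1.0 if passed else 0.0
    except Exception:
        return 0.0                               # crashes earn no reward

# Two "model samples" for the same task, one correct and one buggy:
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = [((2, 3), 5), ((0, 0), 0)]
```

The hard open question in the comment is whether optimizing against rewards like this transfers to tasks where no such checkable `tests` list exists.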
bloomingkales · 2 months ago

Let's imagine that we all had a trillion dollars. Then we would all sit around and go, "well dang, we have everything, what should we do?". I think you'll find that just about everyone would agree: "we oughta see how far that LLM thing can go". We could be in nuclear fallout shelters for decades, and I think you'll still see us trying to push the LLM thing underground, through duress. We dream of this, so the bear case is wrong in spirit. There's no bear case when the spirit of the thing is that strong.
mcintyre1994 · 2 months ago

> Scaling CoTs to e.g. millions of tokens or effective-indefinite-size context windows (if that even works) may or may not lead to math being solved. I expect it won't.

> (If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.)

What does it mean for math to be solved in this context? Is it the idea that an AI will be able to generate any mathematical proof? To take a silly example, would we get a proof of whether P=NP from an AI that had solved math?
klik99 · 2 months ago

This is almost exactly what I've been saying while everyone else was saying we're on the path to AGI in the next couple of years: we're an innovation / tweak / paradigm shift away from AGI. His estimate that it could happen in the 2030s is possible but optimistic; you can't put a timeline on new techniques, only on iterative progress.

This is all the standard timeline for new technology: we enter the diminishing-returns period, investment slows down a year or so afterwards, layoffs, contraction of the industry, but when the hype dies down, the real utilitarian part of the cycle begins. We start seeing it get integrated into the use cases it actually fits well with, and by five years' time it's standard practice.

This is a normal process for any useful technology (notably, crypto never found sustainable use cases, so it's kind of the exception: it's in a superposition of lingering hype and complete dismissal), so none of this should be a surprise to anyone. It's funny that I've been saying this for so long that I've been pegged an AI skeptic, but in a couple of years, when everyone is burnt out on AI hype, it'll sound like a positive view. The truth is, hype serves a purpose for new technology, since it kicks off a wide search for every crazy use case, most of which won't work. But the places where it does work will stick around.
worik · 2 months ago

> It blows Google out of the water at being Google

That is enough for me.
readthenotes1 · 2 months ago

LLMs seem less hyped than blockchains were back in the day.
Timber-6539 · 2 months ago

AI makes no meaningful contribution to real-world productivity because it is a toy that is never going to become the real thing that everyone who has naively bought the AI hype expects it to be. And the end result of all the hype looks almost predictably similar to how the also once-promising crypto and blockchain technology turned out.
viccis · 2 months ago

Regarding "AGI", is there any evidence of true synthetic a priori knowledge from an LLM?
bilsbie · 2 months ago

There are times when I use an LLM and it's completely brain-dead and can't handle the simplest questions.

Then other times it blows me away, even figuring out things that can't possibly have been in its training data.

I think there are groups of people who have had either all of the former experience or all of the latter, and that's why we see overly optimistic and overly pessimistic takes (like this one).

I think the reality is that current LLMs are better than he realizes, and even if we plateau, I really don't see how we don't make more breakthroughs in the next few years.
gmt2027 · 2 months ago

The typical AI economic discussion always focuses on job loss, but that's only half the story. We won't just have corporations firing everyone while AI does all the work: who would buy their products then?

The disruption goes both ways. When AI slashes production costs by 10-100x, what's the value proposition of traditional capital? If you don't need to organize large teams or manage complex operations, the advantage of "being a capitalist" diminishes rapidly.

I'm betting on the rise of independents and small teams. The idea that your local doctor or carpenter needs VC funding or an IPO was always ridiculous. Large corps primarily exist to organize labor and reduce transaction costs.

The interesting question: when both executives and frontline workers have access to the same AI tools, who wins? The manager with an MBA, or the person with practical skills and domain expertise? My money's on the latter.
carlosdp · 2 months ago

> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.

Um... I don't think companies are going to perform mass layoffs because "OpenAI said they must happen". If that were to happen, it'd be because they are genuinely able to automate a ton of jobs using LLMs, which would be a bull case (not for AGI necessarily, but for the increased usefulness of LLMs).
n_ary · 2 months ago

Hmm, I didn't read the article, but from the gist of the other comments, we seem to have bought into Sama's "agents so good, you don't need developers/engineers/support/secretaries/whatever anymore". The issue is, this is almost the same as claiming pocket calculators are so good we don't need accountants anymore, or even that computers are so good we don't need accountants anymore. AI claims to be the motor-car moment when the horse cart got replaced. But the horse cart got replaced by the taxi (and taxi drivers also have unions protecting them!). With AI, all these "to be replaced" people are like accountants: more productive. Just as many new devs are productive with higher-level languages compared to assembly, whoever plans to deploy LLMs to lay off devs must learn to drive those LLMs and know what they are doing, the same way someone still needs to learn to drive that massive hunk of metal, since despite cars replacing the horse carts of the distant past, we still don't have self-driving cars.

I believe it is high time we come out of this madness and reveal the lies of the AI marketers and grifters for what they are. If AI can replace anyone, it should begin with doctors: they work from rote knowledge and provide service based on explicit (though ambiguous) inputs, the same as an LLM needs. Yet I still have doctors, and I wait for hours on end in the waiting room to get prescribed a cough hard candy, only to come back later because it was actually covid and my doctor had a brain fart.
gymbeaux · 2 months ago

Yeah, agree 100%. LLMs are overrated. I describe them as the "jack of all trades, master of none" of AI. LLMs are that jackass guy we all know who has to chime in on every topic like he knows everything, but in reality he's a fraud with low self-esteem.

I've known a guy since college who now has a PhD in something niche and supposedly pulls a $200k/yr salary. One of our first conversations (in college, circa 2014) was about how he had a clever and easy way to mint money: selling Minecraft servers installed on Raspberry Pis. Some of you will recognize how asinine this idea was and is. For everyone else: back then, Minecraft only ran on x86 CPUs (and I doubt a Pi would make a good Minecraft server today, even if it were economical). He had no idea what he was talking about; he was just spewing shit like he was God's gift. Actually, the problem wasn't that he had *no* idea: it was that he knew a tiny bit, enough to sound smart to an idiot (remind you of anyone?).

That's an LLM. A jackass with access to Google.

I've had great success with SLMs (small language models), and what's more, I don't need a rack of NVIDIA L40 GPUs to train and use them.
liuliu · 2 months ago

I think all these articles raise the question: what are the author's credentials to claim these things?

Be careful about consuming information from chatters, not doers. There is only knowledge from doing, not from pondering.
lackoftactics · 2 months ago

> GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.

It's easy to spot people who secretly hate LLMs and feel threatened by them these days. GPT-5 will be a unified model, very different from 4o or 4.5. Throwing around numbers related to scaling laws shows a lack of proper research. Look at what DeepSeek accomplished with far fewer resources; their paper is impressive.

I agree that we need more breakthroughs to achieve AGI. However, these models increase productivity, allowing people to focus more on research. The number of highly intelligent people currently working on AI is astounding, considering the number of papers and new developments. In conclusion, we will reach AGI. It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.