
OpenAI, Google and Anthropic are struggling to build more advanced AI

625 points by lukebennett 6 months ago

80 comments

thebigspacefuck 6 months ago
https://archive.ph/2024.11.13-100709/https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai

LASR 6 months ago
Question for the group here: do we honestly feel like we've exhausted the options for delivering value on top of the current generation of LLMs?

I lead a team exploring cutting-edge LLM applications and end-user features. My intuition from experience is that we have a LONG way to go.

GPT-4o / Claude 3.5 are the go-to models for my team. Every combination of technical investment + LLMs yields a new list of potential applications.

For example, combining a human-moderated knowledge graph with an LLM via RAG allows you to build "expert bots" that understand your business context / your codebase / your specific processes and act almost human-like, similar to a coworker on your team.

If you now give it some predictive / simulation capability - e.g. simulate the execution of a task or project like creating a GitHub PR code change, and test against the expert bot above for code review - you can have LLMs create reasonable code changes, with automatic review / iteration etc.

Similarly, there are many more capabilities that you can layer on and expose to LLMs to get increasingly productive outputs from them.

Chasing after model improvements and "GPT-5 will be PhD-level" is moot, imo. When did you ever hire a PhD coworker who was productive on day 0? You need to onboard them with human expertise, and then give them execution space / long-term memories etc. to be productive.

Model vendors might struggle to build something more intelligent. But my point is that we already have so much intelligence and we don't know what to do with it. There is a LOT you can do with high-schooler-level intelligence at super-human scale.

Take a naive example. 200k-token context windows are now available. Most people, through ChatGPT, type out maybe 1,500 tokens. That's a huge amount of untapped capacity. No human is going to type out 200k tokens of context. Hence why we need RAG, and additional forms of input (e.g. simulation outcomes), to fully leverage it.
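
The "expert bot" pattern described above can be sketched in a few lines. A minimal illustration, assuming the OpenAI Python client; the toy knowledge graph, retrieval logic, and prompt are placeholder assumptions, not a production design:

    # Minimal sketch: answer questions with context retrieved from a
    # human-moderated knowledge graph. Graph contents and the model name
    # are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()

    # Toy "knowledge graph": entity -> human-curated facts.
    KNOWLEDGE_GRAPH = {
        "deploy-service": [
            "Deploys run through the `deploy-prod` pipeline.",
            "Rollbacks require a ticket tagged `ops-rollback`.",
        ],
    }

    def retrieve(question: str) -> list[str]:
        # Naive retrieval: pull facts for any entity named in the question.
        return [fact for entity, facts in KNOWLEDGE_GRAPH.items()
                if entity in question for fact in facts]

    def expert_bot(question: str) -> str:
        context = "\n".join(retrieve(question))
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Answer using only this context:\n" + context},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    print(expert_bot("How do I roll back deploy-service?"))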

iandanforth 6 months ago
A few important things to remember here:

The best engineering minds have been focused on scaling transformer pre- and post-training for the last three years because they had good reason to believe it would work, and it has up until now.

Progress has been measured against benchmarks which are / were largely solvable with scale.

There is another emerging paradigm which is still small(er) scale but showing remarkable results. That's full multi-modal training with embodied agents (aka robots). 1x, Figure, Physical Intelligence, and Tesla are all making rapid progress on functionality which is definitely beyond frontier LLMs because it is distinctly *different*.

OpenAI/Google/Anthropic are not ignorant of this trend and are also reviving or investing in robots or robot-like research.

So while Orion and Claude 3.5 Opus may not be another shocking giant leap forward, that does *not* mean that there aren't giant shocking leaps forward coming from slightly different directions.

osigurdson 6 months ago
This "running out of data" thing suggests that there is something fundamentally wrong with how things are working. A new driver does not need to experience 8,000 different rabbit-on-road situations from all angles to know to slow down upon seeing one on the road. Similarly, we don't need 10,000 addition examples to learn how to add. It is as though there is no generalization in the models - just, fundamentally, search.

Animats 6 months ago
*"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."*

Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.

LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/Javascript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.

But it does not look like artificial general intelligence emerges from LLMs alone.

There's also the elephant in the room - the hallucination / lack-of-confidence-metric problem. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to actually *do* much on their own. LLMs with a confidence metric would be much more useful than what we have now.

ziofill 6 months ago
I think it is a good thing for AI that we hit the data ceiling, because the pressure moves toward coming up with better model architectures. And compared to a decade ago, there's a much larger number of capable and smart AI researchers looking for one.

aresant 6 months ago
Taking a holistic view informed by a disruptive OpenAI / AI / LLM Twitter habit, I would say this is AI's "What gets measured gets managed" moment, and the narrative will change.

This is supported by both general observations and, recently, this tweet from an OpenAI engineer that Sam responded to and engaged with (1):

"scaling has hit a wall and that wall is 100% eval saturation"

Which I interpret to mean his view is that models are no longer yielding significant performance improvements because the models have maxed out existing evaluation metrics.

Are those evaluations (or even LLMs) the RIGHT measures to achieve AGI? Probably not.

But have they been useful tools to demonstrate that the confluence of compute, engineering, and tactical models is leading toward significant breakthroughs in artificial (computer) intelligence?

I would say yes.

Which in turn are driving the funding, power innovation, public policy etc. needed to take that next step?

I hope so.

(1) https://x.com/willdepue/status/1856766850027458648

headcanon 6 months ago
I don't see a problem with this; we were inevitably going to reach some kind of plateau with existing pre-LLM-era data.

Meanwhile, the existing tech is such a step change that industry is going to need time to figure out how to effectively use these models. In a lot of ways it feels like the "digitization" era all over again - workflows and organizations that were built around the idea that humans handled all the cognitive load (basically all companies older than a year or two) will need time to adjust to a hybrid AI + human model.

jmward01 6 months ago
Every negative headline I see about AI hitting a wall or being over-hyped makes me think of the early 2000s with that new thing, the 'internet' (yes, I know the internet is a lot older than that). There is little doubt in my mind that ten years from now nearly every aspect of life will be deeply connected to AI, just like the internet took over everything in the late '90s and early 2000s and is now deeply connected to everything. I'd even hazard to say that AI could be more impactful.

WorkerBee28474 6 months ago
> OpenAI's latest model ... failed to meet the company's performance expectations ... particularly in answering coding questions outside its training data.

So the models' accuracies won't grow exponentially, but can still grow linearly with the size of the training data.

Sounds like DataAnnotation will be sending out a lot more LinkedIn messages.

kklisura 6 months ago
Not sure if related or not, but Sam Altman, ~12 hrs ago: "there is no wall" [1]

[1] https://x.com/sama/status/1856941766915641580

pluc 6 months ago
They've simply run out of data to use to fabricate legitimate-looking guesses. They can't create anything that doesn't already exist.

danjl 6 months ago
Where will the training data for coding come from now that Stack Overflow has effectively been replaced? Will the LLMs share fixes for future problems? As the world moves forward, and the amount of non-LLM-generated data decreases, will LLMs actually revert their advancements and become effectively like addled brains, longing for the "good old times"?

irrational 6 months ago
> The AGI bubble is bursting a little bit

I'm surprised that any of these companies consider what they are working on to be Artificial General Intelligences. I'm probably wrong, but my impression was AGI meant the AI is self-aware like a human. An LLM hardly seems like something that will lead to self-awareness.

grey-area 6 months ago
The biggest weakness of generative AI to me is knowledge. It gives the *impression* of knowledge about the world without actually having a model of the world or any sense of what it does or does not know.

For example, recently I asked it to generate some phrases for a list of words, along with synonym and antonym lists.

The phrases were generally correct and appropriate (some mistakes, but that's fine). The synonyms/antonyms were misaligned with the list (so, strictly speaking, all wrong) and were often incorrect anyway. I imagine it would be the same if you asked for definitions of a list of words.

If you ask it to correct itself, it just generates something else, which is often also wrong. It's certainly superficially convincing in many domains, but once you try to get it to do real work it's wrong in subtle ways.

benopal64 6 months ago
I am not sure how these large companies think they will reach "greater-than-human" intelligence any time soon if they do not create systems that financially incentivize people to sell their knowledge labor (unstable contracting gigs are not attractive).

Where do these large "AI" companies think the mass amounts of data used to train these models come from? People! The most powerful and compact complex systems in existence, IMO.

sssilver 6 months ago
One thing that makes the established AIs less ideal for my (programming) use case is that the technologies I use quickly evolve past whatever the published models have "learned".

On the other hand, a lot of these frameworks and languages have relatively decent and detailed documentation.

Perhaps this is a naive question, but why can't I, as a user, just purchase "AI software" that comes with a large pre-trained model to which I can say, on my own machine, "go read this documentation and help me write this app in this next version of Leptos", and it would augment its existing model with this new "knowledge"?
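
Today this is usually approximated with retrieval rather than on-device weight updates. A rough sketch of the "go read this documentation" step, assuming the sentence-transformers package; the docs directory, model choice, and chunking are illustrative assumptions:

    # Rough sketch: embed local documentation once, then retrieve the most
    # relevant chunks to prepend to a coding prompt. This augments prompts,
    # not model weights; paths and model name are assumptions.
    from pathlib import Path
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Split each doc file into paragraph-sized chunks and embed them.
    chunks = [p for f in Path("leptos-docs").glob("*.md")
              for p in f.read_text().split("\n\n") if p.strip()]
    vecs = model.encode(chunks, normalize_embeddings=True)

    def top_chunks(question: str, k: int = 3) -> list[str]:
        # Cosine similarity is a dot product on normalized vectors.
        q = model.encode([question], normalize_embeddings=True)[0]
        return [chunks[i] for i in np.argsort(vecs @ q)[::-1][:k]]

    # These chunks would be prepended to the prompt sent to the model.
    for chunk in top_chunks("How do I create a reactive signal?"):
        print(chunk[:80])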

fallat 6 months ago
What a stupid piece. We are still making leaps every 6 months. Tell me this when there have been no developments for 3 years.

svara 6 months ago
The recent big successes in deep learning have all been, to a large part, successes in leveraging relatively cheaply available training data.

AlphaGo - self-play

AlphaFold - PDB, the protein database

ChatGPT - human knowledge encoded as text

These models are all machines for clever interpolation in gigantic training datasets.

They appear to be intelligent because the training data they've seen is so vastly larger than what we've seen individually, and we have poor intuition for this.

I'm not throwing shade; I'm a daily user of ChatGPT and find tremendous and diverse value in it.

I'm just saying, this particular path in AI is going to make step-wise improvements whenever new large sources of training data become available.

I suspect the path to general intelligence is not that, but we'll see.

datahack 6 months ago
The next wave won't be monolithic but network-driven. Orchestration has the potential to integrate diverse AI systems and complementary technologies, such as advanced fact-checking and rule-based output frameworks.

This methodological growth could make LLMs more reliable, consistent, and aligned with specific use cases.

The skepticism surrounding this vision mirrors early doubts about the early internet fairly concisely.

Initially, the internet was seen as a fragmented collection of isolated systems without a clear structure or purpose. It really was. You would gopher somewhere and get a file, and eventually we had apps like pine for email, but as cool as it was, it had limited utility.

People doubted it could *ever* become the seamless, interconnected web we know today.

Yet, through protocols, shared standards, and robust frameworks, the internet evolved into a powerful network capable of handling diverse applications, data flows, and user needs.

In the same way, LLM orchestration will mature by standardizing interfaces, improving interoperability, and fostering cooperation among varied AI models and support systems.

Just as the internet needed HTTP, TCP/IP, and other protocols to unify disparate networks, orchestrated AI systems will require foundational frameworks and "rules of the road" that bring cohesion to diverse technologies.

We are at the veeeeery infancy of this era and have a LONG way to go. Some of the progress looks like a clear, linear progression, but a lot of it, like the internet, will just take a while to mature, and we shouldn't forget what we learned the last time we faced a sea-change technological revolution.

cryptica 6 months ago
It's interesting the way things turned out so far with LLMs, especially from the perspective of a software engineer. We are trained to keep a certain skepticism when we see software which appears to be working because, ultimately, the only question we care about is "Does it meet user requirements?" and this is usually framed in terms of users achieving certain goals.

So it's interesting that when AI came along, we threw caution to the wind and started treating it like a silver bullet... without asking the question of whether it was applicable to this goal or that goal...

I don't think anyone could have anticipated that we could have an AI which could produce perfect sentences, faster than a human, better than a human, but which could not reason. It appears to reason very well, better than most people, yet it doesn't actually reason. You only notice this once you ask it to accomplish a task. After a while, you can feel how it lacks willpower. It puts into perspective the importance of willpower when it comes to getting things done.

In any case, LLMs bring us closer to understanding some big philosophical questions surrounding intelligence and consciousness.

wslh 6 months ago
It sounds a bit sci-fi, but since these models are built on data generated by our civilization, I wonder if there's an epistemological bottleneck requiring smarter or more diverse individuals to produce richer data. This, in turn, could spark further breakthroughs in model development. Although these interactions with LLMs help address specific problems, truly complex issues remain beyond their current scope.

With my user hat on, I'm quite pleased with the current state of LLMs. Initially, I approached them skeptically, using a hackish mindset and posing all kinds of Turing-test-like questions. Over time, though, I shifted my focus to how they can enhance my team's productivity and support my own tasks in meaningful ways.

Finally, I see LLMs as a valuable way to explore parts of the world, accommodating the reality that we simply don't have enough time to read every book or delve into every topic that interests us.

LarsDu88 6 months ago
Curves that look exponential in virtually all cases turn out to be logarithmic.

Certain OpenAI insiders must have known this for a while, hence Ilya Sutskever's new company in Israel.

nutanc 6 months ago
Let's keep aside the hype. Let's define more advanced AI. With current architectures, this basically means better copying machines (don't mean this in a bad way and don't want a debate on this; this is just my opinion based on my usage). Basically everything on the Internet has been crammed into the weights, and the companies are finding it hard to do two things:

1. Find more data.

2. Make the weights capture the data and reproduce it.

In that sense we have reached a limit. So in my opinion we can do a couple of things:

1. App developers can understand the limits and build within the limits.

2. Researchers can take insights from these large models and build better AI systems with new architectures. It's OK to say transformers have reached a limit.

glial 6 months ago
I think self-consistency is a critical feature of LLMs, or any AI, that's currently missing. It's one of the core attributes of truth [1], in addition to the order and relationship of statements corresponding to the order and relationship of things in the world. I wonder if some kind of hierarchical language diffusion model would be a way to implement this -- where text is not produced sequentially, but instead hierarchically, with self-consistency checks at each level.

[1] https://en.wikipedia.org/wiki/Coherence_theory_of_truth

fsndz 6 months ago
Sam Altman might be wrong then?

Learning from data is not enough; there is a need for the kind of system-two thinking we humans develop as we grow. It is difficult to see how deep learning and backpropagation alone will help us model that. For tasks where providing enough data is sufficient to cover 95% of cases, deep learning will continue to be useful in the form of 'data-driven knowledge automation'. For other cases, the road will be much more challenging. https://www.lycee.ai/blog/why-sam-altman-is-wrong

guluarte 6 months ago
Well, there have been no significant improvements to the GPT architecture over the past few years. I'm not sure why companies believe that simply adding more data will resolve the issues.

czhu12 6 months ago
If it becomes obvious that LLMs have a more narrow set of use cases, rather than the all-encompassing story we hear today, then I would bet that the LLM platforms (OpenAI, Anthropic, Google, etc.) will start developing products to compete directly with applications that are supposed to be building on top of them, like Cursor, in an attempt to increase their revenue.

I wonder what this would mean for companies raising today on the premise of building on top of these platforms. Maybe the best ones get their ideas copied, reimplemented, and sold for cheaper?

We already kind of see this today with OpenAI's Canvas and Claude Artifacts. Perhaps they'll even start moving into Palantir's space and start having direct customer-implementation teams.

It is becoming increasingly obvious that LLMs are quickly becoming commoditized. Everyone is starting to approach the same limits in intelligence, and is finding it hard to carve out margin from competitors.

This was most recently exhibited by the backlash at Claude raising prices because their product is better. In any normal market, this would be totally expected, but people seemed shocked that anyone would charge more than the raw cost it would take to run the LLM itself.

https://x.com/ArtificialAnlys/status/1853598554570555614

summerlight 6 months ago
I guess this is somewhat expected? The current frontier models have probably already exhausted most of the entropy in training data accumulated over decades, and new training data is very sparse. And the current mainstream architectures are not capable of the sophisticated searching and planning that are essential for generating new entropy out of thin air. o1 was an interesting attempt to tackle this problem, but we probably still have a long way to go.

xyst 6 months ago
Many late investors in the genAI space are about to be bag holders.

the_king 6 months ago
Anthropic's latest 3.5 Sonnet is a cut above GPT-4 and 4o. And if someone had given it to me and said, "here's GPT-4.5", I would have been very happy with it.

thousand_nights 6 months ago
Not long ago these people would have you believe that a next-word predictor trained on Reddit posts would somehow lead to artificial general superintelligence.

shmatt 6 months ago
Time to start selling my "probabilistic syllable generators are not intelligence" t-shirts.

devit 6 months ago
It seems obvious to me that Common Crawl plus GitHub public repositories have more than enough data to train an AI that is as good as any programmer (at tasks not requiring knowledge of non-public codebases or non-public domain knowledge).

So the problem is more in the algorithm.

EternalFury 6 months ago
If GPT-5 had passed the A/B testing OpenAI likes to do, it would have been released already. Instead, it seems they are clearly concerned the audience would not find it superior enough to GPT-4. So, the bluff must go on until the right cards appear.

Timber-6539 6 months ago
Direct quote from the article: "The companies are facing several challenges. It's become increasingly difficult to find new, untapped sources of high-quality, human-made training data that can be used to build more advanced AI systems."

The irony here is astounding.

nerdypirate 6 months ago
"We will have better and better models," wrote OpenAI CEO Sam Altman in a recent Reddit AMA. "But I think the thing that will feel like the next giant breakthrough will be agents."

Is this certain? Are agents the right direction to AGI?

Dr_Birdbrain 6 months ago
I don’t know how to square this with the recent statement by Dario Amodei (Anthropic CEO) on the Lex Fridman podcast saying that in his opinion the scaling hypothesis still has plenty of room to run.

Havoc 6 months ago
The new Gemini just hit some good benchmarks.

This smells like it's mostly based on OAI having a bit of bad luck with its next model rather than a fundamental slowdown / barrier.

They literally just made a decent-sized leap with o1.

wg0 6 months ago
AI winter is here. Almost.

cubefox 6 months ago
It's very strange this got so few upvotes. The scoop by The Information a few days ago, which came to similar conclusions, was also ignored on HN. This is arguably rather big news.

smusamashah 6 months ago
It has to be a good thing to stop here. We can focus on improving what we have right now. The whole stack of models is an amazing innovation no matter what. It shouldn't hurt if we pause here for a while and try to build on this or improve it.

It will be like Stable Diffusion 1.5. That model can now run on low-end devices, and lots of open research uses it to build something else or takes inspiration from it.

These LLMs can be used as a foundation to keep improving and building new things.

KETpXDDzR 6 months ago
LLMs are glorified Markov chains in the end. They can't reason or think, even when they are good at pretending they can. What we need is a totally different approach, IMO.

Veuxdo 6 months ago
> They are also experimenting with synthetic data, but this approach has its limitations.

I was really looking forward to using "synthetic data" euphemistically during debates.

zusammen 6 months ago
I wonder how much this has to do with a fluency plateau.

Up to a certain point, conditional fluency stores knowledge, in the sense that semantically correct sentences are more likely to be fluent... but we may have tapped out in that regard. LLMs have solved language very well, but getting beyond that has seemed, thus far, to require RLHF, with all the attendant negatives.

tippytippytango 6 months ago
There’s only so much you can do when you train on the data instead of the processes that created that data.

m3kw9 6 months ago
Hold your horses. OpenAI came out with o1-preview just 2 months ago, showing what test-time compute can do.

eichi 6 months ago
Scientific benchmark scores are not necessarily related to the rate of completion of tasks such as user persuasion. Software engineering is more important when the current state-of-the-art small language models are sufficient for the solution of our application.

nikkwong 6 months ago
Didn’t Sam Altman just go on some podcast last week and tell the world that he thought “We know exactly what to do to be able to reach AGI now”. What’s going on, is he just posturing?

nomendos 6 months ago
"Eureka"!?

At the very early phase of the boom I was among the very few who knew and predicted this (usually the most free- and deep-thinking/knowledgeable). Then my prediction got reinforced by the results. One of the best examples was one of my experiments in which all of today's AIs failed to solve tree serialization and deserialization in each of DFS (pre-order/in-order/post-order) and BFS (level-order), which is 8 algorithms (2x4), and the result was only 3 correct! The reason is "limited training inputs", since the internet and open source do not have the other solutions :-)

So, I spent "some" time and implemented all 8, which took me a few days. By the way, this proves/demonstrates that ~15-30 min pointless leetcode-like interviews require regurgitating/memorizing/not-thinking. So, as a logical hard consequence, there will/has to be a "crash/cleanup" in the area of leetcode-like interviews, as they will just suddenly be proclaimed "pointless/stupid". However, I decided not to publish the remaining 5 solutions :-)

This (and other experiments) confirms hard limits of the LLM approach (even when used with chain-of-thought). Increasing the compute on the problem will produce increasingly smaller results (inverse exponential/logarithmic/diminishing returns) = a new AGI approach/design is needed, and to my knowledge the majority of the inve$tment (~99%) is in LLMs, so "buckle up" at-some-point/soon?

Impacts and realities: LLMs shall "run their course" (produce some products/results/$$$, get reviewed/$corrected), and whoever survives that pruning shall earn money on those products while investing in the new research to find a new AGI design/approach (which could take quite a long time... or not). NVDA is at the center of thi$, and time-wise this peak/turn/crash/correction is hard to predict (although I see it on the horizon and min/max time can be estimated). Be aware and alert. I'll stop here and hold my other thoughts/opinions/ideas for a much deeper discussion. (BTW I am still "full in on NVDA" until...)
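
For reference, one of the 8 variants mentioned above (4 traversal orders x {serialize, deserialize}) - pre-order DFS - can be sketched as follows. This is a from-scratch illustration, not the commenter's unpublished code:

    # Pre-order DFS serialization and deserialization of a binary tree.
    class Node:
        def __init__(self, val, left=None, right=None):
            self.val, self.left, self.right = val, left, right

    def serialize(root):
        # Visit node, then left subtree, then right subtree;
        # '#' marks an absent child so the shape is unambiguous.
        if root is None:
            return "#"
        return f"{root.val},{serialize(root.left)},{serialize(root.right)}"

    def deserialize(data):
        tokens = iter(data.split(","))
        def build():
            tok = next(tokens)
            if tok == "#":
                return None
            node = Node(tok)
            node.left = build()
            node.right = build()
            return node
        return build()

    tree = Node("a", Node("b"), Node("c", Node("d")))
    s = serialize(tree)                    # "a,b,#,#,c,d,#,#,#"
    assert serialize(deserialize(s)) == s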

superjose 6 months ago
I'm more in the camp that these techs don't need to be perfect, but they need to be practical enough.

And I think the latter is good enough for us to do exciting things.

GiorgioG 6 months ago
It's about time the hype started to die down. LLMs are brilliant for small bits of grunt work in software. They are not, however, doing any actual reasoning.

Bjorkbat 6 months ago
It's kind of, I don't know, "weird", observing how all these news outlets are reporting that essentially every up-and-coming model has not performed as expected, while the employees at these labs haven't changed their tune in the slightest.

There are a number of possible reasons why, the most likely being that they've found other ways to get improvements out of AI models, so diminishing returns on training aren't that much of a problem. Or maybe the leakers are lying, but I highly doubt that, considering news outlets' past record of reporting accurately on leaked information.

Still, it's interesting how basically every frontier lab created a model that didn't live up to expectations, and every employee at these labs on Twitter has continued to vague-post and hype as if nothing ever happened.

It's honestly hard to tell whether they really know something we don't, or whether they have an irrational exuberance for AGI bordering on the cult-like, and they will never be able to mentally process, let alone admit, that something might be wrong.

yalogin 6 months ago
I do wonder how quickly LLMs will become a commodity AI instrument just like any other AI out there. If so, what happens to OpenAI?

non- 6 months ago
Honestly, I could use a breather from the recent rate of progress. We are just barely figuring out how to interact with the models we have now. I'd bet there are at least 100 billion-dollar startups that could be built even if these labs stopped releasing new models tomorrow.

atomsatomsatoms 6 months ago
At least they can generate haikus now

aurareturn 6 months ago
Is there any timeline of AI winters, and does each winter get shorter and shorter as time goes on?

rubiquity 6 months ago
> Amodei has said companies will spend $100 million to train a bleeding-edge model this year

Is it just me or does $100 million sound like it's on the very, very low end of how much training a new model costs? Maybe you can arrive within $200 million of that mark with amortization of hardware? It just doesn't make sense to me that a new model would "only" be $100 million when AmaGooBookSoft are spending tens of billions on hardware and the AI startups are raising billions every year or two.

gchamonlive 6 months ago
We should put a model in an actual body and let it out into the world to build from experience. Inference is costly, though, so the robot would interact during one period and update its model during another, flushing the context window (short-term memory) into its training set (long-term memory).

Oras 6 months ago
I think Meta will have the upper hand soon with the release of their glasses. If they manage to make them a daily-use item, and pay users to record and share their lives, then they will have data no one else has: a mix of vision, audio, and physics.

lobochrome 6 months ago
Isn’t this just the expected delay from the respin of Blackwell?

user90131313 6 months ago
AI market top very soon

polskibus 6 months ago
In other news, Altman said AGI is coming next year: https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-is-coming-in-2025-and-machines-will-be-able-to-think-like-humans-when-it-happens

k__ 6 months ago
But AGI is always right around the corner?

I don't get it...

kaycey2022 6 months ago
AI safety folks sure do look stupid now. :)

wildermuthn 6 months ago
Simply put, AGI requires more data: qualia.

_Algernon_ 6 months ago
The next AI winter will be brutal

quantum_state 6 months ago
Hope this will be a constant reminder that brute force can only get one so far, though it may still be useful when it is. With lots of intuition gained, it's time to ponder things a bit more deeply.

russellbeattie 6 months ago
Go back a few decades and you'd see articles like this about CPU manufacturers struggling to improve processor speeds and questioning whether Moore's Law was dead. Obviously those concerns were way overblown.

That doesn't mean this article is irrelevant. It's good to know that LLM improvements are going to slow down a bit because the low-hanging fruit has seemingly been picked.

But in terms of the overall effect of AI and questioning the validity of the technology as a whole, it's just your basic FUD article that you'd expect from mainstream news.

wanderingmind 6 months ago
And yet the Anthropic CEO is still claiming PhD-level intelligence in the next couple of years to Lex Fridman. It's starting to feel like the whole crypto pump-and-dump again.

kaibee 6 months ago
Not sure where the OP of the comment I meant to reply to is, but I'll just add this here.

> I suspect the path to general intelligence is not that, but we'll see.

I think there are three things that a 'true' general intelligence has which are missing from basic-type LLMs as we have now:

1. Knowing what you know. <basic LLMs are here>

2. Knowing what you don't know but can figure out via tools/exploration. <this is tool use/function calling>

3. Knowing what can't be known. <this is knowing that the halting problem exists and being able to recognize it in novel situations>

(1) From an LLM's perspective, once trained on a corpus of text, it knows 'everything'. It knows about the concept of not knowing something (from having seen text about it), insofar as an LLM knows anything, but it doesn't actually have a growable map of knowledge that it knows has uncharted edges.

This is where (2) comes in, and this is what tool use/function calling tries to solve atm, but the way function calling works atm doesn't give the LLM knowledge the right way. I know that I don't know what 3,943,034 / 234,893 is. But I know I have a 'function call' of knowing the algorithm for doing long division on paper. And I think there's another subtle point here: my knowledge in (1) includes the training data generated from running the intermediate steps of the long-division algorithm. This is the knowledge that later generalizes to being able to use a calculator (and this is also why we don't just give kids calculators in elementary school). But this is also why a kid who knows how to do long division on paper doesn't separately need to learn when/how to use a calculator, beyond the very basics. Using a calculator to do that math feels like 1 step, but it actually still has all the initial mechanical steps of setting up the problem on paper. You have to type in each digit individually, etc.

(3) I'm less sure of this point now that I've written out points (1) and (2), but that's kinda exactly the thing I'm trying to get at. It's being able to recognize when you need more practice at (1) or more 'energy/capital' for doing (2).

Consider a burger restaurant. If you properly populated the context of a ChatGPT-scale model with the data for a burger restaurant from 1950, and gave it the kind of 'function calling' we're plugging into LLMs now, it could manage it. It could keep track of inventory, it could keep tabs on the employee subprocesses, knowing when to hire, fire, and get new suppliers, all via function calling. But it would never try to become McDonald's, because it would have no model of the internals of those function calls, and it would have no ability to investigate or modify the behaviour of those function calls.
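
A toy sketch of the point (2) behaviour described above - a system that recognizes a question it cannot answer from "weights" alone and routes it to a tool instead. The tool table and the routing rule are illustrative stand-ins, not any vendor's actual function-calling API:

    # Toy sketch: route questions the "model" can't answer reliably to a tool.
    import re

    TOOLS = {
        "divide": lambda a, b: a / b,  # the calculator from the example above
    }

    def answer(question: str) -> str:
        # Stand-in for the model deciding this needs a tool call rather than
        # a guessed number generated token by token.
        m = re.match(r"what is ([\d,]+) / ([\d,]+)\?", question)
        if m:
            a, b = (int(g.replace(",", "")) for g in m.groups())
            return f"divide({a}, {b}) -> {TOOLS['divide'](a, b):.4f}"
        return "answered from weights (may be unreliable)"

    print(answer("what is 3,943,034 / 234,893?"))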

12_throw_away 6 months ago
Well, shoot. It's not like it was patently obvious that this would happen *before* the industry started guzzling electricity and setting money on fire, right? [1]

[1] https://dl.acm.org/doi/10.1145/3442188.3445922

easeout 6 months ago
I'm happy to use LLM products for what they can do right now, while they're still cheap. Even though they're maintained by high investment that may never pay off, enshittification has not yet set in.

mrandish 6 months ago
Based on recent rumblings about AI scaling hitting a wall, of which this article is perhaps the most visible - and in a high-reach financial publication - I'm considering increasing my estimated probability that we might see a major market correction next year (and possibly even a bubble collapse). (Example: "CONFIRMED: LLMs have indeed reached a point of diminishing returns" https://garymarcus.substack.com/p/confirmed-llms-have-indeed-reached.)

To be clear, I don't think a near-term bubble collapse is likely, but I'm going from 3% to maybe ~10%. Also, this doesn't mean I doubt there's real long-term value to be delivered or money to be made in AI solutions. I'm thinking specifically about those who've been speculatively funding the massive build-out of data centers, energy, and GPU supply expecting near-term demand to continue scaling at the recent unprecedented rates. My understanding is that much of this is being funded in advance of actual end-user demand at these elevated levels, and it is being funded either by VC money or by debt from parties who could struggle to come up with the cash to pay for what they've ordered if either user demand or their equity value doesn't continue scaling as expected.

Admittedly, this scenario assumes that these investment commitments are sufficiently speculative and over-committed to create bubble dynamics and tipping points. The hypothesis goes like this: the money sources who've over-committed to lock up scarce future supply in the expectation it will earn outsize returns have already started seeing these warning signs of efficiency and/or progress rates slowing which are now hitting mainstream media. Thus it's possible there is already a quiet collapse beginning, wherein the largest AI data center GPU purchasers might start trying to postpone future delivery schedules and may soon start trying to downsize or even cancel existing commitments, or try to offload some of their future capacity by sub-leasing it out before it even arrives, etc. Being a dynamic market, this could trigger a rapidly snowballing avalanche of falling prices for next-year AI compute (which is already bought and sold as a commodity, like pork-belly futures).

Notably, there are now rumors claiming some of the largest players don't currently have the cash to pay for what they've already committed to for future delivery. They were making calculated bets that they'd be able to raise or borrow that capital before payments were due. Except if expectations begin to turn downward, fresh investors will be scarce and banks will reprice a GPU's value as loan collateral down to pennies on the dollar (shades of the 2009 financial crisis, where the collateral value of residential real estate assets was marked down). As in most bubbles, cheap credit is the fuel driving growth, and that credit can get more expensive very quickly - which can in turn trigger exponential contagion effects causing the bubble to pop. A very different kind of "Foom" than many AI financial speculators were betting on! :-)

So... in theory, under this scenario, sometime next year NVIDIA/TSMC and other top-of-supply-chain companies could find themselves with excess inventories of advanced-node wafers because a significant portion of their orders were from parties who no longer have access to the cheap capital to pay for them. And trying to sue so many customers for breach can take a long time and, in a large enough sector collapse, be only marginally successful in recouping much actual cash.

I'd be interested in hearing counter-arguments (or support) for the impossibility (or likelihood) of such a scenario.

jppope 6 months ago
Just an observation: if the models are hitting the top of the S-curve, that might be why Sam Altman raised all the money for OpenAI... it might not be available if venture capitalists realize that the gains are close to being done.

bad_haircut72 6 months ago
I'm no Alan Turing, but I have my own definition of AGI: when I come home one day and there's a hole under my sink with a note - "Mum and Dad, I love you, but I can't stand this life any more. I'm running away to be a smoke machine in Hollywood. - the dishwasher"

dangw 6 months ago
where the fuck is simonw in this thread

xd

yobid20 6 months ago
This was predicted. AI isn't going to get any better.

Davidzheng 6 months ago
Just because you guys want something to be true, can't accept the alternative, and upvote whatever agrees with your view does not mean that view is correct.

aaroninsf 6 months ago
It's easy to be snarky at ill-informed and hyperbolic takes, but it's also pretty clear that large multi-modal models trained with the data we already have are going to eventually give us AGI.

IMO this will require not just much more expansive multi-modal training, but also novel architecture - specifically, recurrent approaches - plus a well-known set of capabilities most systems don't currently have, e.g. the integration of short-term memory (the context window, if you like) into long-term "memory", either episodic or otherwise.

But these are, as we say, mere matters of engineering.