A Man Out to Prove How Dumb AI Still Is

64 points by fortran77, about 1 month ago

14 comments

noosphr, about 1 month ago
> Last week, the ARC Prize team released an updated test, called ARC-AGI-2, and it appears to have sent the AIs back to the drawing board. The full o3 model has not yet been tested, but a version of o1 dropped from 32 percent on the original puzzles to just 3 percent on the new version, and a “mini” version of o3 currently available to the public dropped from roughly 30 percent to below 2 percent. (An OpenAI spokesperson declined to say whether the company plans to run the benchmark with o3.) Other flagship models from OpenAI, Anthropic, and Google have achieved roughly 1 percent, if not lower. Human testers average about 60 percent.

ARC-AGI is the main reason why I don't trust static benchmarks.

If you don't have an essentially infinite set to draw your validation data from, then a large enough model will memorize it as part of its developer team's KPIs.

Forget all these fancy benchmarks. If you want to saturate any model today, give it a string and a grammar and ask it to generate the string from the grammar. I've had _every_ model fail this on regular grammars with strings more than 4 characters long.

LLMs are the solution to natural language, which is a huge deal. They aren't the solution to reasoning, which is still best solved with what used to be called symbolic AI before it started working, e.g. SAT solvers.
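For readers unfamiliar with the task, here is a minimal sketch of what "generating a string from a grammar" can look like for a regular grammar. Everything in it is an illustrative assumption rather than something taken from the comment: the example grammar for the language (ab)+, the names `GRAMMAR` and `derive`, and the choice to encode right-linear productions as `(terminal, next_nonterminal)` pairs with non-empty terminals.

```python
from typing import Optional

# Hypothetical right-linear grammar for the regular language (ab)+ :
#   S -> a B
#   B -> b S | b
# Each production is (terminal, next_nonterminal); None means the rule ends the string.
Grammar = dict[str, list[tuple[str, Optional[str]]]]

GRAMMAR: Grammar = {
    "S": [("a", "B")],
    "B": [("b", "S"), ("b", None)],
}


def derive(grammar: Grammar, start: str, target: str) -> Optional[list[str]]:
    """Return the sentential forms of a derivation of `target`, or None if none exists.

    Assumes every terminal is non-empty, so each step consumes input and the
    backtracking search always terminates.
    """

    def walk(nonterminal: str, pos: int, prefix: str) -> Optional[list[str]]:
        for terminal, nxt in grammar.get(nonterminal, []):
            if not target.startswith(terminal, pos):
                continue  # this production does not match the next characters
            end = pos + len(terminal)
            produced = prefix + terminal
            if nxt is None:
                # A terminating rule is only valid if it consumes the whole target.
                if end == len(target):
                    return [produced]
                continue
            rest = walk(nxt, end, produced)
            if rest is not None:
                return [produced + nxt] + rest
        return None  # no production led to the target: backtrack

    steps = walk(start, 0, "")
    return None if steps is None else [start] + steps
```

With this grammar, `derive(GRAMMAR, "S", "abab")` returns `["S", "aB", "abS", "abaB", "abab"]`, while `derive(GRAMMAR, "S", "aba")` returns `None` because no derivation exists. A checker like this is also what makes the test set effectively unbounded: new grammars and strings can be generated and verified mechanically.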
i_am_proteus, about 1 month ago
> Chollet, a French computer scientist and one of the industry’s sharpest skeptics

I feel like this description really buries the lede on Chollet's expertise. (For those who don't know, he's the creator of and lead contributor[0] to Keras.)

[0] https://github.com/keras-team/keras/graphs/contributors
mikestew, about 1 month ago
Not to dismiss Chollet’s work, but I’m starting to think he need prove nothing to even the muggles. For example, nearly any endurance athlete stands a good chance of being a Strava user. If you run in those circles, have you heard a *single* person with anything good to say about Strava’s “Athletic Intelligence”? Garmin is rolling out a beta right now that includes “AI Insights” or summat. Same deal: useless summaries like “you ran 5 miles today, which contributes to your aerobic base”. I could do better with a database and some *if/else* statements. And Garmin wants a subscription for this. (It’s included in Strava’s subscription, but I suppose you’re still paying for it.) And so now the memes tend toward “dumb AI insight of the day” on many online forums.

Seems to me that a lot of folks are enjoying having an LLM rewrite their email or whatever, but I wonder how many are actually buying the rest of it? The companies themselves sure aren’t helping.
some_random, about 1 month ago
Calling Francois Chollet just "A Man" in the title (or "The Man" in the actual article as of writing) is crazy work; he's been deeply involved in ML for ages, including creating Keras.
whiplash451, about 1 month ago
Francois Chollet and his work deserve a better title than this stupid headline.

Francois is out to push the boundaries of science and help create models that are truly more intelligent.
mdp2021, about 1 month ago
> *In 2019, Chollet created the Abstraction and Reasoning Corpus for Artificial General Intelligence, or ARC-AGI—an exam designed to show the gulf between AI models’ memorized answers and the “fluid intelligence” that people have*

There are a number of skill signals we demand from an intelligence.

Mind you: some of them are achieved, like the ability to interpret pronouns (Hinton's "the trophy will not enter the case: it's too big" vs "the trophy will not enter the case: it's too small").

Others we meet occasionally, when we are not researching said requirements systematically: one example is the detective game described at https://news.ycombinator.com/item?id=43284420 - a simple game of logic that intelligences are required to be able to solve (...and yet, again, some rebutted that humans would fail, etc.).

It remains important, though, that those working modules are not clustered (solving specific tasks and remaining unused otherwise): they must be intellectual keys adapted for use in the most general cases they can be helpful in. That's important in intelligence. So even the ability to solve "revealing" tasks is not enough - the way in which the ability works is crucial.
andersco, about 1 month ago
https://archive.is/7PL2a
echelon, about 1 month ago
> When I spoke with him earlier this year, Chollet told me that AI companies have long been “intellectually lazy”

s/intellectually lazy/hype maxing for fundraising/
HenryBemis, about 1 month ago
1a) It's not AI, it's LLM. The companies who create/train/operate them may (wink-wink) pitch them as "AI" with half-truths, but we (here) know it's LLMs "all the way down".
1b) Just like I disliked the "autopilot" in Teslas, because it was never autopilot.
2) I know that I wanted to write some software tools, and I have been successful at this for the past many months, and I got top-shelf tools that work, do their tasks, send alerts, etc. And I am not the only one.

So if the purpose is to "show it's a stupid AI".. well.. it's not AI.. so yeah. If the purpose is "it is not perfect", yes, because it draws a hand with 10 fingers. What else is new?

LLMs are a tool, still under development, still early in the curve; they can do A-B-C well but not X-Y-Z well (or at all). Congratulations :)
sroussey, about 1 month ago
Maybe if AI knew what it was doing I would not end up banging my head like I did here:

https://chatgpt.com/share/67ef43f4-3b88-800d-a5a3-e3ffea178fb3

(Me trying to describe a desk top with a fold-down hinged top, and it just drawing whatever)
geor9e, about 1 month ago
I made a little viewer to see the dataset. Spoilers, it shows you the answers. It's mainly to see the mistakes they've fixed on GitHub since it was released, and also to make proposing fixes easier.

https://9eorge.com/arc

Supposedly, they validated it upon release by showing each task to at most nine people and only keeping the ones that at least two people got correct in two tries. But still, they have had to subsequently fix more than a dozen of them.
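The keep/discard rule as described in this comment can be sketched in a few lines. This is only a reading of the comment's wording, not the ARC Prize team's actual procedure: the function name `keep_task`, the constant names, and the input encoding (one entry per tester, giving the attempt number on which they solved the task, or `None` if they never did) are all assumptions made for illustration.

```python
from typing import Optional

# Thresholds taken from the comment's description; names are hypothetical.
MAX_TESTERS = 9   # each task is shown to at most nine people
MIN_SOLVERS = 2   # at least two of them must solve it
MAX_TRIES = 2     # ... within two attempts


def keep_task(attempts_to_solve: list[Optional[int]]) -> bool:
    """Decide whether a task passes the human-validation rule sketched above.

    attempts_to_solve holds one entry per tester: the 1-based attempt on which
    that tester got the task right, or None if they never solved it.
    """
    if len(attempts_to_solve) > MAX_TESTERS:
        raise ValueError("a task is shown to at most nine testers")
    solvers = sum(
        1 for attempt in attempts_to_solve
        if attempt is not None and attempt <= MAX_TRIES
    )
    return solvers >= MIN_SOLVERS
```

Under this reading, `keep_task([1, None, 2, 3])` is `True` (two testers solved it within two tries) and `keep_task([1, None, 3])` is `False`. Note that the rule only checks that a task is solvable by some people; it does not check that the stored answer is free of errors, which is consistent with fixes still being needed after release.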
mdp2021, about 1 month ago
> *A person who scores 30 percent on ARC-AGI-2 is in no sense inferior to someone who scores 90 percent*

"News just in: journalist for the Atlantic stops reasoning and drifts in a world of feelings after neural hijacking, as he perceives abilities as some kind of threat".

> *Human cognitive diversity [...] when that diversity is already so abundant, do you really want to?*

We definitely need intelligence.
Yeul, about 1 month ago
Unfortunately I played Deus Ex religiously when I was a kid, so I'll never be impressed with AI. The sequel had a self-flying helicopter - good luck with that, Tesla and BYD.
j_bum, about 1 month ago
> To hit 87 percent on the original ARC-AGI test, o3 spent roughly 14 minutes per puzzle and, by my calculations, may have required hundreds of thousands of dollars in computing and electricity

> the bot came up with more than 1,000 possible answers per grid before selecting a final submission.

Yeah, AGI is right around the corner… /s