科技回声

Testing AI on language comprehension reveals insensitivity to underlying meaning

2 points by PunchTornado 6 months ago

1 comment

philipswood 6 months ago
> _...in this work we tested 7 state-of-the-art LLMs on simple comprehension questions targeting short sentences, purposefully setting an extremely low bar for the evaluation of the models. Systematic testing showed that the performance of these LLMs lags behind that of humans both quantitatively and qualitatively, providing further confirmation that tasks that are easy for humans are not always easily developed in AI. We argue that these results invite further reflection about the standards of evaluation we adopt for claiming human-likeness in AI._

An interesting paper suggesting that while LLMs produce seemingly human-like output, they use very un-human-like computational approaches with significant weaknesses.

This suggests that, while they are not stochastic parrots, they're nowhere near human-level intelligence yet.

Tasks like:

> John deceived Mary and Lucy was deceived by Mary. In this context, did Mary deceive Lucy?

> Franck read to himself and John read to himself, Anthony and Franck. In this context, was Franck read to?
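The evaluation setup the abstract describes can be illustrated with a minimal sketch: pose yes/no comprehension questions about short sentences and score the answers against gold labels. This is not the paper's actual harness; `ask_model` is a hypothetical placeholder for a real LLM call, and the two items are the examples quoted in the comment (both of whose gold answers are "yes" by the stated context).

```python
# Minimal sketch of a yes/no comprehension benchmark of the kind the paper
# describes. `ask_model` is a hypothetical stand-in for a real LLM query.

ITEMS = [
    ("John deceived Mary and Lucy was deceived by Mary. "
     "In this context, did Mary deceive Lucy?", "yes"),
    ("Franck read to himself and John read to himself, Anthony and Franck. "
     "In this context, was Franck read to?", "yes"),
]

def ask_model(question: str) -> str:
    # Placeholder: a real harness would send `question` to an LLM API and
    # normalize its reply. This stub answers "yes" unconditionally.
    return "yes"

def accuracy(items, model) -> float:
    # Fraction of items where the model's normalized answer matches the gold label.
    correct = sum(model(q).strip().lower() == gold for q, gold in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"accuracy: {accuracy(ITEMS, ask_model):.2f}")
```

The point of such a harness is the "extremely low bar": each item is a single short sentence pair, so any systematic failure is attributable to comprehension rather than task complexity.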