
Ask HN: We need a new Turing-Test

5 points by air7 over 1 year ago
It wasn't long ago that the best attempt to pass the Turing Test was a chatbot that "was portrayed as a 13-year-old Ukrainian boy to induce forgiveness in those with whom it interacts for its grammatical errors and lack of general knowledge." [0] But then ChatGPT came along and blew the test out of the water.

There are other, more well-defined intelligence tests, such as the "Winograd Schema" [1], which asks grammatical questions that require real-world knowledge and common sense to answer, such as:

"In the sentence 'The trophy would not fit in the suitcase because it was too big/small,' what does big/small refer to?"

But even these kinds of questions, which were considered to be "The Sentences Computers Can't Understand, But Humans Can" as late as 2020 [2], seem to be easy for LLMs to answer, with GPT-4 topping the list at almost human-level accuracy [3].

So assuming ChatGPT isn't an AGI (yet), how can we prove it to ourselves? What linguistic tasks are humans qualitatively better at than LLMs?

------

[0] https://en.wikipedia.org/wiki/Eugene_Goostman
[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge
[2] https://www.youtube.com/watch?v=m3vIEKWrP9Q
[3] https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande
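The Winograd Schema format described above is mechanically simple: each item is a sentence pair where flipping one word ("big"/"small") flips which candidate noun the pronoun refers to, and a resolver is scored on how often it picks the right referent. A minimal sketch of such a scoring harness, where `resolve_referent` is a hypothetical stand-in for whatever model is under test:

```python
# Minimal Winograd-schema-style evaluation harness (sketch).
# Each item pairs a sentence with its two candidate referents and the
# referent the pronoun "it" should resolve to. The swapped word flips
# the correct answer, so a resolver cannot succeed by surface cues alone.

SCHEMAS = [
    # (sentence, candidate referents, correct referent)
    ("The trophy would not fit in the suitcase because it was too big.",
     ("the trophy", "the suitcase"), "the trophy"),
    ("The trophy would not fit in the suitcase because it was too small.",
     ("the trophy", "the suitcase"), "the suitcase"),
]

def evaluate(resolve_referent):
    """Return the fraction of schemas the resolver answers correctly."""
    correct = 0
    for sentence, candidates, answer in SCHEMAS:
        if resolve_referent(sentence, candidates) == answer:
            correct += 1
    return correct / len(SCHEMAS)

# A trivial baseline that always picks the first candidate gets exactly
# one of each balanced pair right, i.e. chance-level accuracy.
print(evaluate(lambda sentence, candidates: candidates[0]))  # 0.5
```

Because the items come in balanced pairs, any referent-blind strategy scores 50%, which is what makes the schema a sharper probe of common-sense reasoning than ordinary Q&A.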

2 comments

MattGaiser over 1 year ago
Does ChatGPT really blow away the Turing Test?

The criteria listed on Wikipedia are:

> The evaluator would be aware that one of the two partners in conversation was a machine.

> If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test.

ChatGPT isn't there by a long shot. If you aren't really considering that an online agent may be a bot and are just casually interacting with it, then it might pass. But within 10 minutes, it is not going to become anywhere near as friendly as a person normally would in 10 minutes of conversation, especially if you know that one participant is a bot and are looking for differences.

The Turing Test is not casually passing as human in casual interaction. It is stumping a human who is dedicated to rooting out that it is a machine and who is aware that one of the participants is a machine.
Comment #38055065 not loaded
Quinzel over 1 year ago
Is it a fair test if, once it's been passed, you change the testing criteria? I'm genuinely asking, because I would have thought that the test was a measure of something (in this case, whether AI is capable of thinking like a human being), so changing what you're measuring because something actually passed the test seems to defeat the point of the test somewhat. Unless you want to redefine the variable being measured rather than the test? I guess "thinking like a human being" is a latent variable and not easily observable, so changing the way you define it is possible, but then you could forever be redefining what it means to think like a human and constantly ensure that AI never quite makes the cut.

That's my philosophical pondering over the issue...