It wasn't long ago that the best attempt to pass the Turing test was a chatbot that "was portrayed as a 13-year-old Ukrainian boy to induce forgiveness in those with whom it interacts for its grammatical errors and lack of general knowledge" [0]. But then ChatGPT came along and blew the test out of the water.

There are other, better-defined intelligence tests, such as the Winograd Schema Challenge [1], which asks grammatical questions that require real-world knowledge and common sense to answer, such as:

"In the sentence 'The trophy would not fit in the suitcase because it was too big/small,' what does 'it' refer to when the adjective is 'big' versus 'small'?"

But even these types of questions, which were still billed as "The Sentences Computers Can't Understand, But Humans Can" as late as 2020 [2], seem to be easy for LLMs to answer, with GPT-4 topping the leaderboard at almost human-level accuracy [3]. (A minimal sketch of posing such a question to an LLM follows the references below.)

So, assuming ChatGPT isn't an AGI (yet), how can we prove it to ourselves? At what linguistic tasks are humans qualitatively better than LLMs?

------

[0] https://en.wikipedia.org/wiki/Eugene_Goostman
[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge
[2] https://www.youtube.com/watch?v=m3vIEKWrP9Q
[3] https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande
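
Here's a minimal sketch of what posing one of these schema questions to a chat model can look like, assuming the official openai Python package (v1.x) and an API key in the environment; the model name and exact prompt wording are illustrative choices, not the benchmark's own harness:

    # Minimal sketch: pose a Winograd-schema-style question to a chat model.
    # Assumes the openai Python package (v1.x) and OPENAI_API_KEY set in the
    # environment; the model name is an illustrative choice.
    from openai import OpenAI

    client = OpenAI()

    question = (
        "In the sentence 'The trophy would not fit in the suitcase because "
        "it was too big', what does 'it' refer to? Answer with one word."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )

    print(response.choices[0].message.content)  # a human would answer "the trophy"

Swapping "big" for "small" should flip the expected answer to "the suitcase", which is what makes the pair a schema rather than a single trick sentence.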
Does ChatGPT really blow away the Turing test?

The criteria listed on Wikipedia are:

> The evaluator would be aware that one of the two partners in conversation was a machine

> If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test.

ChatGPT isn't there by a long shot. If you aren't really considering that an online agent may be a bot and are just casually interacting with it, then it might pass. But within 10 minutes it is not going to become anywhere near as friendly as a person normally would in 10 minutes of conversation, especially if you know that one of the participants is a bot and are looking for differences.

The Turing test is not about casually passing as human in casual interaction. It is about stumping an evaluator who knows that one of the participants is a machine and is dedicated to rooting out which one it is.
Is it a fair test if, once it's been passed, you change the testing criteria?
I’m genuinely asking, because I would have thought the test was a measure of something (in this case, whether AI is capable of thinking like a human being), so changing what you’re measuring because something actually passed seems to defeat the point of the test somewhat.
Unless you want to redefine the variable being measured rather than the test?
I guess “thinking like a human being” is a latent variable and not easily observable, so changing the way you define it is possible; but then you could just keep redefining forever what it means to think like a human and ensure that AI never quite makes the cut?

That’s my philosophical pondering over the issue…