
Alibaba neural network defeats human in global reading test

170 points by ClintEhrlich over 7 years ago

13 comments

cs702 · over 7 years ago

This is not quite human-level question-answering in the everyday sense of those words. The ZDNet headline is too clickbaity for my taste.

The answer to every question in the test is a preexisting snippet of text, or "span," from a corresponding reading passage shown to the model. The model has only to select which span in the reading passage gives the best answer -- i.e., which sequence of words already in the text best answers the question.[a]

Actual current results: https://rajpurkar.github.io/SQuAD-explorer/

Paper describing the dataset and test: https://arxiv.org/abs/1606.05250

[a] If this explanation isn't entirely clear to you, it might help to think of the problem as a challenging classification task in which the number of possible classes for each question is equal to the number of possible spans in the corresponding reading passage.
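To make that span-selection framing concrete, here is a toy sketch. This is not the Alibaba model or any published system: the functions and the overlap-based scorer are invented purely to show the structure of the task, i.e. that the answer space is just the set of contiguous spans of the passage.

```python
import re

def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())

def candidate_spans(n, max_len=4):
    """Enumerate all contiguous token spans (i, j) with j - i <= max_len.
    These spans are the 'classes' the model chooses among."""
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            yield (i, j)

def answer(passage, question, max_len=4):
    toks = tokens(passage)
    q = set(tokens(question))
    # "Classification over spans": score every candidate span, pick the best.
    # A real model learns this scorer; the toy version just counts
    # question-word overlap, which is why it is only illustrative.
    best = max(candidate_spans(len(toks), max_len),
               key=lambda s: sum(t in q for t in toks[s[0]:s[1]]))
    return " ".join(toks[best[0]:best[1]])
```

Even for a short passage the span space grows quickly (a 5-token passage already has 14 candidate spans at max_len=4), which is what makes this a "challenging classification task."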
mark_l_watson · over 7 years ago

Great result. At my job I manage a machine learning team, so I am pretty much all-in on deep learning for solving practical problems.

That said, I think the path to 'real' AGI lies in some combination of DL, probabilistic graph models, symbolic systems, and something we have not even imagined yet. BTW, a good paper on the limitations of DL was just released by Judea Pearl: https://arxiv.org/abs/1801.04016
Jach · over 7 years ago

It would be interesting to know how well some of the entries on the SQuAD page do on the Winograd Schema Challenge (https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html). Does anyone know if any of the systems have been tested on that as well?
cwyers · over 7 years ago

I am always annoyed by claims in supervised learning that a machine predictor is better than humans. Humans are the ones who scored the dataset to begin with. If you read the paper, it goes on to say, regarding human evaluation:

> Mismatch occurs mostly due to inclusion/exclusion of non-essential phrases (e.g., monsoon trough versus movement of the monsoon trough) rather than fundamental disagreements about the answer.

I don't think I would call that "error" so much as ambiguity. In other words, there is more than one possible answer to the questions under these criteria -- English isn't a formal grammar where there is always one and only one answer. For instance, here's one of the questions from the ABC Wikipedia page:

> What kind of network was ABC when it first began?

> Ground Truth Answers: radio network; radio; radio network

> Prediction: October 12, 1943

Because the second human said "radio" instead of "radio network," I believe this would count as a human miss. But the answer is factually correct. Meanwhile, the prediction from the Stanford logistic regression (not the more sophisticated Alibaba model in the article, for which I don't think results are published at this level of detail) is completely wrong. No human could make that mistake. And yet these are treated as equally flawed answers by the EM metric.

And yet this gets headlined as "defeats humans," not "learns to mimic human responses well."
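The point about "radio" versus "radio network" is easy to make concrete with a rough re-implementation of SQuAD's two metrics (a sketch in the spirit of the official evaluation script, not a copy of it): exact match (EM) is all-or-nothing after light normalization, while token-level F1 gives partial credit, so the human's "radio" fails EM but scores well on F1, and the model's "October 12, 1943" fails both.

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, drop punctuation and English articles, collapse whitespace,
    roughly as SQuAD-style evaluation does before comparing answers."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, truth):
    """EM: the normalized strings must be identical."""
    return normalize(pred) == normalize(truth)

def f1(pred, truth):
    """Token F1: partial credit for overlapping words."""
    p, t = normalize(pred).split(), normalize(truth).split()
    common = sum((Counter(p) & Counter(t)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(t)
    return 2 * precision * recall / (precision + recall)
```

Under EM, "radio" vs. "radio network" and "October 12, 1943" vs. "radio network" are both simply wrong (score 0); F1 at least separates them (about 0.67 vs. 0.0), which is part of why both metrics are reported.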
cscurmudgeon · over 7 years ago

How well do these do on Winograd challenges? https://aaai.org/Conferences/AAAI-18/aaai18winograd/
pegasos1 · over 7 years ago

This is clickbait. Unless models are robust to adversarial examples in SQuAD such as those described here: https://arxiv.org/abs/1707.07328, doing really well on SQuAD doesn't mean much.
nl · over 7 years ago
At NIPS 2017 there was a system which beat humans in a college QuizBowl competition. In many ways I think that was more impressive than excellent performance on SQuAD.
wanghq · over 7 years ago

Kudos to my colleagues. The iDST team is based in Bellevue, WA and is hiring more people. Let me know if you're interested.

Also, Alibaba Cloud is looking for engineers. Please check https://careers.alibaba.com/positionDetail.htm?positionId=b7kSeJ8J2XQ3ynkotvAhPw%3D%3D
Xeoncross · over 7 years ago

@syllogism, have you thought about a demo combining spaCy + ____ to tackle SQuAD (https://rajpurkar.github.io/SQuAD-explorer/)?
stablemap · over 7 years ago

A counterpoint from Yoav Goldberg: http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf
anorphirith · over 7 years ago

Is this still impressive in 2018? I honestly don't know.
spiderfarmer · over 7 years ago
Cool. An AMP page. Makes it look like Google published this article.
msla · over 7 years ago

Real link: http://www.zdnet.com/article/alibaba-neural-network-defeats-human-in-global-reading-test/