This is not quite human-level question-answering in the everyday sense of those words. The ZDNet headline is too clickbaity for my taste.<p>The answer to every question in the test is a preexisting snippet of text, or "span," from a corresponding reading passage shown to the model. The model only has to select the span in the passage -- i.e., the sequence of words already in the text -- that best answers the question.[a]<p>Actual current results:<p><a href="https://rajpurkar.github.io/SQuAD-explorer/" rel="nofollow">https://rajpurkar.github.io/SQuAD-explorer/</a><p>Paper describing the dataset and test:<p><a href="https://arxiv.org/abs/1606.05250" rel="nofollow">https://arxiv.org/abs/1606.05250</a><p>[a] If this explanation isn't entirely clear, it may help to think of the problem as a challenging classification task in which the number of possible classes for each question equals the number of possible spans in the corresponding reading passage.
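To make the footnote concrete, here's a minimal sketch of the span-selection framing. The scoring function and span-length cap are stand-ins for whatever a real system would use; this is illustrative only, not any particular model's method:

```python
def select_answer(passage_tokens, question, score_fn, max_len=15):
    """Return the passage span that score_fn likes best for the question.

    score_fn(question, span_tokens) -> float is a placeholder for
    whatever ranks candidate answers (logistic regression over
    features, a neural reader, etc.).
    """
    best, best_score = None, float("-inf")
    n = len(passage_tokens)
    for start in range(n):
        # Enumerate every contiguous span starting here, up to max_len tokens.
        for end in range(start + 1, min(start + 1 + max_len, n + 1)):
            score = score_fn(question, passage_tokens[start:end])
            if score > best_score:
                best, best_score = (start, end), score
    return " ".join(passage_tokens[best[0]:best[1]])
```

The number of candidate "classes" is the number of spans -- roughly passage length times max_len -- which is the classification framing in the footnote.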
Great result. At my job I manage a machine learning team, so I am pretty much all-in on deep learning to solve practical problems.<p>That said, I think the path to 'real' AGI lies in some combination of DL, probabilistic graphical models, symbolic systems, and something we have not even imagined yet. BTW, Judea Pearl just released a good paper on the limitations of DL: <a href="https://arxiv.org/abs/1801.04016" rel="nofollow">https://arxiv.org/abs/1801.04016</a>
It would be interesting to know how well some of the entries on the SQuAD leaderboard do on the Winograd Schema Challenge, a pronoun-disambiguation test designed to require commonsense reasoning (<a href="https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html" rel="nofollow">https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS....</a>). Does anyone know if any of these systems have been tested on it as well?
I am always annoyed by claims in supervised learning that a machine predictor is better than humans. Humans are the ones who scored the dataset to begin with. Regarding human evaluation, the paper itself says:<p>> Mismatch occurs mostly due to inclusion/exclusion of non-essential phrases (e.g., monsoon trough versus movement of the monsoon trough) rather than fundamental disagreements about the answer.<p>I wouldn't call that "error" so much as ambiguity. In other words, there's more than one possible answer to these questions under these criteria -- English isn't a formal grammar where there's always one and only one answer. For instance, here's one of the questions from the ABC Wikipedia page:<p>> What kind of network was ABC when it first began?<p>> Ground Truth Answers: "radio network"; "radio"; "radio network"<p>> Prediction: October 12, 1943<p>Because the second human said "radio" instead of "radio network," I believe this counts as a human miss. But the answer is factually correct. Meanwhile, the prediction from the Stanford logistic regression (not the more sophisticated Alibaba model in the article, for which I don't think results are published at this level of detail) is completely wrong -- no human could make that mistake. Yet the EM metric treats these as equally flawed answers.<p>And this gets headlined as "defeats humans," not "learns to mimic human responses well."
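For anyone wondering why "radio" gets zero credit, here's a rough sketch of exact-match scoring, loosely following the normalization in SQuAD's official evaluation script (lowercase, strip punctuation and articles). It assumes the human-eval setup where one annotator's answer is scored against the other annotators' answers; treat the details as approximate:

```python
import re
import string

def normalize(answer):
    """Approximate SQuAD answer normalization: lowercase, drop
    punctuation, drop articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())

def exact_match(prediction, gold_answers):
    """EM is all-or-nothing: credit only for an exact (normalized)
    match against at least one gold answer."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

gold = ["radio network", "radio network"]      # the other annotators
print(exact_match("radio", gold))              # False: the human "miss"
print(exact_match("October 12, 1943", gold))   # False: the model's wrong date
# EM scores both as 0, which is exactly the complaint above.
```

(The benchmark also reports a token-level F1 that gives "radio" partial credit, but the headline "beats humans" number here is EM.)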
How well do these do on Winograd challenges?<p><a href="https://aaai.org/Conferences/AAAI-18/aaai18winograd/" rel="nofollow">https://aaai.org/Conferences/AAAI-18/aaai18winograd/</a>
This is clickbait. Unless models are robust to adversarial examples in SQuAD, such as those described here: <a href="https://arxiv.org/abs/1707.07328" rel="nofollow">https://arxiv.org/abs/1707.07328</a>, doing really well on SQuAD doesn't mean much.
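For reference, the attack in that paper (AddSent) just appends a distractor sentence that mimics the question's surface form without answering it. A toy version, paraphrasing the paper's running example -- the distractor here is hand-written, not produced by their generation pipeline:

```python
passage = (
    "Peyton Manning became the first quarterback ever to lead two "
    "different teams to multiple Super Bowls. He is also the oldest "
    "quarterback ever to play in a Super Bowl at age 39. The past record "
    "was held by John Elway, who led the Broncos to victory in Super "
    "Bowl XXXIII at age 38."
)
question = "What is the name of the quarterback who was 38 in Super Bowl XXXIII?"

# The distractor shares many words with the question but answers
# nothing; it only needs to be appended to the passage.
distractor = "Quarterback Jeff Dean had jersey number 37 in Champ Bowl XXXIV."
adversarial_passage = passage + " " + distractor
# Span-selection models that pattern-match question words against the
# passage often switch their answer from "John Elway" to "Jeff Dean,"
# even though the original text is untouched.
```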
At NIPS 2017 there was a system that beat humans in a college QuizBowl competition. In many ways I think that was more impressive than excellent performance on SQuAD.
Kudos to my colleagues. The iDST team is based in Bellevue, WA, and is hiring more people. Let me know if you're interested.<p>Also, Alibaba Cloud is looking for engineers. Please see <a href="https://careers.alibaba.com/positionDetail.htm?positionId=b7kSeJ8J2XQ3ynkotvAhPw%3D%3D" rel="nofollow">https://careers.alibaba.com/positionDetail.htm?positionId=b7...</a>
@syllogism, have you thought about a demo combining spaCy + ____ to tackle SQuAD (<a href="https://rajpurkar.github.io/SQuAD-explorer/" rel="nofollow">https://rajpurkar.github.io/SQuAD-explorer/</a>)?
A counterpoint from Yoav Goldberg:<p><a href="http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf" rel="nofollow">http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf</a>
Real link:<p><a href="http://www.zdnet.com/article/alibaba-neural-network-defeats-human-in-global-reading-test/" rel="nofollow">http://www.zdnet.com/article/alibaba-neural-network-defeats-...</a>