Wow, is this really state of the art?<p><pre><code> Joe did not buy a car today.
He was in a buying mood.
But all cars were too expensive.
Why didn't Joe buy a car?
Answer: buying mood
</code></pre>
I think I have seen similar systems for decades now. I thought we would be further along by now.<p>I have tried for 10 or 20 minutes now, but I can't find any evidence that it has much sense of syntax:<p><pre><code> Paul gives a coin to Joe.
Who received a coin?
Answer: Paul
</code></pre>
All it seems to do is extract candidates for "who", "what", "where", etc. So it seems to figure out correctly that "Paul" is a potential answer for "Who"; a naive sketch of that kind of type-matching heuristic follows the next example.<p>No matter how I rephrase the "Who" question, I always get "Paul" as the answer. "Who? Paul!", "Who is a martian? Paul!", "Who won the summer olympics? Paul!", "Who got a coin from the other guy? Paul!"<p>Same for "what" questions:<p><pre><code> Gold can not be carried in a bag. Silver can.
What can be carried in a bag?
Answer: Gold</code></pre>
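To illustrate what I suspect is going on, here is a naive type-matching heuristic of that sort, sketched with spaCy's off-the-shelf NER. The question-word-to-entity-type mapping is my own guess at the mechanism, not the demo's actual code:<p><pre><code> # Guess at the mechanism: pick any entity whose type matches the
 # question word, ignoring the rest of the question entirely.
 from typing import Optional

 import spacy

 nlp = spacy.load("en_core_web_sm")

 # question word -> acceptable spaCy entity labels (assumed mapping)
 TYPE_MAP = {
     "who": {"PERSON"},
     "where": {"GPE", "LOC", "FAC"},
     "when": {"DATE", "TIME"},
 }

 def naive_answer(passage: str, question: str) -> Optional[str]:
     wanted = TYPE_MAP.get(question.lower().split()[0])
     if wanted is None:
         return None
     # Return the first entity of the right type; the question's
     # actual content is never consulted.
     for ent in nlp(passage).ents:
         if ent.label_ in wanted:
             return ent.text
     return None

 print(naive_answer("Paul gives a coin to Joe.", "Who received a coin?"))
 # -> "Paul", no matter how the "who" question is phrased
</code></pre>
A heuristic like this reproduces every failure above: any person in the passage "answers" any who-question, and rephrasing the question changes nothing.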
This is very brittle: it works really well on the pre-canned examples, but the vocabulary seems very tightly coupled to them. It doesn't handle something as simple as:<p>'the patient had no pain but did have nausea'<p>This doesn't yield anything helpful on semantic role labeling and didn't even parse on machine comprehension. If I vary it to ask, say, 'did the patient have pain?', the answer is 'nausea'.<p>CoreNLP provides a much more useful analysis of the phrase structure and dependencies, as in the sketch below.
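For comparison, here is roughly how to pull the dependencies for that sentence out of a locally running CoreNLP server. This is a minimal sketch assuming the server is up on port 9000 with the default models; the point is that the parse attaches the "no" to "pain" explicitly instead of losing the negation:<p><pre><code> import json

 import requests

 sentence = "the patient had no pain but did have nausea"

 # Ask the CoreNLP server (default port 9000) for a dependency parse.
 resp = requests.post(
     "http://localhost:9000/",
     params={"properties": json.dumps(
         {"annotators": "tokenize,ssplit,pos,depparse",
          "outputFormat": "json"}
     )},
     data=sentence.encode("utf-8"),
 )

 # One relation(governor, dependent) line per dependency edge.
 for dep in resp.json()["sentences"][0]["basicDependencies"]:
     print(f'{dep["dep"]}({dep["governorGloss"]}, {dep["dependentGloss"]})')
</code></pre>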
In "Adversarial Examples for Evaluating Reading Comprehension Systems" <a href="https://arxiv.org/abs/1707.07328" rel="nofollow">https://arxiv.org/abs/1707.07328</a>, it was found that adding a single distracting sentence can lower F1 score of BiDAF (which is used in demo here) from 75.5% to 34.3% on SQuAD. In comparison, human performance goes from 92.6% to 89.2%.