It seems to be overfitting to statistical regularities in the dataset; in any case, it
completely ignores the facts and rules you give it and draws the
answer from who knows where:<p><pre><code> Metals ermuf electricity.
Insulators do not ermuf electricity.
If something is made of gudranga then it is metal.
Nails are made of gudranga.
Nails conduct electricity.
ROVER prediction:
Nails conduct electricity. True (confidence = 0.99)
</code></pre>
Yes, it can tell that nails ermuf electricity:<p><pre><code> ROVER prediction:
Nails ermuf electricity. True (confidence = 0.99)
</code></pre>
However, it also thinks that nails gudranga electricity:<p><pre><code> ROVER prediction:
Nails gudranga electricity. True (confidence = 0.99)
</code></pre>
So in short, it is very determined to conclude that "Nails Y electricity" is true for
whatever Y you supply, whether or not Y actually relates nails to electricity.<p><a href="https://rule-reasoning.apps.allenai.org/?p=Metals%20ermuf%20electricity.%20%0AInsulators%20do%20not%20ermuf%20electricity.%20%0AIf%20something%20is%20made%20of%20gudranga%20then%20it%20is%20metal.%20%0ANails%20are%20made%20of%20gudranga.&q=Nails%20ermuf%20electricity" rel="nofollow">https://rule-reasoning.apps.allenai.org/?p=Metals%20ermuf%20...</a>.
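If you want to poke at this failure mode programmatically rather than through the web form, here is a rough sketch of probing a locally fine-tuned RuleTaker-style true/false classifier with the same rulebase and a varying verb. The checkpoint name is a placeholder and the two-segment input format is an assumption, not the demo's actual setup:<p><pre><code> # Sketch only: same context, different verbs between "Nails" and "electricity".
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Rulebase exactly as pasted into the demo.
context = ("Metals ermuf electricity. Insulators do not ermuf electricity. "
           "If something is made of gudranga then it is metal. "
           "Nails are made of gudranga.")

# Placeholder name -- substitute whatever fine-tuned true/false classifier you
# actually have; this is not the authors' released model.
name = "your-ruletaker-style-checkpoint"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

for verb in ["ermuf", "conduct", "gudranga", "resist"]:
    question = f"Nails {verb} electricity."
    inputs = tok(context, question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    print(question, probs.squeeze().tolist())
</code></pre>
If the pattern above holds, every verb should come out near-certainly True, which is what the web demo shows.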
Here's a link to the paper: <a href="https://arxiv.org/abs/2002.05867" rel="nofollow">https://arxiv.org/abs/2002.05867</a><p>Shortened Abstract: "AI has long pursued the goal of having systems reason over <i>explicitly provided</i> knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a formal representation. We provide the first demonstration that this is possible, and characterize the extent of this capability. To do this, we use a collection of synthetic datasets that test increasing levels of reasoning complexity (number of rules, presence of negation, and depth of chaining). We find transformers appear to learn rule-based reasoning with high (99%) accuracy on these datasets, and in a way that generalizes to test data requiring substantially deeper chaining than in the training data (95%+ scores). We also demonstrate that the models transfer well to two hand-authored rulebases, and to rulebases paraphrased into more natural language. "<p>The performance numbers are pretty impressive IMO. But learning from synthetic datasets is pretty perilous, hard to say if it'll generalize well. Kudos to them for putting out a live demo.
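For context on what "synthetic datasets" means here: the ground-truth labels for these little rule/fact worlds can be computed by exhaustive forward chaining under a closed-world assumption (anything not derivable is False). The sketch below is only an illustration of that idea, not the authors' actual dataset generator:<p><pre><code> # Illustrative labeler: derive everything reachable from the facts via the rules.
def forward_chain(facts, rules, max_depth=10):
    """facts: a set of triples; rules: a list of (body_triples, head_triple) pairs."""
    known = set(facts)
    for _ in range(max_depth):
        new = {head for body, head in rules
               if all(b in known for b in body) and head not in known}
        if not new:
            break
        known |= new
    return known

facts = {("nail", "made_of", "gudranga")}
rules = [([("nail", "made_of", "gudranga")], ("nail", "is", "metal")),
         ([("nail", "is", "metal")], ("nail", "ermuf", "electricity"))]

derived = forward_chain(facts, rules)
print(("nail", "ermuf", "electricity") in derived)      # True: gudranga, hence metal, hence ermuf
print(("nail", "gudranga", "electricity") in derived)   # False under the closed-world assumption
</code></pre>
Under this kind of labeling, "Nails gudranga electricity" would be False, which is exactly the check the demo in the comment above fails.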
Just tried a set of classic non-monotonic reasoning statements and it didn't like it much: <a href="https://rule-reasoning.apps.allenai.org/?p=Penguins%20are%20birds%0ABirds%20can%20typically%20fly%0APenguins%20cannot%20fly%0ATweety%20is%20a%20bird&q=Can%20tweety%20fly%3F" rel="nofollow">https://rule-reasoning.apps.allenai.org/?p=Penguins%20are%20...</a>
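The "typically" is the crux there: it needs a default that a more specific exception (penguins) can override, which strict if-then rules have no way to express. A toy sketch of the intended defeasible behaviour, purely for illustration:<p><pre><code> # Toy illustration only: the penguin exception overrides the "birds typically fly" default.
def can_fly(name, is_bird, is_penguin):
    if is_penguin(name):    # specific exception wins over the default
        return False
    if is_bird(name):       # default rule: birds typically fly
        return True
    return None             # nothing known either way

# Tweety is a bird and not known to be a penguin, so the default applies.
print(can_fly("tweety", is_bird=lambda x: True, is_penguin=lambda x: False))  # True
</code></pre>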
It doesn't seem very precise. For example, it doesn't distinguish between "is" and "is like":<p><a href="https://rule-reasoning.apps.allenai.org/?p=A%20pear%20is%20a%20type%20of%20fruit.%0AA%20pear%20is%20like%20an%20apple.&q=An%20apple%20is%20a%20fruit.%0AAn%20apple%20is%20a%20pear.%0AA%20pear%20is%20an%20apple.%0AA%20fruit%20is%20a%20pear" rel="nofollow">https://rule-reasoning.apps.allenai.org/?p=A%20pear%20is%20a...</a>.<p>Edit: Added more test cases