When people think about using computers for Natural Language Processing, they often think about end tasks like classification, translation, and question answering, and about models like BERT that capture the statistical regularities of text. However, these tasks only indirectly measure how well a system has understood the meaning of the text, and the models are largely unexplainable black boxes that require reams of training data.<p>NLP is now good enough that we can explicitly measure how well a system reads text in terms of what knowledge it extracts from it. This task is called Knowledge Base Population. We've released KnowledgeNet, the first reproducible benchmark for this task, along with an open-source state-of-the-art baseline.<p>Direct link to the GitHub repo: <a href="https://github.com/diffbot/knowledge-net" rel="nofollow">https://github.com/diffbot/knowledge-net</a>
EMNLP paper: <a href="https://www.aclweb.org/anthology/D19-1069.pdf" rel="nofollow">https://www.aclweb.org/anthology/D19-1069.pdf</a>
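To make the task concrete, here's a minimal sketch of how Knowledge Base Population output can be scored: extracted facts are compared as (subject, property, object) triples against human-annotated gold facts. The facts below are made up for illustration, and the official KnowledgeNet evaluator in the repo is more involved than this exact-match version:

```python
def score(predicted, gold):
    """Exact-match precision/recall/F1 over (subject, property, object) triples."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # correctly extracted facts
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical facts for the sentence "Ada Lovelace was born in London."
gold = [("Ada Lovelace", "PLACE_OF_BIRTH", "London")]
predicted = [("Ada Lovelace", "PLACE_OF_BIRTH", "London"),
             ("Ada Lovelace", "PLACE_OF_RESIDENCE", "London")]  # spurious fact

p, r, f1 = score(predicted, gold)
print(p, r, f1)  # precision 0.5, recall 1.0
```

A system that reads well scores high on both precision (few invented facts) and recall (few missed facts), which is exactly what an end task like classification cannot tell you.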