Tech Echo

6 comments

grizzles, about 8 years ago

Facts are simply assertions that have met some burden of proof. Determining that threshold is a subjective exercise, not an objective one. I know you want an algorithm to do this, but there is no sentient algorithm smart enough to make that call. So, from an epistemological perspective, you are basically asking: what are the facts as determined by someone else? The tragedy of subjectivity is that, for most people, someone ranting into a YouTube video for 15 minutes about, e.g., Hillary Clinton constitutes "evidence" sufficient to establish a fact.

BjoernKW, about 8 years ago

What exactly is a fact? There's no easy answer to that question, particularly with natural rather than formal languages. 'Facts' and statements depend on context. The meaning of a natural-language statement is usually derived from these layers, each building on the one before:

- syntax (the structure of a sentence)
- semantics (the isolated meaning of a sentence)
- pragmatics (the meaning of a sentence in context)

Anaphora (references to previous sentences or concepts) can be particularly nasty in this context.

Depending on the task at hand, chunk parsing could be a good first take at finding relevant phrases in unstructured text. There are numerous libraries for that, at least for English and other Indo-European languages.
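
As a rough illustration of chunk parsing, here is a minimal stdlib-only Python sketch that groups noun phrases with a regular-expression rule over part-of-speech tags. It assumes the text has already been tagged with Penn Treebank tags (the tags below are written by hand, and the function names are hypothetical). Libraries such as NLTK express the same idea with RegexpParser and a grammar like NP: {<DT>?<JJ>*<NN.*>+}.

```python
import re

def tag_code(tag: str) -> str:
    """Map a Penn Treebank tag to one character so a regex can run over the tag sequence."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):   # JJ, JJR, JJS
        return "J"
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS
        return "N"
    return "x"                 # everything else breaks a chunk

# Noun phrase: optional determiner, any number of adjectives, one or more nouns.
NP_RULE = re.compile(r"D?J*N+")

def chunk_noun_phrases(tagged):
    """Return the noun-phrase chunks found in a list of (word, tag) pairs."""
    codes = "".join(tag_code(tag) for _, tag in tagged)
    # One character per token, so regex match offsets are token indices.
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_RULE.finditer(codes)
    ]

if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("European", "JJ"), ("Central", "NNP"), ("Bank", "NNP"),
        ("raised", "VBD"), ("interest", "NN"), ("rates", "NNS"),
        ("in", "IN"), ("July", "NNP"),
    ]
    print(chunk_noun_phrases(sentence))
    # ['The European Central Bank', 'interest rates', 'July']
```

Mapping each tag to a single character keeps regex offsets aligned with token positions, so the matched slice recovers the words directly. Chunking only finds candidate phrases; it does nothing about the anaphora problem mentioned above.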

PaulHoule, about 8 years ago

This is a commercially oriented fact extraction system (https://github.com/machinalis/iepy) that can be trained to reach the kind of performance you would see in a text extractor customized by the likes of BBN or Booz Allen Hamilton. You need 20,000 training samples to start getting good results.
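
The core idea behind trainable extractors of this kind can be sketched without any library: represent each candidate entity pair by the words between the two mentions and learn a linear classifier from labeled examples. The sketch below uses a plain perceptron; it is not iepy's API or its actual model, it assumes single-token entity mentions, and the relation, names, and four training examples are made up for illustration.

```python
from collections import defaultdict

def features(tokens, e1, e2):
    """Bag of words strictly between two single-token entity mentions (e1 < e2)."""
    return {"between=" + tok.lower() for tok in tokens[e1 + 1:e2]}

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron on (feature_set, label) pairs; label is True if the relation holds."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            predicted = bias + sum(weights[f] for f in feats) > 0
            if predicted != label:
                # Standard perceptron update: move the weights toward the correct label.
                step = 1.0 if label else -1.0
                bias += step
                for f in feats:
                    weights[f] += step
    return dict(weights), bias

def predict(model, feats):
    """Score a candidate pair; unseen words contribute nothing."""
    weights, bias = model
    return bias + sum(weights.get(f, 0.0) for f in feats) > 0

# Made-up labeled examples for a hypothetical FOUNDED(person, organization) relation.
TRAIN_DATA = [
    (features(["Alice", "founded", "Acme"], 0, 2), True),
    (features(["Bob", "started", "Initech"], 0, 2), True),
    (features(["Carol", "works", "at", "Acme"], 0, 3), False),
    (features(["Dave", "joined", "Initech"], 0, 2), False),
]

if __name__ == "__main__":
    model = train_perceptron(TRAIN_DATA)
    print(predict(model, features(["Erin", "founded", "Globex"], 0, 2)))  # True
    print(predict(model, features(["Frank", "joined", "Globex"], 0, 2)))  # False
```

With four examples the model merely memorizes trigger words such as "founded"; generalizing to paraphrases, longer contexts, and multi-word entities is what a labeled set on the order of the 20,000 samples mentioned above is for.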

brad0, about 8 years ago

How do you define a fact? As far as I understand it, symbolic AI back in the '80s was building a massive web of facts, or "truths", that would be used to create a general AI. They eventually ended up generating a bunch of contradictions.
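
The contradiction problem is easy to reproduce in miniature. The sketch below is a toy, not a model of any particular historical system: it stores facts as subject-predicate-object triples with an explicit negation flag, applies one naive inheritance rule, and then looks for triples that are both asserted and denied. The classic bird/penguin example is enough to trigger it.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    holds: bool = True  # False encodes an explicit denial: NOT (subject predicate obj)

def apply_inheritance(facts):
    """One naive rule: if X is_a Y and Y can Z, derive X can Z (exceptions are ignored)."""
    derived = list(facts)
    for f in facts:
        if f.predicate == "is_a" and f.holds:
            for g in facts:
                if g.subject == f.obj and g.predicate == "can" and g.holds:
                    derived.append(Fact(f.subject, "can", g.obj))
    return derived

def find_contradictions(facts):
    """Return (earlier, later) pairs that assert and deny the same triple."""
    first_seen = {}
    conflicts = []
    for fact in facts:
        key = (fact.subject, fact.predicate, fact.obj)
        earlier = first_seen.setdefault(key, fact)
        if earlier.holds != fact.holds:
            conflicts.append((earlier, fact))
    return conflicts

KB = [
    Fact("bird", "can", "fly"),
    Fact("penguin", "is_a", "bird"),
    Fact("penguin", "can", "fly", holds=False),  # the exception
]

if __name__ == "__main__":
    for earlier, later in find_contradictions(apply_inheritance(KB)):
        print("contradiction:", earlier, "vs", later)
```

Each fact is reasonable on its own; the conflict only appears once a general rule meets an exception, and every rule added to a large hand-built web multiplies the chances of that happening.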

DrNuke, about 8 years ago

I'm looking for something similar for PDF academic papers in my field, but nothing really useful for automating the extraction process exists. So the best path is still to extract data manually, homogenise it into a standard protocol, and feed it to ML algorithms. Once a data protocol becomes a widespread standard, maybe an ISO standard or similar, there is a chance that automated extraction will work at the fine-grained level that complex information requires.
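
The manual "homogenise, then feed the ML algorithms" step might look like the following sketch. The protocol here is hypothetical: it assumes every pressure value is stored in pascals with the source paper's identifier attached, and the paper IDs, field names, and unit table are placeholders for whatever a given field actually uses.

```python
from dataclasses import dataclass

# Hypothetical protocol: pressures are always stored in pascals.
TO_PASCAL = {"Pa": 1.0, "kPa": 1e3, "MPa": 1e6, "bar": 1e5}

@dataclass(frozen=True)
class Record:
    paper_id: str    # placeholder identifier for the source paper (e.g. a DOI)
    quantity: str    # canonical quantity name
    value_si: float  # value converted to SI units

def homogenise(rows):
    """Convert hand-transcribed (paper_id, value, unit) rows into protocol records."""
    records = []
    for paper_id, value, unit in rows:
        factor = TO_PASCAL.get(unit.strip())
        if factor is None:
            # Refuse to guess: an unknown unit is a transcription problem to fix by hand.
            raise ValueError(f"unknown pressure unit {unit!r} in {paper_id}")
        records.append(Record(paper_id, "pressure", float(value) * factor))
    return records

if __name__ == "__main__":
    rows = [("paper-A", "2.5", "bar"), ("paper-B", 150, "kPa"), ("paper-C", "0.3", "MPa")]
    records = homogenise(rows)
    # A uniform numeric column, ready to be used as an ML feature.
    features = [r.value_si for r in records]
    print(features)
```

Failing loudly on an unrecognized unit is deliberate: silently passing through mixed units would poison the training data in a way that is hard to spot later.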

dartwing, about 8 years ago

I'm looking at SyntaxNet from Google (https://github.com/tensorflow/models/tree/master/syntaxnet). If there are other candidates worth looking at, please let me know.
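
Whatever parser is chosen, a common next step is to turn its dependency parse into candidate facts. The sketch below assumes CoNLL-style output (token id in column 1, word in column 2, head id in column 7, dependency label in column 8) with Stanford-style labels such as nsubj and dobj. The parse is hand-written for illustration rather than actual SyntaxNet output, so check the label set your model emits.

```python
# Hand-written CoNLL-style parse of "Researchers extract facts from text ." (not real parser output).
SAMPLE = """
1 Researchers _ NOUN NNS _ 2 nsubj _ _
2 extract _ VERB VBP _ 0 ROOT _ _
3 facts _ NOUN NNS _ 2 dobj _ _
4 from _ ADP IN _ 2 prep _ _
5 text _ NOUN NN _ 4 pobj _ _
6 . _ . . _ 2 punct _ _
"""

def parse_conll(text):
    """Read CoNLL-style rows into {token_id: {"word", "head", "rel"}}."""
    tokens = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences; this sketch handles one sentence
        cols = line.split()  # real CoNLL is tab-separated; whitespace split is enough here
        tokens[int(cols[0])] = {"word": cols[1], "head": int(cols[6]), "rel": cols[7]}
    return tokens

def svo_triples(tokens):
    """Pair each verb's nsubj child with its dobj child, if both exist."""
    subjects, objects = {}, {}
    for tok in tokens.values():
        if tok["rel"] == "nsubj":
            subjects[tok["head"]] = tok["word"]
        elif tok["rel"] == "dobj":
            objects[tok["head"]] = tok["word"]
    return [(subjects[h], tokens[h]["word"], objects[h]) for h in sorted(subjects) if h in objects]

if __name__ == "__main__":
    print(svo_triples(parse_conll(SAMPLE)))  # [('Researchers', 'extract', 'facts')]
```

Only head words are kept, so a multi-word subject collapses to its head noun; expanding each head to its full subtree, and handling prepositional attachments like "from text", are the usual next refinements.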

Ask HN: How to use Machine Learning to extract facts from the text?
