Ask HN: How to use machine learning to extract facts from text?

22 points by dartwing about 8 years ago
Machine learning is evolving very quickly. What are the state-of-the-art techniques for automatically extracting specific facts from text?

Are there any open-source projects focused on this task that you could recommend?

6 comments

grizzles about 8 years ago
Facts are simply assertions that have met some burden of proof. Determining that threshold is a subjective exercise, not an objective one. I know you want an algorithm to do this, but there is no sentient algorithm smart enough to do it. So, from an epistemological perspective, you are basically asking: what are the facts as determined by someone else?

The tragedy of subjectivity is that, for most people, some random person ranting into a YouTube video for 15 minutes about, e.g., Hillary Clinton constitutes "evidence" sufficient to establish a fact.
BjoernKW about 8 years ago
What exactly is a fact? There's no easy answer to that question, particularly with natural rather than formal languages. 'Facts' and statements depend on context. The meaning of a natural-language statement is usually derived from these layers, each building on the one before:

- syntax (the structure of a sentence)
- semantics (the isolated meaning of a sentence)
- pragmatics (the meaning of a sentence in context)

Anaphora (references to previous sentences or concepts) can be particularly nasty in this context.

Depending on the task at hand, chunk parsing could be a good first step toward finding relevant phrases in unstructured text. There are numerous libraries that do this, at least for English and other Indo-European languages.
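As a minimal sketch of chunk parsing, assuming the sentence has already been POS-tagged with Penn Treebank tags, the tag sequence can be encoded as a string and scanned with a regular expression to pull out noun phrases. The tagged sentence, the `chunk_noun_phrases` helper, and the noun-phrase pattern are all illustrative assumptions, not any particular library's API; libraries such as NLTK ship a ready-made regexp chunker, whereas this version uses only the standard library.

```python
import re

# Hypothetical input: a sentence already POS-tagged with Penn Treebank tags.
TAGGED = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# Illustrative noun-phrase grammar: optional determiner, any adjectives,
# then one or more nouns (NN, NNS, NNP, NNPS).
NP_PATTERN = re.compile(r"(<DT>)?(<JJ[RS]?>)*(<NNP?S?>)+")


def chunk_noun_phrases(tagged):
    """Return the noun-phrase chunks found in a POS-tagged sentence."""
    # Encode the tags as one string, e.g. "<DT><JJ><NN>", so a regex can scan it,
    # and remember which token each tag starts at.
    tag_string = ""
    token_at = {}
    for i, (_, tag) in enumerate(tagged):
        token_at[len(tag_string)] = i
        tag_string += f"<{tag}>"
    token_at[len(tag_string)] = len(tagged)  # end-of-sentence sentinel

    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Every match starts at "<" and ends after ">", i.e. on tag boundaries.
        first, last = token_at[match.start()], token_at[match.end()]
        chunks.append(" ".join(word for word, _ in tagged[first:last]))
    return chunks


print(chunk_noun_phrases(TAGGED))  # ['The quick brown fox', 'the lazy dog']
```

Chunking like this only finds candidate phrases; deciding which of them participate in a "fact" still needs the semantic and pragmatic layers described above.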
PaulHoule about 8 years ago
This is a commercially oriented fact extraction system:

https://github.com/machinalis/iepy

It can be trained to reach the kind of performance you would see from a text extractor customized by the likes of BBN or Booz Allen Hamilton. You need 20,000 training samples to start getting good results.
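IEPY's own interface is not shown here. As a hedged illustration of what "training" a relation extractor involves, here is a toy perceptron (a swapped-in technique, not IEPY's method) that learns from labelled examples whether two entity mentions stand in a relation, using only the words between them. The training examples, the "works for" relation, and every function name are hypothetical; a real system would use much richer features and, as the comment notes, tens of thousands of examples.

```python
from collections import defaultdict

# Hypothetical labelled examples for a hypothetical "works for" relation:
# the words between two entity mentions, and whether the relation holds.
TRAIN = [
    ("is an engineer at", True),
    ("joined", True),
    ("was hired by", True),
    ("sued", False),
    ("competes with", False),
    ("met", False),
]


def features(context):
    """Bag-of-words features for the text between the two entity mentions."""
    return context.lower().split()


def train_perceptron(examples, epochs=10):
    """Learn one weight per word with the standard perceptron update rule."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for context, label in examples:
            score = bias + sum(weights[w] for w in features(context))
            if (score > 0) != label:
                # Misclassified: nudge weights toward the correct label.
                step = 1.0 if label else -1.0
                bias += step
                for w in features(context):
                    weights[w] += step
    return weights, bias


def predict(model, context):
    """Return True if the learned model says the relation holds."""
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) for w in features(context)) > 0


model = train_perceptron(TRAIN)
print(predict(model, "was hired by"))  # True
print(predict(model, "sued"))  # False
```

With six examples the model simply memorizes a few trigger words; the need for large labelled datasets comes from covering the many ways a relation can be phrased.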
brad0 about 8 years ago
How do you define a fact?

As far as I understand it, symbolic AI back in the '80s was building a massive web of facts or "truths" that would be used to create a general AI. They eventually ended up generating a bunch of contradictions.
DrNuke about 8 years ago
I'm looking for something similar for academic papers (PDFs) in my field, but nothing really useful for automating the extraction process exists. So the best path is still to extract the data manually, homogenize it into a standard protocol, and feed it to ML algorithms. Once a data protocol becomes a widespread standard, perhaps an ISO standard or similar, there is a chance that automated extraction will work at the finest level of detail, as complex information requires.
dartwing about 8 years ago
I'm looking at SyntaxNet from Google. If there are other candidates worth looking at, please kindly let me know.

https://github.com/tensorflow/models/tree/master/syntaxnet
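To show how a dependency parser's output can be turned into candidate facts, here is a minimal sketch that pulls (subject, verb, object) triples out of a parse. It assumes a simplified CoNLL-style table of (token id, word, head id, relation) with Stanford-style labels `nsubj` and `dobj`; the parse below is hand-written for illustration, not actual SyntaxNet output, and `extract_svo` is a hypothetical helper. Other annotation schemes name these relations differently (Universal Dependencies v2, for example, uses `obj`).

```python
# Hand-written, simplified CoNLL-style parse of "The model extracts facts from text":
# (token id, word, head id, dependency relation); head id 0 marks the root.
PARSE = [
    (1, "The", 2, "det"),
    (2, "model", 3, "nsubj"),
    (3, "extracts", 0, "ROOT"),
    (4, "facts", 3, "dobj"),
    (5, "from", 3, "prep"),
    (6, "text", 5, "pobj"),
]


def extract_svo(parse):
    """Return (subject, verb, object) triples, using head words only."""
    words = {tok_id: word for tok_id, word, _, _ in parse}
    subjects, objects = {}, {}
    for _, word, head, rel in parse:
        if rel == "nsubj":
            subjects.setdefault(head, []).append(word)
        elif rel == "dobj":
            objects.setdefault(head, []).append(word)

    triples = []
    # Only verbs that have both a subject and a direct object yield a triple.
    for verb_id in sorted(subjects.keys() & objects.keys()):
        for subj in subjects[verb_id]:
            for obj in objects[verb_id]:
                triples.append((subj, words[verb_id], obj))
    return triples


print(extract_svo(PARSE))  # [('model', 'extracts', 'facts')]
```

Real sentences need more than this: passives, conjunctions, and anaphora all complicate the mapping from a parse tree to a fact.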