© 2025 科技回声 (Tech Echo). All rights reserved.

Ask HN: How to use Machine Learning to extract facts from the text?

22 points by dartwing, about 8 years ago
Machine learning is evolving very quickly. What are the state-of-the-art techniques for automatically extracting specific facts from text?

Are there any open-source projects focused on this task that you could recommend?

6 comments

grizzles, about 8 years ago
Facts are simply assertions that have met some burden of proof. Determining that threshold is a subjective exercise, not an objective one. I know you want an algorithm to do this, but no algorithm is sentient or smart enough to do it. So, from an epistemological perspective, you are basically asking: what are the facts as determined by someone else?

The tragedy of subjectivity is that, for most people, someone ranting into a YouTube video for 15 minutes about, e.g., Hillary Clinton constitutes "evidence" sufficient to establish a fact.
BjoernKW, about 8 years ago
What exactly is a fact? There's no easy answer to that question, particularly with natural rather than formal languages. 'Facts' and statements depend on context. The meaning of a natural-language statement is usually derived from these layers, each building on the one before:

- syntax (the structure of a sentence)
- semantics (the isolated meaning of a sentence)
- pragmatics (the meaning of a sentence in context)

Anaphora (references to previous sentences or concepts) can be particularly nasty in this context.

Depending on the task at hand, chunk parsing could be a good first step toward finding relevant phrases in unstructured text. There are numerous libraries for this, at least for English and other Indo-European languages.
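To make the chunk-parsing suggestion concrete, here is a minimal, dependency-free sketch of a noun-phrase chunker. It assumes the sentence has already been tokenized and POS-tagged with Penn Treebank tags (the tags below are written by hand, not produced by a tagger) and matches the pattern DT? JJ* NN+, the same idea that NLTK's `RegexpParser` expresses with a grammar such as `NP: {<DT>?<JJ>*<NN.*>+}`. The function name `np_chunks` and the example sentence are hypothetical.

```python
from typing import List, Tuple

# (word, Penn Treebank POS tag) pairs; tags here are written by hand.
Tagged = List[Tuple[str, str]]


def np_chunks(tagged: Tagged) -> List[str]:
    """Greedy left-to-right noun-phrase chunker for the pattern DT? JJ* NN+.

    "JJ" also matches JJR/JJS and "NN" also matches NNS/NNP/NNPS.
    """
    chunks: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # found at least one noun: emit the chunk and skip past it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:  # no match starting here: try the next token
            i += 1
    return chunks


sentence = [("The", "DT"), ("chunker", "NN"), ("finds", "VBZ"),
            ("short", "JJ"), ("noun", "NN"), ("phrases", "NNS"),
            ("in", "IN"), ("raw", "JJ"), ("text", "NN")]
print(np_chunks(sentence))  # ['The chunker', 'short noun phrases', 'raw text']
```

Chunks like these are only candidate arguments for facts; deciding which phrases are related, and how, still depends on the semantic and pragmatic layers listed in the comment.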
PaulHoule, about 8 years ago
This is a commercially oriented fact extraction system:

https://github.com/machinalis/iepy

It can be trained to reach the kind of performance you would see from a text extractor customized by the likes of BBN or Booz Allen Hamilton. You need 20,000 training samples to start getting good results.
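IEPY's own models and API are not reproduced here. As a rough illustration of the general idea behind trainable fact (relation) extractors, the sketch below trains a tiny multinomial Naive Bayes classifier on the words that appear between two entity mentions and predicts whether a relation holds between them. The class `BetweenWordsNB`, the "born in" relation and the five training examples are all hypothetical; a real system would use much richer features and, as the comment notes, thousands of labeled samples.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# A (hypothetical) training example: tokens between two entity mentions, and
# whether the target relation holds between those two entities.
Example = Tuple[List[str], bool]


class BetweenWordsNB:
    """Multinomial Naive Bayes over the tokens found between two entity mentions."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.word_counts: Dict[bool, Counter] = defaultdict(Counter)
        self.class_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Example]) -> "BetweenWordsNB":
        for words, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def _log_score(self, words: List[str], label: bool) -> float:
        # log P(label) + sum of log P(word | label), with add-alpha smoothing
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for w in words:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, words: List[str]) -> bool:
        # Only labels seen during training are candidates (avoids log(0)).
        return max(self.class_counts, key=lambda label: self._log_score(words, label))


# Hypothetical labeled examples for a "born in" relation (PERSON ... PLACE).
TRAIN: List[Example] = [
    (["was", "born", "in"], True),
    (["is", "a", "native", "of"], True),
    (["visited"], False),
    (["gave", "a", "talk", "in"], False),
    (["flew", "to"], False),
]

model = BetweenWordsNB().fit(TRAIN)
print(model.predict(["was", "born", "in"]))  # True
```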
brad0, about 8 years ago
How do you define a fact?

As far as I understand it, symbolic AI in the '80s tried to build a massive web of facts or "truths" that would be used to create a general AI. They eventually ended up generating a bunch of contradictions.
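The way a hand-built web of facts can end up contradicting itself is easy to reproduce in miniature. This sketch assumes a toy knowledge base of (entity, property, holds) facts plus single-parent is-a links, and naively lets every category inherit all of its parent's facts; the textbook "birds fly, penguins don't" example then yields both "can fly" and "cannot fly" for penguins. The functions `infer` and `contradictions` are hypothetical, and this is not a model of any particular 1980s system.

```python
from typing import Dict, Set, Tuple

# A fact is (entity, property, holds?); is_a maps each category to ONE parent.
Fact = Tuple[str, str, bool]


def infer(is_a: Dict[str, str], facts: Set[Fact]) -> Set[Fact]:
    """Naively copy every parent fact down to its children until nothing changes.

    There is no notion of exceptions or defaults, which is exactly what lets
    contradictions creep in.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            for entity, prop, holds in list(derived):
                if entity == parent and (child, prop, holds) not in derived:
                    derived.add((child, prop, holds))
                    changed = True
    return derived


def contradictions(facts: Set[Fact]) -> Set[Tuple[str, str]]:
    """Return (entity, property) pairs that are asserted both true and false."""
    return {(e, p) for e, p, holds in facts if holds and (e, p, False) in facts}


IS_A = {"penguin": "bird", "tweety": "penguin"}
FACTS = {("bird", "can_fly", True), ("penguin", "can_fly", False)}

print(sorted(contradictions(infer(IS_A, FACTS))))
# [('penguin', 'can_fly'), ('tweety', 'can_fly')]
```

Each fact is reasonable on its own; the contradiction only appears once the inference rule combines them, which is why it tends to surface late in a large knowledge base.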
DrNuke, about 8 years ago
I'm looking for something similar for academic papers (PDFs) in my field, but nothing really useful exists for automating the extraction process. So the best path is still to extract the data manually, homogenise it into a standard protocol, and feed it to ML algorithms. Once a data protocol becomes a widespread standard, perhaps an ISO or similar one, there is a chance that automated extraction will work at the finest level, as is necessary for complex information.
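The "homogenise into a standard protocol" step can be sketched as mapping each paper's field names and units onto one shared schema before any ML is applied. Everything here is hypothetical: the `Measurement` schema, the `FIELD_ALIASES` table and the `TO_SI` table stand in for whatever standard a field would actually agree on. Unknown fields, unknown units, or a unit that does not fit the quantity raise an error instead of being guessed, which keeps manual extraction mistakes visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Measurement:
    """One value in a hypothetical shared schema, always stored in SI units."""
    source: str
    quantity: str
    value_si: float


# Hypothetical aliases: the field names different papers use for the same quantity.
FIELD_ALIASES: Dict[str, str] = {
    "temp": "temperature", "T": "temperature", "temperature": "temperature",
    "p": "pressure", "pressure": "pressure",
}

# Each accepted source unit: the quantity it measures and its conversion to SI.
TO_SI: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "K": ("temperature", lambda v: v),
    "C": ("temperature", lambda v: v + 273.15),  # degrees Celsius -> kelvin
    "Pa": ("pressure", lambda v: v),
    "kPa": ("pressure", lambda v: v * 1000.0),  # kilopascal -> pascal
}


def homogenise(source: str, raw: Dict[str, Tuple[float, str]]) -> List[Measurement]:
    """Map one paper's manually extracted {field: (value, unit)} record onto the schema."""
    out = []
    for field, (value, unit) in raw.items():
        quantity = FIELD_ALIASES.get(field)
        if quantity is None or unit not in TO_SI or TO_SI[unit][0] != quantity:
            raise ValueError(f"unmapped field or unit: {field!r} [{unit}]")
        out.append(Measurement(source, quantity, TO_SI[unit][1](value)))
    return out


a = homogenise("paper_a", {"temp": (25.0, "C"), "p": (101.325, "kPa")})
b = homogenise("paper_b", {"T": (300.0, "K")})
```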
dartwing, about 8 years ago
I'm looking at SyntaxNet from Google. If there are other candidates worth looking at, please let me know.

https://github.com/tensorflow/models/tree/master/syntaxnet
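SyntaxNet produces dependency parses, and one common way to turn a parse into candidate facts is to read off subject-verb-object triples. The sketch below assumes a simplified, hand-written CoNLL-style table with only ID, FORM, HEAD and DEPREL columns and Stanford-style labels (`nsubj`, `dobj`); real parser output has more columns and would need to be mapped into this shape first. The helpers `parse_conll` and `svo_triples` are hypothetical, and they only handle single-word arguments.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int, str]  # (ID, FORM, HEAD, DEPREL); HEAD 0 means root


def parse_conll(text: str) -> List[Row]:
    """Parse simplified tab-separated CoNLL-style lines: ID, FORM, HEAD, DEPREL."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        idx, form, head, rel = line.split("\t")
        rows.append((int(idx), form, int(head), rel))
    return rows


def svo_triples(rows: List[Row]) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) for every head with both an nsubj and a dobj.

    Multi-word arguments (compound, det, amod dependents) are ignored here.
    """
    forms: Dict[int, str] = {idx: form for idx, form, _, _ in rows}
    subj: Dict[int, str] = {}
    obj: Dict[int, str] = {}
    for _, form, head, rel in rows:
        if rel == "nsubj":
            subj[head] = form
        elif rel == "dobj":
            obj[head] = form
    return [(subj[v], forms[v], obj[v]) for v in sorted(subj) if v in obj]


# Hand-written parse of "Google released SyntaxNet ." (not real parser output).
SAMPLE = "\n".join([
    "1\tGoogle\t2\tnsubj",
    "2\treleased\t0\tROOT",
    "3\tSyntaxNet\t2\tdobj",
    "4\t.\t2\tpunct",
])

print(svo_triples(parse_conll(SAMPLE)))  # [('Google', 'released', 'SyntaxNet')]
```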