How I trained fake news detection AI with 95% accuracy, and almost went crazy

30 points by thetall0ne over 7 years ago

10 comments

wadkar over 7 years ago
The fakebox doesn’t detect fake news, it detects articles which are factual/real and everything else is labeled as “fake”.

Where’s the dataset? How did you verify the ground truth? Where are the annotation/labeling guidelines?

What’s the definition of factual/real articles? The dataset appears to be created by the author - which isn’t necessarily wrong but to paraphrase Karl Popper (in the context of human knowledge and scientific endeavors):

There are no ‘pure’ facts available; all observations are functions of subjective factors such as interests, expectations, wishes etc.

http://plato.stanford.edu/entries/popper/#GrowHumaKnow
raker over 7 years ago
This article is the 5%.

A more accurate way of detecting "fake news" would be interesting, but I fail to see how such a thing could be designed, past simple detection of wishy-washy and avoidant word patterns.
bagrow over 7 years ago
Accuracy is not a sufficient measure of a classifier. Better to report precision and recall, or any number of other combination measures.

https://en.m.wikipedia.org/wiki/Evaluation_of_binary_classifiers
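To make the precision/recall point concrete, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration (1 = "fake", 0 = "real") and have nothing to do with the article's actual data; the helper name `precision_recall_f1` is also hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical evaluation set: 4 "fake" (1) and 6 "real" (0) articles.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the classifier gets 7 of 10 articles right, yet it only catches half of the "fake" ones, which the accuracy figure alone does not reveal.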
minimaxir over 7 years ago
The OP does not say the label distribution of the training data; it's entirely likely that the split is not balanced 50/50, which would make "95% accuracy" as an indicator of quality misleading.

This is one of the reasons why I recommend that Medium thought pieces disclose their data and code instead of just saying "I did AI magic!" to sell a product (and they do charge for their product on their website).
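A small sketch of why the label split matters, using an invented 95/5 split rather than anything known about the OP's data. A "classifier" that always predicts the majority label reaches 95% accuracy on such a split while never flagging a single "fake" article; the function names here are hypothetical.

```python
from collections import Counter
from typing import List


def majority_baseline(labels: List[str]) -> List[str]:
    """Predict the most common label for every example."""
    majority_label = Counter(labels).most_common(1)[0][0]
    return [majority_label] * len(labels)


def accuracy_of(labels: List[str], predictions: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(labels, predictions)) / len(labels)


def recall_of(labels: List[str], predictions: List[str], cls: str) -> float:
    """Fraction of true `cls` examples that were predicted as `cls`."""
    relevant = [(t, p) for t, p in zip(labels, predictions) if t == cls]
    return sum(p == cls for _, p in relevant) / len(relevant) if relevant else 0.0


# Hypothetical imbalanced data set: 95 "real" articles, 5 "fake" ones.
imbalanced = ["real"] * 95 + ["fake"] * 5
imbalanced_preds = majority_baseline(imbalanced)
baseline_acc = accuracy_of(imbalanced, imbalanced_preds)
fake_recall = recall_of(imbalanced, imbalanced_preds, "fake")

# The same baseline on a balanced 50/50 split.
balanced = ["real"] * 50 + ["fake"] * 50
balanced_acc = accuracy_of(balanced, majority_baseline(balanced))

print(f"imbalanced: accuracy={baseline_acc:.2f}, fake recall={fake_recall:.2f}")
print(f"balanced:   accuracy={balanced_acc:.2f}")
```

On a balanced split the same baseline drops to 50%, so a reported accuracy only means something once the class distribution is known.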
richdougherty over 7 years ago
> I found myself drifting in my own interpretation of fake news, getting angry as I came across articles that I didn’t agree with, fighting hard against the urge to only pick ones I thought were right. What was right or wrong anyway?

A good question and I'm not surprised he went a bit crazy.

https://plato.stanford.edu/entries/truth/

> The problem of truth is in a way easy to state: what truths are, and what (if anything) makes them true. But this simple statement masks a great deal of controversy. Whether there is a metaphysical problem of truth at all, and if there is, what kind of theory might address it, are all standing issues in the theory of truth. We will see a number of distinct ways of answering these questions.
thetall0ne over 7 years ago
The model is not based on domains, just the text of the article. I can confirm there was an even number of real and notreal news examples. The data set was eventually broken into two categories: written with bias, or without bias. For example, a NYT Opinion piece was considered notreal news.
txsh over 7 years ago
He’s not detecting fake news. He’s detecting articles that match the writing style of a handful of publications and labeling everything else “fake”.
peterwwillis over 7 years ago
What the....?

The author describes a "fake news detector AI", that is actually a "typically legitimate source of news" data model, combined with a fake news domain blacklist. It doesn't detect fake news. It detects whether a story possibly came from a source you find to typically be legitimate.

This article is fake news.
tantalor over 7 years ago
Where's the demo?
mirekrusin over 7 years ago
He needs to release/train at least 3 versions with whitelist-blacklist variations for RT, Al Jazeera and Fox News.