An open-source filter software that can detect rampant stupidity in written English

41 points by jakewolf about 17 years ago

12 comments

pg about 17 years ago
I've often thought about doing something like this for comments. I think it would work.

The hard part is getting the initial corpora of stupid and non-stupid text. Stupid writing is harder to recognize than spam. It might work to use sites as proxies.

Another related filter that might be worth trying to build would be one for recognizing trolls. It would be easy to collect the bad corpus for this filter, because the design of most forums makes it easy to see all the comments by a particular user, e.g.

http://reddit.com/user/qwe1234/
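As a rough illustration of the two-corpus idea, here is a minimal word-level naive Bayes sketch. It is not the filter discussed in the article; the class name, the labels, and the tiny training sentences are all made up for the example, and a usable filter would need far larger corpora.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical two-corpus classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training texts

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def log_scores(self, text):
        """Return the unnormalized log posterior for each label."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy corpora, invented purely for illustration.
nb = NaiveBayes()
nb.train("troll", "you are all idiots lol")
nb.train("troll", "lol this site is garbage idiots")
nb.train("ok", "the filter needs a larger training corpus")
nb.train("ok", "hashing each comment is a neat approach")
```

With these toy corpora, `nb.classify("lol idiots")` picks the "troll" label. The quality of any such filter depends almost entirely on the corpora, which is the hard part described above.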
LogicHoleFlaw about 17 years ago
The XKCD folks implemented what they call "Robot9000". Robot9000 attempts to ensure that every comment being added to a site or chat channel is unique when compared against the history of the channel. It basically hashes a somewhat stripped-down version of each comment and compares it against the entire historical corpus of their chat. If the comment is found then the user is muted for an exponentially-increasing amount of time for each infraction. I believe there's a slow decay on the mute duration as well. This sort of filter won't stop stupid text, but it seems to be working for them. It's a novel approach to the problem of signal-dilution as a social network grows.

Robot9000 release announcement: http://blag.xkcd.com/2008/01/14/robot9000-and-xkcd-signal-attacking-noise-in-chat/

Perl source: http://media.peeron.com/tmp/ROBOT9000.html
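The hash-and-compare idea can be sketched in a few lines of Python. This is not the Robot9000 code (that is the Perl source linked above). The normalization rules, the class name, and the 2-second base mute are assumptions chosen for illustration, and the mute decay is left out.

```python
import hashlib
import re


def normalize(comment):
    """Strip a comment down: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", comment.lower())
    return " ".join(text.split())


class UniquenessFilter:
    """Hypothetical filter that rejects comments already seen in the channel."""

    def __init__(self, base_mute_seconds=2):
        self.seen = set()        # hashes of every accepted comment
        self.infractions = {}    # user -> number of repeated comments
        self.base_mute_seconds = base_mute_seconds

    def submit(self, user, comment):
        """Return 0 if the comment is new, else the mute duration in seconds."""
        digest = hashlib.sha256(normalize(comment).encode("utf-8")).hexdigest()
        if digest not in self.seen:
            self.seen.add(digest)
            return 0
        count = self.infractions.get(user, 0) + 1
        self.infractions[user] = count
        # The mute doubles with each infraction: 2s, 4s, 8s, ...
        return self.base_mute_seconds * 2 ** (count - 1)
```

Storing only hashes keeps the history compact, at the cost of treating any two comments that normalize to the same string as duplicates.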
ambition about 17 years ago
I'd be more interested in the complement -- filters tuned to pick up smart or interesting writing. I'm not convinced that it's necessarily an identical problem.

It would be neat to run a battery of standard semantic analysis tools against the text of web pages ranked highly on HN, compared with pages not ranked highly.
TrevorJ about 17 years ago
//USER COMMENT REDACTED BY STUPIDITY FILTER//
aykall about 17 years ago
That's a really hard task to accomplish. Is poor English stupid? What about foreigners writing in English? Are they all stupid because they won't write 100% proper English? What about misspelled words?

I think irrelevancy and inaccuracy are the best ways to distinguish stupid from smart, and the key to telling one from the other probably lies in the subject of the comments. That would be a related/non-related filter, not a stupid filter.

Honestly, given the name, I think it is intended more to generate buzz than to become a real product. Isn't Mr. Ortiz just trying to get some attention? The definition of stupid depends on the reader, so you can't have one filter for that; it would have to be personal.
jakewolf about 17 years ago
From WSJ blog: http://blogs.wsj.com/buzzwatch/2008/03/24/idea-watch-can-this-man-banish-stupidity-from-the-internet/?mod=WSJBlog?mod=homeblogmod_buzzwatch
petercooper about 17 years ago
It is an easy mistake to believe that tools as simple as Bayesian filters can emulate intelligence. It requires intelligence to determine whether someone else is intelligent or not, not a bunch of rules and filters.

As we've seen with spam, any unintelligent system can be circumvented given enough time and ingenuity. Bayesian filters are now but a small part of e-mail analysis.
earle about 17 years ago
>>> An open-source filter software that can detect rampant stupidity in written English

Text is not likely to be stupid.

CLASSIFY succeeds; success probability: 0.5043 pR: 0.0075 Best match to file #0 (/home/sfp/code/nonstupid_cor.css) prob: 0.5043 pR: 0.0075
edw519 about 17 years ago
Input: "If I had 6 hours to chop down a tree, I'd spend the first 4 sharpening the axe."

Output: "Text is not likely to be stupid."

Input: "You wanna see my pics?"

Output: "Text is likely to be stupid."
Prrometheus about 17 years ago
Reddit should use a filter like this, one that a comment must pass before it is allowed to be posted (except on the lolcat and NSFW subreddits).
mtw about 17 years ago
Not scalable. Look at their "stupid" and "non-stupid" data.
xlnt about 17 years ago
They give an example of filtering out lowercase text. Lowercase and stupid are very different. I sometimes write whole essays in lowercase.