
Idea: Idiot Filter

5 points by atte over 13 years ago
I think an "idiot filter" for search results would save me a lot of time. Particularly when I'm digging through forums for information, I tend to skip over entries with poor grammar and spelling. Occasionally someone just speaks poor English, but more often this is an indicator that the submitter is unintelligent (or drunk) and their response will not be useful to me.

A simple idiot filter might just work as a layer above Google and snip out results with a high percentage of grammatical and spelling errors. A more refined one (probably a browser plugin) would act on content within pages and hide or dim unintelligible blocks. If the focus was on forums, I don't think it would be too hard to come up with an algorithm for guessing what encompasses a single user's submission by analyzing the structure of the page.

What do you guys think? Anyone want to work on it with me?
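A minimal sketch of the "layer above Google" version of the idea: score each result snippet by the fraction of words not found in a dictionary and drop the worst ones. The wordlist path, the 0.25 threshold, and the function names below are illustrative assumptions, not part of the original proposal.

```python
import re

def error_ratio(text, vocabulary):
    """Fraction of words in `text` that are not in the vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)

def keep_result(snippet, vocabulary, threshold=0.25):
    """Snip out results whose unknown-word ratio exceeds the threshold."""
    return error_ratio(snippet, vocabulary) <= threshold

# Example: load a system wordlist (path varies by platform) and filter snippets.
with open("/usr/share/dict/words") as f:
    vocab = {line.strip().lower() for line in f}

snippets = [
    "teh quick brown fox jumpd ovr teh lazy dog",   # ~44% unknown: dropped
    "The quick brown fox jumps over the lazy dog.", # 0% unknown: kept
]
filtered = [s for s in snippets if keep_result(s, vocab)]
```

The browser-plugin variant would apply the same scoring per post block instead of per search result, dimming blocks rather than removing them.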

8 comments

ChuckMcM over 13 years ago
It's an interesting concept, although the definition of 'idiot' is not very precise. As others have pointed out, sometimes brilliant people can't compose grammatically correct English. That being said...

It's fairly easy to identify forums on the web (they have a form which is generally very common, inspired by phpBB way back when). And you could identify users, take the sum of all their contributions, and try to generate some sort of 'evolved' karma score for their posts. Things you might consider are the things that academics use: how many times was the post referred to (similar to citations in papers), what sort of traffic follows the posting (similar to counterpoint papers), etc. But even if you end up with a perfect score, you won't benefit until you've been able to process several postings. If poor-quality posts are the norm in your particular research area, you will still deal with a lot of junk while the algorithm is learning that it *is* junk.

Finding a way to predict that a posting is going to score high on the suppression scale as it's being posted would be helpful, but new posters appear quite rapidly, mitigating the benefit significantly.
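A rough sketch of the 'evolved karma' scoring described above, averaging citation and follow-on-traffic signals per author. The weights, field names, and the averaging scheme are invented for illustration and would need tuning against real forum data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Posting:
    author: str
    citations: int        # how many later posts link back to this one
    follow_traffic: int   # page views / replies that followed the post

def karma_scores(postings, w_cite=2.0, w_traffic=0.1):
    """Average a per-post quality signal into a per-user karma score."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in postings:
        totals[p.author] += w_cite * p.citations + w_traffic * p.follow_traffic
        counts[p.author] += 1
    # Note the cold-start problem from the comment above: a new author has
    # no history, so their score is meaningless until postings accumulate.
    return {a: totals[a] / counts[a] for a in totals}
```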
devs1010 over 13 years ago
Hey, I'm working on an open source project that I think could have an application for this. It's something I've termed a "web gatherer": basically, it provides a framework for crawling web pages and then has workflows where custom code is written to determine certain things about each page. If a page meets the criteria programmed for that workflow, it is added to the results queue; the others are filtered out. I'm planning to implement an NLP component at some point using one of the open source NLP libs available. Overall, I think of this project as sort of a web scraper / search engine that sits above the base layer (such as Google) and can be used to refine results. Anyway, you may be interested; if so, feel free to contact me: https://github.com/devs1010/WebGatherer---Scraper-and-Analyzer
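For concreteness, the workflow pattern described (fetch pages, run each through custom criteria code, queue the ones that pass) might look something like the sketch below. This illustrates the pattern only and is not the actual WebGatherer API; the example predicate is an assumption.

```python
import requests

def gather(urls, workflows):
    """Fetch each URL and route it through per-workflow criteria functions.

    `workflows` maps a workflow name to a predicate that inspects the raw
    HTML; pages satisfying a predicate go into that workflow's results
    list, and everything else is filtered out.
    """
    results = {name: [] for name in workflows}
    for url in urls:
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # unreachable pages are simply skipped
        for name, criteria in workflows.items():
            if criteria(html):
                results[name].append(url)
    return results

# Example workflow: keep only pages that look like phpBB-style forums.
forum_pages = gather(
    ["http://example.com/thread/1"],
    {"forums": lambda html: "phpbb" in html.lower() or "viewtopic" in html.lower()},
)
```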
dlitz over 13 years ago
You'd end up filtering out really good blogs like ERV, because its author objects to apostrophes and sometimes writes like a LOLcat: http://scienceblogs.com/erv/2009/12/drug_resistant_prions_via_quas.php
johnl over 13 years ago
I search forums for DIY home projects and have found I need at least ten responses to my question before I can arrive at a result I feel comfortable with. Going back over the responses with the overview the search gave me, I can now see that responses I originally thought were poor actually weren't. I keep thinking that something like a do-it-yourself thread builder that you build, save, and share while you do your Google search, sort of a Tumblr except you access multiple sites, might be a better approach than an exclusion approach.
glimcat over 13 years ago
NLP is hard, particularly for highly general problems conducted on small samples of text.

Here's a problem case that you will find to be very common: a 20-second post with the right answer to a difficult problem by someone who's busy and typing on their phone. Riddled with typos, weird corrections, transposition errors, etc., but still something you'd want to be a high-ranking result.
gujk over 13 years ago
Done.

http://www.chrisfinke.com/addons/youtube-comment-snob/
meatsock over 13 years ago
This could be accomplished more simply by counting the number and size of the avatars on the forum in question.
mrkmcknz over 13 years ago
I know some highly intelligent people who are dyslexic. How would you tackle that?