
Idea: Idiot Filter

5 points by atte over 13 years ago
I think an "idiot filter" for search results would save me a lot of time. Particularly when I'm digging through forums for information, I tend to skip over entries with poor grammar and spelling. Occasionally someone just speaks poor English, but more often this is an indicator that the submitter is unintelligent (or drunk) and their response will not be useful to me.

A simple idiot filter might just work as a layer above Google and snip out results with a high percentage of grammatical and spelling errors. A more refined one (probably a browser plugin) would act on content within pages and hide or dim unintelligible blocks. If the focus was on forums, I don't think it would be too hard to come up with an algorithm for guessing what encompasses a single user's submission by analyzing the structure of the page.

What do you guys think? Anyone want to work on it with me?
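The "layer above Google" idea could be sketched very simply: score each result's text by the fraction of tokens not found in a word list, and drop results above some threshold. The tiny `WORDS` set and the `0.3` threshold below are illustrative assumptions, not tuned values; a real version would use a full dictionary (e.g. a Hunspell word list).

```python
import re

# Stand-in dictionary; a real filter would load a full word list.
WORDS = {
    "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
    "this", "is", "a", "test", "of", "filter",
}

def misspelling_rate(text: str) -> float:
    """Fraction of tokens not found in the word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in WORDS)
    return unknown / len(tokens)

def keep(text: str, threshold: float = 0.3) -> bool:
    """Keep a result only if its misspelling rate is under the threshold."""
    return misspelling_rate(text) <= threshold
```

As the comments below point out, a pure spelling heuristic has obvious failure modes (dyslexic authors, phone typos), so the threshold would matter a great deal.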

8 comments

ChuckMcM over 13 years ago
It's an interesting concept, although the definition of 'idiot' is not very precise. As others have pointed out, sometimes brilliant people can't compose grammatically correct English. That being said...

It's fairly easy to identify forums on the web (they have a form which is generally very common, inspired by phpBB way back when). And you could identify users, take the sum of all their contributions, and try to generate some sort of 'evolved' karma score for their posts. Things you might consider are things that academics use: how many times was the post referred to (similar to citations in papers), what sort of traffic follows the posting (similar to counterpoint papers), etc. But even if you end up with a perfect score, you won't benefit until you've been able to process several postings. If poor-quality posts are the norm in your particular research area, you will still deal with a lot of junk while the algorithm is learning that it *is* junk.

Finding a way to predict that a posting is going to score high on the suppression scale as it's being posted would be helpful, but new posters appear quite rapidly, mitigating the benefit significantly.
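The 'evolved' karma idea above could be sketched as a per-user score combining citation-like references and follow-on traffic, normalized by the number of posts seen. The weights here are arbitrary placeholders, and the zero-history branch reflects the cold-start problem the comment raises.

```python
def karma_score(citations: int, follow_traffic: int, posts_seen: int) -> float:
    """Hypothetical 'evolved karma': references to a user's posts weighted
    more heavily than raw follow-on traffic, averaged over posts seen."""
    if posts_seen == 0:
        # Cold start: no history yet, so the score cannot help.
        return 0.0
    return (2.0 * citations + 0.5 * follow_traffic) / posts_seen
```

The cold-start branch is exactly the limitation noted above: the score is useless until several postings have been processed.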
devs1010 over 13 years ago
Hey, I'm working on an open source project that I think could have an application here. It's something I've termed a "web gatherer": basically, it provides a framework for crawling web pages and then has workflows where custom code is written to determine certain things about each page. If a page meets the criteria programmed for that workflow, it is added to the results queue; the others are filtered out. I'm planning to implement an NLP component at some point using one of the open source NLP libraries available. Overall, I think of this project as sort of a web scraper / search engine that sits above the base layer (such as Google) and can be used to refine results. Anyway, you may be interested; if so, feel free to contact me: https://github.com/devs1010/WebGatherer---Scraper-and-Analyzer
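The workflow model described above reduces to a predicate applied to each crawled page: matches go to a results queue, the rest are dropped. This is a minimal sketch under that assumption; the `Page` and `Workflow` names are illustrative, not taken from the linked project.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Page:
    url: str
    text: str

# A workflow is just custom code deciding whether a page qualifies.
Workflow = Callable[[Page], bool]

def gather(pages: Iterable[Page], workflow: Workflow) -> List[Page]:
    """Run the workflow predicate over crawled pages; keep only matches."""
    return [p for p in pages if workflow(p)]
```

An NLP-based filter would slot in as just another `Workflow` predicate over the page text.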
dlitz over 13 years ago
You'd end up filtering out really good blogs like ERV, because its author objects to apostrophes and sometimes writes like a LOLcat: http://scienceblogs.com/erv/2009/12/drug_resistant_prions_via_quas.php
johnl over 13 years ago
I search forums for DIY home projects and have found I need at least 10 responses to my question before I arrive at a result I feel comfortable with. Going back over the responses with the overview gained from the search, I can now see that responses I originally thought were poor actually weren't. I keep thinking something like a do-it-yourself thread builder that you build, save, and share while you do your Google search (sort of a Tumblr, except that you access multiple sites) might be a better approach than an exclusion approach.
glimcat over 13 years ago
NLP is hard, particularly for highly general problems conducted on small samples of text.

Here's a problem case that you will find to be very common: a 20-second post with the right answer to a difficult problem, written by someone who's busy and typing on their phone. It's riddled with typos, weird corrections, transposition errors, etc., but it's still something you'd want to be a high-ranking result.
gujk over 13 years ago
Done.

http://www.chrisfinke.com/addons/youtube-comment-snob/
meatsock over 13 years ago
This could be accomplished more simply by counting the number and size of the avatars on the forum in question.
mrkmcknz over 13 years ago
I know some highly intelligent people who are dyslexic. How would you tackle that?