Blekko already has a list of domains to block from results

46 points by aj700, over 14 years ago

18 comments

aaronbrethorst, over 14 years ago
Wow, I'm seeing tons of false positives. Why on earth is the co-written blog of a Nobel laureate and a 7th Circuit appeals court judge making blekko's Bayesian classifier freak out?

#88 - http://www.becker-posner-blog.com - bayes (spam 31.6 > 5.3)
robinduckett, over 14 years ago
I'm seeing:

http://www.deadmau5.com -> Really, spam?

http://www.comparethemarket.com -> Price comparison site with no known biases in the UK *and* not owned by British Telecommunications, as it is listed.

Even ONE false positive is enough to make me think this listing is a load of bullshit. How is geocities' "closing down" page spam?
chrismealy, over 14 years ago
They're not passing the "best refrigerator" test yet:

http://blekko.com/ws/best+refrigerator
benbeltran, over 14 years ago
Two big problems with automated spam blocking are false positives and changing domain names.

For the second one, how often do you revise your blocked links? What if a domain changes owner and the new owner doesn't publish spam?

For the first one, is even one false positive tolerable? Will you deny someone presence in your index because you failed? And if so, how do you handle challenges?
zitterbewegung, over 14 years ago
Why is http://www.deadmau5.com/ marked as spam?? It's obviously not.
greglindahl, over 14 years ago
* This list is generated by our algorithm; it contains the most important sites that our algorithm thinks are spam. The point of making the list public is so that you guys can tell us when we're wrong. Google has a list like this, but they don't show it to anyone. Transparency in action.

Thank you all for pointing out false positives in the list. That is what we hoped would happen.

* The "nocrawl" sites are human-picked by us. Geocities is on the list because it was a very spammy domain. Even though they've (finally) removed the data, we still have old data indexed, and will remove them from the spam list once all that old data ages out.

* BT is the hosting company for comparethemarket.com
YooLi, over 14 years ago
There are quite a few false positives on that list. Also, labelling sites marked nocrawl as spam is pretty lame. dshield.org is listed as mfa, but there isn't an AdSense ad on it.
wybo, over 14 years ago
I have been using Blekko as my primary search engine for a couple of days now, and in my experience their search results are very decent.

Maybe not in terms of falsely blocked sites, but certainly in terms of having fewer false positives (e.g. spam/useless pages) in the search results.

Mom & pop users (and even more advanced searchers, such as students looking for book reviews, or torrents) might very well forgive them the few false blocks for this.

Zittrain, in his 'The Future of the Internet and How to Stop It', already wrote about this trade-off in terms of spam being made possible by the generativity of the internet, and people increasingly preferring controlled environments over those full of viruses and spam (wonder why Apple's locked-down devices are so popular?).

Of course this has big downsides too, and is even bad in my opinion. But Blekko, by allowing people to create their own slashtags (categories, much more flexible and quick than Google's domain search) and with google/yahoo/bing always being only one click away, might have arrived at a good middle ground...

Imho Blekko might very well be able to beat Google at their own game. Give them a try, or at least sometimes when google doesn't do it for you, I'd say...
Misha_B, over 14 years ago
In spite of the (justified) complaints you get about the false positives, I think that's a great way to go. Unlike with email, where missing a message might be critical, in search I'd rather have even as much as 10-20% false positives than deal with the spam sites Google delivers.

More generally, concerning the front page search examples: "cure for headaches" works very well indeed compared to google. However, "global warming /liberal" is a bit irritating. I understand the rationale behind it, but there is a slight difference between finding only what one is looking for and hearing only what one wants to hear. Finding anything non-mainstream might necessitate a technique like this in Google, where you otherwise don't see anything else in the first 50 results... But maybe you can strive to find for me what's really going on and not merely what's mainstream and politically correct. Thinking about it, your blocking of domains like Answer.com might be a great step in that direction anyway.
cubicle67, over 14 years ago
Can anyone explain what bayes and mfa mean? I picked the site http://www.basemetals.com/ (bayes (spam 8.6 > 5.3)) at random, and although it won't win any design awards I can't see what the problem with it is. Am I missing something?
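On the question above: "bayes" most likely refers to a Bayesian text classifier, and "mfa" is usually read as "made for AdSense". How blekko actually computes its score is not stated anywhere in this thread, so the following is only a minimal sketch of one common approach, naive Bayes log-odds scoring. The function names, the toy training texts, the add-one smoothing, and the equal class priors are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(docs):
    """Build per-class word counts from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_spam in docs:
        counts[is_spam].update(text.lower().split())
    return counts

def spam_log_odds(text, counts):
    """Sum per-word log-likelihood ratios (add-one smoothing, equal priors).

    A positive result leans spam, a negative one leans non-spam.
    Words never seen in training are skipped.
    """
    vocab = set(counts[True]) | set(counts[False])
    total_spam = sum(counts[True].values())
    total_ham = sum(counts[False].values())
    score = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue
        p_spam = (counts[True][word] + 1) / (total_spam + len(vocab))
        p_ham = (counts[False][word] + 1) / (total_ham + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# Toy training data, invented for this example (not blekko's data).
counts = train([
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("court ruling analysis", False),
    ("economics blog analysis", False),
])

print(spam_log_odds("buy cheap pills", counts))     # positive: leans spam
print(spam_log_odds("economics analysis", counts))  # negative: leans non-spam
```

Under this reading, a listing such as "spam 8.6 > 5.3" would mean a score above some cutoff, though that interpretation is a guess. With a training set this small, a handful of shared words can push a legitimate site over the cutoff.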
aj700, over 14 years ago
Omitting these domains from results is "automated".

*Managing the list isn't*. It's based partly on how many users report a domain as being spam. At least that's one of the reasons for inclusion. And don't Bayesian filters, newly implemented and with little data to work with, always have false positives?

Maybe some are labelling valid stuff as spam out of spite.

When blekko has millions of users labelling stuff as spam instead of very few, the system will be harder to abuse and the list much better.
zitterbewegung, over 14 years ago
From all the comments and what I have noticed, it sounds like a good question is "Is it better to have false positives or false negatives?" in the spam problem. I personally think that it's better to have false negatives than false positives, and a lot of the comments here seem to reflect that.
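One way to make the false positive versus false negative trade-off concrete is to count both kinds of error at different score cutoffs. This is a minimal sketch: the 5.3 cutoff is borrowed from the list entries quoted in this thread, while the other cutoffs, the `confusion_at` helper, and every score and label below are invented for illustration and are not real blekko data.

```python
def confusion_at(threshold, scored_sites):
    """Count errors when every site with score > threshold is blocked."""
    false_pos = sum(
        1 for score, is_spam in scored_sites if score > threshold and not is_spam
    )
    false_neg = sum(
        1 for score, is_spam in scored_sites if score <= threshold and is_spam
    )
    return false_pos, false_neg

# (score, is_spam) pairs; hypothetical values chosen for the example.
scored_sites = [
    (2.0, False), (4.0, True), (6.0, False), (7.0, False),
    (9.0, True), (12.0, True), (30.0, False), (40.0, True),
]

for threshold in (5.3, 10.0, 35.0):
    fp, fn = confusion_at(threshold, scored_sites)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the cutoff blocks fewer legitimate sites but lets more spam through, so choosing it amounts to deciding which error is more costly.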
rumpelstiltskin, over 14 years ago
If they put johnchow.com on the list, they must be doing something right.
veb, over 14 years ago
Wouldn't it be better to let people do their own blacklists, and then incorporate that into their official list *if* a percentage of people have that site down as spam?
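The suggestion above could be sketched roughly as follows. This is not how blekko works as far as the thread says; the `promote_to_official` helper, the `min_fraction` parameter, and the users and domains are all hypothetical.

```python
from collections import Counter

def promote_to_official(user_blacklists, min_fraction=0.05):
    """Return domains blacklisted by at least min_fraction of all users."""
    if not user_blacklists:
        return set()
    counts = Counter()
    for domains in user_blacklists.values():
        counts.update(set(domains))  # each user counts at most once per domain
    needed = min_fraction * len(user_blacklists)
    return {domain for domain, n in counts.items() if n >= needed}

# Hypothetical users and domains, for illustration only.
user_blacklists = {
    "alice": {"spam.example", "mfa.example"},
    "bob": {"spam.example"},
    "carol": {"spam.example", "blog.example"},
    "dave": set(),
}

print(sorted(promote_to_official(user_blacklists, min_fraction=0.5)))
# → ['spam.example']
```

Counting each user once per domain limits how much a single account can skew the result, although it does nothing against many accounts acting together.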
viraptor, over 14 years ago
The list doesn't include swik.net, which is crap link aggregation / search tag spam. I'd expect that one to be removed...
kilian, over 14 years ago
I wonder why they have both the www and the non-www version in there for domains?
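If the duplicate www and non-www entries were unwanted, they could be collapsed with a simple normalization step. This is a minimal sketch under that assumption; the helper names are hypothetical and nothing in the thread says blekko does this.

```python
def canonical_host(host):
    """Lowercase a hostname, drop a trailing dot and one leading 'www.' label."""
    host = host.strip().lower().rstrip(".")
    if host.startswith("www."):
        host = host[len("www."):]
    return host

def dedupe_hosts(hosts):
    """Keep the first occurrence of each canonical hostname, preserving order."""
    seen = set()
    result = []
    for host in hosts:
        key = canonical_host(host)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

print(dedupe_hosts(["www.deadmau5.com", "deadmau5.com", "WWW.Example.com"]))
# → ['deadmau5.com', 'example.com']
```

Whether collapsing is appropriate depends on whether both hosts serve the same content, since they are technically distinct hostnames.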
coolswan, over 14 years ago
Nice. If I had to guess, in a couple of years Google will attempt to acquire blekko to integrate with their webspam team.
Rubyred, over 14 years ago
Wow, the search results are terrible on blekko. I think someone's gone crazy with the ban hammer.

My suggestion to blekko: look for signals of relevance to determine SERPs, instead of flagging every other website as spam.