A search engine built by the crowd

43 points by brequinn almost 12 years ago

18 comments

Great idea, but I think Google already uses this data. Google Toolbar and now Chrome report this data back to Google, and most search pros believe "SERP bounce back" and "time on site" are key signals Google uses.

PageRank and DwellRank are not either-or choices.

Here's my theory: Google uses PageRank to decide what pages to "try out" for a query (i.e. display a page in the SERP for a sampling of queries). If the page gets clicks AND has good "DwellRank", then it gets progressively better rankings. If a new page enters that beats it, it falls.

This approach is very Googly -- they love to test. They love to decide if product features are good or not by giving them a sampling of traffic. It would be insane of them not to extend this approach to search.

So the upshot is: use "PageRank" to decide which pages deserve an audition, and use "DwellRank" to decide the winners.

Since 40% of the clicks go to #1, 10% to #2, 8% to #3, etc., Google can audition pages using DwellRank without affecting the experience of the majority of their users.
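
To make the audition theory concrete, here is a rough Python sketch. It is only an illustration of the idea in this comment, not a description of what Google or Blippex actually does. The function names, the `min_views` threshold, and the `audition_slot` parameter are all made up for the example.

```python
def mean_dwell(stats, url):
    """Average seconds per view for url; 0.0 if it has never been shown."""
    views, total_seconds = stats.get(url, (0, 0.0))
    return total_seconds / views if views else 0.0


def rank_with_audition(link_scores, stats, min_views=100, audition_slot=3):
    """Order URLs for one query.

    link_scores: {url: link-based score}        (stand-in for "PageRank")
    stats:       {url: (views, total_seconds)}  (stand-in for "DwellRank")
    Both structures are hypothetical.
    """
    # "Proven" pages have enough views for their dwell time to mean something.
    proven = [u for u in link_scores if stats.get(u, (0, 0.0))[0] >= min_views]
    proven.sort(key=lambda u: mean_dwell(stats, u), reverse=True)

    # Everything else is unproven and is ordered by link score alone.
    unproven = [u for u in link_scores if u not in proven]
    unproven.sort(key=lambda u: link_scores[u], reverse=True)

    results = list(proven)
    if unproven:
        # The best-linked unproven page gets the audition slot (0-based index).
        results.insert(min(audition_slot, len(results)), unproven[0])
        results.extend(unproven[1:])
    return results


link_scores = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.8, "e": 0.1}
stats = {"a": (200, 2000.0), "b": (150, 4500.0), "c": (500, 10000.0), "d": (5, 500.0)}
print(rank_with_audition(link_scores, stats, audition_slot=1))
# → ['b', 'd', 'c', 'a', 'e']
```

Placing the audition lower in the list means fewer users see the untested page, which is the point made above about most clicks going to the top few results.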

gabemart almost 12 years ago

I was a little surprised that they haven't included anything about spam or gaming. One core advantage of PageRank is that it's (relatively) hard to get links from high-authority websites. I can't force whitehouse.gov or cnn.com to link to me. If you rely on time-spent-on-page from millions of users and treat everyone equally, how do you stop spammers from faking millions of hours spent reading their content using spoofing or bots?
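
One partial mitigation, sketched in Python purely as an assumption (nothing in the thread says Blippex does this): cap what any single reporter can contribute and take a median across reporters instead of a sum. The `(user_id, url, seconds)` event shape and the `cap_seconds` value are hypothetical, and this does nothing against a spammer who can create many fake identities.

```python
from collections import defaultdict
from statistics import median


def robust_dwell_score(events, cap_seconds=600.0):
    """events: iterable of (user_id, url, seconds). Returns {url: score}."""
    # Total dwell time per (url, user), so repeated reports from one user merge.
    per_user = defaultdict(lambda: defaultdict(float))
    for user_id, url, seconds in events:
        per_user[url][user_id] += seconds

    scores = {}
    for url, users in per_user.items():
        # Cap each user's total, then take the median so that a single
        # reporter claiming a huge dwell time cannot dominate the score.
        capped = [min(total, cap_seconds) for total in users.values()]
        scores[url] = median(capped)
    return scores


events = [
    ("u1", "x", 30),
    ("u2", "x", 40),
    ("bot", "x", 1_000_000),  # one client claiming ~11 days on the page
    ("u1", "y", 20),
]
print(robust_dwell_score(events))
# → {'x': 40.0, 'y': 20.0}
```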

mcintyre1994 almost 12 years ago

Hmm, this is an interesting algorithm, but I'd challenge its major assumption for a lot of searches. I don't have metrics, so of course my own assumptions can be challenged too; feel free to.

I think that a lot of search engine enquiries are essentially questions, with an answer that can be considered correct. Absolutely not all, but I think enough that they should certainly be considered. In that case, a site which immediately and clearly answers the question should be given; I want my answer within seconds, not minutes. If you give me the site that answers my question and that users spend the most time on, that's the exact opposite of what I want in this case.

Here's an example: I search "Population of America", and your site's top result is sporcle.com, a quiz site. I bet people spend ages on there guessing the population of various countries etc., but I'd prefer to just get my answer.

That said, it appears such queries are handled outside the main algorithm by your competitors. Both Google and DuckDuckGo will give a card, at the top of the results, answering my query. I don't even have to visit a website.

I guess the tl;dr is that it's awesome that this is ambitious, but I challenge the assumption that your algorithm is desirable for the majority of search results. Neither is Google's really though, so maybe this is an overly harsh criticism of something Google probably did very poorly early on too.

drKarl almost 12 years ago

This algorithm might heavily impact discoverability. It gives more visibility to already popular websites, making them even more popular, while not very well known websites will never be discovered because very few people spend time on them.

Also, I don't like the idea of having to install a plugin on my browser so that the URLs I visit and how much time I spend on them are tracked, even if supposedly my identity is never tracked. Once the plugin is installed, how can I know whether a new version of the plugin won't track more parameters?

When I read the title I thought it was referring to a distributed search engine like YaCy or Seeks.

robrenaud almost 12 years ago

+1 for being super ambitious. Full disclosure: I work at Google (though not on web search). Your search actually sucks, perhaps because your index is woefully inadequate. How many pages are in it? Maybe you should use Common Crawl?

a3_nm almost 12 years ago

It's sad to think that, with Google Analytics, Google probably has this data point already available for a lot of pages without having to ask people to install stuff.

dwg almost 12 years ago

1. PRISM? 2. Often the best sites are the ones I spend the least amount of time on, because I got an answer quickly. I would hate to not be able to find those sites. Seems like the traditional form of ranking should still be an important part of your solution.

c54 almost 12 years ago

Google is a good name for a search engine, and easily usable as a verb. Yahoo is pretty good, and was used as a verb for all of the 90s and early 2000s. IMO it's not as good as "I'm going to google that"; "I'm going to yahoo that" sounds vaguely sexual. DuckDuckGo and AskJeeves are terribad: "I'm going to duck duck go that"? No. Blippex is better than DDG or Ask Jeeves, but still not too great. Coming up with a good product name is hard but is crucial for usability / spread through culture. Reminds me of blumpkin.

josephpmay almost 12 years ago

I think this is a great idea! However, I am worried about privacy. I also feel like this algorithm may inflate the importance of certain types of content over others. For example, just because I spend more time on a news or social media website does not mean that it has higher quality content; it just means that the content takes longer to consume. Within content categories, however, I think this could do a good job of separating the quality content from the spam.

trickjarrett almost 12 years ago

I had a similar idea in college, which was to take the actual traffic for pages into account for search ranking (this was before Google bought whatever Analytics had been called before, I can't remember). I had thought of it as a server-side app which would benefit the hosts while feeding the search engine traffic data.

After talking with friends we explored the idea of a user-side traffic tracking app as a way to feed the search engine, but I couldn't get enough traction, and no one wanted to challenge not only Google but also IE/Firefox/Safari etc., because we felt it would have to be its own browser.

Alas.

Nowadays I am more concerned about possible privacy issues. I feel for them, launching a search engine that actively asks you to be tracked (even if anonymously); it's a hard sell during this current resistance to that entire idea.

Shank almost 12 years ago

It needs some sort of fallback for search results or it's useless to a specialized user. My Google search history looks like random bits of consciousness spread out across months. Half of those search terms bring 0 results on Blippex, and while I understand that they're early, it's hard to beat something like PageRank when it's already so well established.

It's a catch-22: the results won't get better unless people use the service, but people aren't going to use it if the results are bad in the first place. If I install the extension but use Google, it's a one-way relationship that only they get data out of. Not very good for me.

DavidWanjiru almost 12 years ago

How do you differentiate between useful dwell and useless dwell? I often need to spend some time on a page before I realize this is not what I'm looking for. How will you tell? And now that we're talking about search, I had an experience on Google that I found very odd. I was looking for the Richard Marx song "Suspicion" from the album "My Own Best Enemy". I knew the song and the album, but I couldn't remember the name Richard Marx. The problem was that instead of typing "My Own Best Enemy", I was typing "My Own WORST Enemy". Google had no clue. Shouldn't a good search engine be able to tell it's just one word wrong?
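
For what it's worth, matching a title with one word wrong is straightforward against a small, known list of titles. Here is a minimal Python sketch using the standard library's `difflib`. The `TITLES` list is a hypothetical stand-in for an index, and a web-scale engine would need something far more elaborate than this.

```python
import difflib

# Hypothetical, tiny "index" of known album titles.
TITLES = ["My Own Best Enemy", "Repeat Offender", "Rush Street"]


def suggest(query, titles=TITLES, cutoff=0.6):
    """Return the known title closest to query, or None if nothing is close."""
    # Compare case-insensitively, but return the title in its original form.
    by_lower = {t.lower(): t for t in titles}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


print(suggest("My Own WORST Enemy"))
# → My Own Best Enemy
```

`cutoff` is the minimum similarity ratio (0 to 1) a candidate needs; the 0.6 used here is simply `difflib`'s default, not a tuned value.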

jspaetzel almost 12 years ago

The problem with this is that it's basically asking to be manipulated.

alooPotato almost 12 years ago

Thanks for building this. We need more stuff like this.

Out of curiosity, how do you prevent some random malicious user from impersonating your Chrome extension and just issuing a bunch of "dwells" to your server? I.e. can I just curl what this JavaScript file (https://github.com/blippex/blippex_plugin_chrome/blob/master/plugin/common/js/api/upload.js) is requesting to boost my own pages' ranking?

vmarsy almost 12 years ago

Interesting idea, but I tried some simple searches: "Facebook", "gmail", "news ycombinator", and "countries in europe wiki". Did you gather enough data already? None of these searches was successful. There was no Facebook link in the first search, no Gmail link in the second one, no news.ycombinator link in the third one, and the only Wikipedia link I got in the last search was: http://en.wikipedia.org/wiki/National_champions

lucb1e almost 12 years ago

I'm sorry for the offtopic, but on a page that's supposed to get people involved, shouldn't you at least mind the difference between its and it's? In the very first paragraph it goes wrong already. I'm not a native English speaker, but these mistakes always jump out for some reason.

gavinpc almost 12 years ago

Doesn't work at all without cookies. Meaning, it doesn't work, and doesn't tell you why. If you're targeting people who are looking for an alternative to the major search engines, there's a better-than-average chance that they'll have cookies disabled.

undef1ned almost 12 years ago

It's some kind of AI, or even some kind of neural network: people are involved in training the search engine, so the more data users contribute to the search server, the more accurate and relevant the results they will get. Good idea.