The results are surprisingly good. I did some searches for recipes, and frankly without the top 1000 you really start getting some fresh hits. Entries by real people, rather than sites raking up recipes for a hit.
It's interesting to see popularity used as an inverse indicator of quality. Imagine a TV that skipped the most popular programming (goodbye American Idol), or a radio station that only plays non-hits.<p>Of course, there are great websites out there that are very popular (Wikipedia, NYTimes/WSJ, StackOverflow). I'd love to see a search engine with a better signal for quality than non-popularity (this search engine), or SEO (Google), but it's a fun start. :)
It means that our ranking algorithms have good recall but very poor precision. We value a web page's connectivity more than its content. We don't know how to teach machines to evaluate a web page on its merits, so we hope that a large number of tweeting, Liking and Plus-ing non-experts will approximate a single expert. <i>Millionshort</i> shows that this model isn't good enough.
This is the first off-brand search engine I've seen that's, in some sense, cooler than Google.<p>For one thing, the huge miasma of spam websites that dominates the SERPs just isn't there -- I hope this lights a fire under Google's butt and people see another world is possible.
I'd be fascinated to see the kind of SEO that would go on if this took off.<p>"Bad news-- we're a top 100 hit for several of our main keywords. We'll have to change our URL scheme again."
This reminds me of searching the internet in the '90s. I'm finding results from pages I haven't visited or heard of before now.<p>This is really refreshing.
This is a breath of fresh air - I'm loving the unpredictability of the top results! It's like flicking through a new set of 1000 TV channels in a different country.
Would you share some implementation details?<p>What's your source for the top million sites; where do you get your site list from for the other results?
Who needs to imagine it? It's here.<p>Cat is out of the bag.<p>What is the Alexa list good for? Answer: Filtering out the boring, money-grubbing commercial sites. A truly GREAT idea.<p>A return to the good ol' days. The non-commercial web.<p>Many young people who love today's www never got to experience it as it was before it became overrun with Google-ization and auto-generated garbage.<p>Take the ball and run with it. We can reclaim the web. This is only the beginning.
Turns out removing the top million results from a search for Google... still returns google. Or google.com.au to be precise.<p>It's a cool idea, but I'm not sure it's working. I tried "american history" but it wouldn't return anything at all if I changed the "Remove the Top" dropdown.
There's one pretty big flaw with this approach... For certain searches that do not have "millions of results", you get completely unrelated results.<p>If I search my name then the results are for names similar to mine, but not actually my name. This makes it completely useless for searching my name. I would think that there are many searches with this problem.<p>I think there needs to be some kind of weighting system used that dynamically decides the cutoff point. One million is a huge over-generalization for all search terms.
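One way to read "dynamically decides the cutoff point" is: start from the most aggressive filter and back off until enough results survive. The sketch below is only an illustration of that reading, not how the site works; the `CUTOFFS` ladder, `pick_cutoff`, and the `min_kept` threshold are all made-up names and values.

```python
# Hypothetical cutoff ladder, largest first; 0 means "remove nothing".
CUTOFFS = [1_000_000, 100_000, 10_000, 1_000, 100, 0]


def pick_cutoff(ranks, min_kept=10):
    """Return the largest cutoff that still leaves at least min_kept results.

    ranks holds one popularity rank per matching result (1 = most popular),
    or None for a site that is not in the ranked list at all.
    """
    for cutoff in CUTOFFS:
        # A result survives if its site is unranked or ranked below the cutoff.
        kept = sum(1 for r in ranks if r is None or r > cutoff)
        if kept >= min_kept:
            return cutoff
    # Fewer than min_kept results exist in total: filter nothing.
    return 0
```

For a rare query such as a personal name, most matches would be dropped by the one-million filter, so this would fall back to a smaller cutoff (or none) instead of returning unrelated pages.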
I like this idea a lot. I came across a nice, concise explanation of a Buffer overflow<p><a href="http://www.apolis.org/index.php?option=com_content&view=article&id=81:what-is-a-buffer-overflow&catid=44:faq&Itemid=62" rel="nofollow">http://www.apolis.org/index.php?option=com_content&view=...</a>
It's a similar measure that's often used in NLP. Sentences, documents etc. are usually stripped of common or popular terms first and the remaining ones tend to have higher information value.<p>It's not entirely a surprise that it works for meta-language constructs like the web and site popularity.
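For anyone who hasn't seen the NLP version of this, here is a minimal sketch. It assumes whitespace tokenization and uses document frequency as the measure of "common"; `strip_common_terms` and the 50% threshold are illustrative choices, not a standard API.

```python
from collections import Counter


def strip_common_terms(docs, max_doc_fraction=0.5):
    """Drop every term that appears in more than max_doc_fraction of the docs."""
    tokenized = [doc.lower().split() for doc in docs]

    # Count in how many documents each term occurs (not how many times).
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    limit = max_doc_fraction * len(docs)
    return [[t for t in tokens if doc_freq[t] <= limit] for tokens in tokenized]
```

The analogy to the search engine is that the "terms" are sites and the frequency is the popularity rank: drop whatever shows up everywhere and look at what is left.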
The results actually feel fresh. It's removing the consumerist layer of bullshit that Google serves us every day.
Also wondering if that has something to do with the "bubble" that Google creates around us based on our search history and social network information.<p>Thanks for that.
Searching for 'python global interpreter lock' yields some interesting blog articles describing the problems, and also some related articles about approaches to the C10k problem with Python (preforking, worker processes, etc.)<p>A++ would search again.
I'm really liking this. Instead of being bombarded with content that's just blasted with keywords, I get relevant, well-written articles. Not only that, but no more W3Schools in my SERPs. The chance to read an article that's written with humans in mind, instead of Google, is more than enough reason to spend some more time using this.
You think that top results in Google and other commercial search engines are always ranked based on "popularity"?<p>It would be harsh to call this naive, but it shows a serious lack of SEM and SEO knowledge. Ever heard of "paid placement"?<p>Many years ago when Digital's AltaVista was our main search engine, it was becoming loaded down with paid placement.<p>The results were polluted.<p>Google eventually became the "clean" solution.<p>But now it's Google that is loaded down with all sorts of commercial crud, much of it pointing to Google acquisitions.<p>And paid placement, among numerous other strategies, new and old, still exists.<p>The simplicity of millionshort is brilliant.<p>Filter out the crap.
Add a way for me to put this as my search engine in my firefox search bar.<p>Please.<p>EDIT: In trying to accomplish this task I found an add-on that lets you do this for anything.<p>(<a href="https://addons.mozilla.org/en-US/firefox/addon/add-to-search-bar/" rel="nofollow">https://addons.mozilla.org/en-US/firefox/addon/add-to-search...</a>)
Here's what Dropbox thinks about power users:<p>WSJ: What's next?<p>Mr. Ferdowsi: We continue to focus on actually solving problems that real people have and not being distracted by what power users want.<p>Google has made clear what they think about power users:<p>No + operators in search.<p>No web-based code search.<p>No Google Labs for the public.<p>etc.<p>Plenty of wood behind the Google arrows, but all the cool ones have been cast out of the quiver.<p>Just what kind of targets is Google aiming at nowadays?<p>Millionshort I give you +999,999.<p>I would give you +1M if you took out the AdSense and PlusOne javascript.<p>This has been a long time coming.<p>Alas, DDG and other alternatives are all about _money_.<p>Search is about _discovery_.
Well that's one way to break out of the filter-bubble/echo-chamber I suppose. If only our best search technology was based on something better than a popularity contest :(
And of course, W3Schools still manages to show up, thanks to their multiple crazy subdomains: <a href="http://cl.ly/GFup" rel="nofollow">http://cl.ly/GFup</a>
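A guess at why subdomains slip through: if the filter compares full hostnames against the ranked list, every subdomain counts as a different site. Below is a rough sketch of matching on the parent domain instead. It assumes "last two labels of the hostname" is the site, which is wrong for suffixes like .com.au or .gov.au (a real version would need the Public Suffix List); `naive_site`, `remove_top_sites`, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def naive_site(url):
    """Collapse a URL to the last two labels of its hostname.

    Naive on purpose: 'http://australia.gov.au' becomes 'gov.au', so
    multi-part public suffixes need a proper suffix list instead.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def remove_top_sites(urls, top_sites):
    """Keep only the URLs whose collapsed site is not in top_sites."""
    return [u for u in urls if naive_site(u) not in top_sites]
```

With `top_sites = {"w3schools.com"}`, both `www.w3schools.com` and any other `*.w3schools.com` host would be dropped in one go.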
After a few test searches, this is surprisingly effective for things which I had resigned myself to being "un-findable" because of poor Google results. This is most apparent on non-technical things, in this case specific jazz chord fingerings for a guitar class I am taking.<p>I am very interested as to what comes of this, or rather what is influenced by its implications.
What is the ranking used for the top million sites? A search result for "Australia" returns as the top result <a href="http://australia.gov.au" rel="nofollow">http://australia.gov.au</a>, which Alexa ranks as 20,615 globally. Actually, a lot of the queries I tried returned Australian sites.<p><a href="http://millionshort.com/search.php?q=australia&remove=1000k" rel="nofollow">http://millionshort.com/search.php?q=australia&remove=10...</a><p><a href="http://millionshort.com/search.php?q=somalia&remove=1000k" rel="nofollow">http://millionshort.com/search.php?q=somalia&remove=1000...</a> -- another Australian site.
It's amusing to see all the SEO "experts" that don't make it into the top million:<p><a href="http://millionshort.com/search.php?q=seo&remove=1000k" rel="nofollow">http://millionshort.com/search.php?q=seo&remove=1000k</a>
I would prefer if my previous search was populated in the search box after completing a search (since I might want to try the search with a different filter).<p>It appears you have an "off by one issue" in the sidebar. There's always a blank entry in the list of ignored sites.<p>Filtering does not seem to be working (or I don't understand it). Searching on "chicken" produced the same results with 1million or 100k removed.
Thank you! I can see this being something I regularly use.<p>It may be a simple idea, but it's something nobody else has done before, and I think the creators deserve a lot of credit for coming up with and implementing it. I hope they manage to get something from it. I can see that if the site becomes popular it will just get copied by other search sites.
"Quality" is subjective.<p>More relevant is _accuracy_, i.e., you get what you specify via search operators, and results are not influenced by all of Google's silly "factors". You know what you're looking for and how to frame the query. But Google assumes you're dumb and thinks it should decide for you.<p>Alexa Top 1M is a nice filter because the data comes from the Alexa Toolbar which only the most braindead web users would have installed. So you are in effect avoiding sites that the web's most braindead users would often visit.<p>Ranking sites based on "popularity" is great until you reach the point where the majority of users are not very intelligent. (cf. search engine users in 2004 with search engine users today.) When you reach that point, you get results where "quality" is determined by idiots (and SEO hats), not a group of intelligent peers.
Exactly what I was thinking in 2001 => <a href="http://www.halfbakery.com/idea/The_20Other_20search_20engine" rel="nofollow">http://www.halfbakery.com/idea/The_20Other_20search_20engine</a><p>Glad to see someone did it now...
This is a great idea and I see myself coming back to this. It's a shame that a little blog on tumblr or blogspot gets taken out because it's under a big name domain - but this has spam related benefits too.<p>Great work!
Wow, some of the content there is great. Forget about searching the deep web. For me, "deep" is the real gems buried under the first 100 or so results, where stuff actually gets interesting!
Only a little thing, could do with maintaining query strings between pages. It lost my query string and returned no results when I changed the drop down without me noticing.
I just tried it with a search for some competitive intelligence. I used the 100K removal option. I found a competitor in another country that had not made the top 2 pages on Google. It confirms that others are launching something similar to what I am building... but also the fact that it doesn't bubble to the top on Google means that the market space is not dominated yet.
I love this! And am totally going to use it. Removing the top "thousand sites" removes pretty much all the sites I WISH I could have filtered from my Google results anyhow (ehow, w3schools, etc).<p>One request: please keep the search text in the form field after clicking "search". Just so users can search the same thing multiple times with different values in the "Remove the top" drop-down.
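Until the form keeps the query, the URLs posted elsewhere in this thread suggest the search takes `q` and `remove` as plain query parameters, so the same search can be rebuilt for each filter level. A small sketch follows; the only `remove` value actually seen in the thread is `1000k`, so any other value is a guess at the pattern.

```python
from urllib.parse import urlencode

# Base URL as it appears in the links posted in this thread.
BASE = "http://millionshort.com/search.php"


def search_url(query, remove):
    """Build a search URL that carries the query along with the filter level."""
    return BASE + "?" + urlencode({"q": query, "remove": remove})
```

Calling `search_url("australia", "1000k")` reproduces the link posted above; swapping only the second argument repeats the same query at a different level.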
Not sure if I found an anomaly or what, but a simple search of "Privacy" returns results from thesaurus.com, merriam-webster.com, truste.com, kelloggcompany.com, and many more that are all in the top few thousand according to QuantCast and Compete.<p>Great idea though, will definitely try this out some more.
Great idea, although I think if you could explain it a bit better you could avoid the confusion that several of these comments show. I like how my Hacker Newsletter project shows up #2 when searching for Hacker News. :)
I think if you removed results with my search term in the domain name, this would be perfect!<p>For example, I searched for how to start a garden and I can guarantee that startagarden.com is junk. But I'd see some useful advice from small blogs etc.
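A rough sketch of that keyword-in-domain filter, in case it helps. The matching rule is an assumption: strip both the query and each hostname label down to letters and digits, then flag the result if one contains the other. `squash`, `is_keyword_domain`, and `min_len` are illustrative names, and the second test URL is invented.

```python
import re
from urllib.parse import urlsplit


def squash(text):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_keyword_domain(query, url, min_len=6):
    """True if a hostname label looks like it was built from the query."""
    q = squash(query)
    host = urlsplit(url).hostname or ""
    for label in host.split(".")[:-1]:  # skip the top-level domain
        s = squash(label)
        # min_len keeps short labels such as 'www' from matching by accident.
        if q and len(s) >= min_len and (s in q or q in s):
            return True
    return False
```

This flags startagarden.com for "how to start a garden", but it would also flag a legitimate site that simply shares its name with the query, so it probably belongs behind an opt-in switch.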
I'm getting odd results with the following query :<p>Search String : Ruby<p>Remove From Top : 1000 & 10000<p>In both instances, the top hit is <a href="http://www.ruby-lang.org" rel="nofollow">http://www.ruby-lang.org</a>, which is also the top hit from both Google and DDG.<p>Am I missing something?<p>edit: formatting
This is a cool search engine for discovery, but it defeats the purpose when you are looking for a location. Do a search for "facebook", you will not get any result that links you to facebook.com .
I would like to see the search engine adhere more strictly to quoted search terms. It seems that they are partially ignored, which gives it some of the same problems that the major engines have.
Good idea. It's about time that search engines route around the power-law distributions of popular sites, popular bloggers and personalities to find the gems otherwise buried in the noise.
> Imagine a search engine that simply removed the top 1 million most popular web sites from its index. What would you discover?<p>A lot of my competitors who are still on the first page of Google results.
Wow that is pretty awesome. I reached some results I want that I could not find via popular search engines with hours of searching. Believe it or not, this engine is changing my life.
Very interesting. I was pleased with the results and have already added this site to my Chrome bookmark bar, right between my Google search and Hacker News icons.
I searched for my site and in teh goog I get first page... here I found nothing... so for me this = no good. I understand the base but I don't understand the result.
I searched for "Hero Academy" and the first result was Google's 5th result, a site called Hero Academy with the url "hero-academy.com". That's not very "million short", IMHO.