Show HN: Imagine a search engine that removed top million sites from its index

552 points by taxonomyman about 13 years ago

80 comments

swalshabout 13 years ago
The results are surprisingly good. I did some searches for recipes, and frankly without the top 1000 you really start getting some fresh hits. Entries by real people, rather than sites raking up recipes for a hit.
nostromoabout 13 years ago
It's interesting to see popularity used as an inverse corollary with quality. Imagine a TV that skipped the most popular programming (goodbye American Idol), or a radio station that only plays non-hits.

Of course, there are great websites out there that are very popular (Wikipedia, NYTimes/WSJ, StackOverflow). I'd love to see a search engine with a better signal for quality than non-popularity (this search engine), or SEO (Google), but it's a fun start. :)
zeratulabout 13 years ago
It means that our ranking algorithms have good recall but very poor precision. We value a web page's connectivity more than its content. We don't know how to teach machines to evaluate a web page on its merits, so we hope that a large number of Tweeting, Liking and Plusing non-experts will approximate a single expert. Millionshort shows that this model isn't good enough.
PaulHouleabout 13 years ago
This is the first off-brand search engine I've seen that's, in some sense, cooler than Google.

For one thing, the huge miasma of spam websites that dominates the SERPs just isn't there -- I hope this lights a fire under Google's butt and people see another world is possible.
Cushmanabout 13 years ago
I'd be fascinated to see the kind of SEO that would go on if this took off.

"Bad news-- we're a top 100 hit for several of our main keywords. We'll have to change our URL scheme again."
libraryatnightabout 13 years ago
This reminds me of searching the internet in the 90's; I'm finding results from pages I haven't visited or heard of before now.

This is really refreshing.
eliasmacphersonabout 13 years ago
This is a breath of fresh air - I'm loving the unpredictability of the top results! It's like flicking through a new set of 1000 tv channels in a different country.
pbhjpbhjabout 13 years ago
Would you share some implementation details?

What's your source for the top million sites, and where do you get your site list from for the other results?
goodgraciousabout 13 years ago
Who needs to imagine it? It's here.

Cat is out of the bag.

What is the Alexa list good for? Answer: Filtering out the boring, money-grubbing commercial sites. A truly GREAT idea.

A return to the good ol' days. The non-commercial web.

Many young people who love today's www never got to experience it as it was before it became overrun with Google-ization and auto-generated garbage.

Take the ball and run with it. We can reclaim the web. This is only the beginning.
waivejabout 13 years ago
Wow... It felt like using Google 10 years ago. I think you are onto something.
bicknergsengabout 13 years ago
Turns out removing the top million results from a search for Google... still returns google. Or google.com.au to be precise.

It's a cool idea, but I'm not sure it's working. I tried "american history" but it wouldn't return anything at all if I changed the "Remove the Top" dropdown.
bstar77about 13 years ago
There's one pretty big flaw with this approach... For certain searches that do not have "millions of results", you get completely unrelated results.

If I search my name then the results are for names similar to mine, but not actually my name. This makes it completely useless for searching my name. I would think that there are many searches with this problem.

I think there needs to be some kind of weighting system used that dynamically decides the cutoff point. One million is a huge over-generalization for all search terms.
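One way to read the "dynamic cutoff" suggestion above, as a rough sketch only: try the largest cutoff first and fall back to smaller ones until enough results survive. The function name `pick_cutoff`, the `cutoffs` tuple and the `min_results` threshold are all hypothetical, and nothing here reflects how Million Short actually works.

```python
def pick_cutoff(result_ranks, cutoffs=(1_000_000, 100_000, 10_000, 1_000, 100), min_results=10):
    """Return the largest cutoff that still leaves at least min_results results.

    result_ranks holds the popularity rank of the site behind each result
    (1 = most popular), or None when the site is not in the ranking at all.
    """
    for cutoff in cutoffs:
        # A result survives if its site is unranked or ranked below the cutoff.
        remaining = [r for r in result_ranks if r is None or r > cutoff]
        if len(remaining) >= min_results:
            return cutoff
    # No cutoff leaves enough results, so filter nothing.
    return 0
```

A query with few matches, such as a personal name, would then get a small cutoff or none, while a broad query like "recipes" would keep the full million.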
RegExabout 13 years ago
I like this idea a lot. I came across a nice, concise explanation of a buffer overflow:

http://www.apolis.org/index.php?option=com_content&view=article&id=81:what-is-a-buffer-overflow&catid=44:faq&Itemid=62
baneabout 13 years ago
It's a similar measure that's often used in NLP. Sentences, documents etc. are usually stripped of common or popular terms first and the remaining ones tend to have higher information value.

It's not entirely a surprise that it works for meta-language constructs like the web and site popularity.
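The NLP analogy above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and a frequency-based notion of "common"; the function name `strip_common_terms` and the `top_n` parameter are made up for the example.

```python
from collections import Counter

def strip_common_terms(docs, top_n=1):
    """Drop the top_n most frequent tokens (counted across all docs) from each doc."""
    tokenized = [doc.lower().split() for doc in docs]
    counts = Counter(tok for tokens in tokenized for tok in tokens)
    # The most frequent tokens play the role of the "top million sites".
    common = {tok for tok, _ in counts.most_common(top_n)}
    return [[tok for tok in tokens if tok not in common] for tokens in tokenized]

docs = ["the cat sat on the mat", "the dog ate the bone", "a cat and a dog"]
print(strip_common_terms(docs))
```

Here "the" is the only token removed; what is left carries more of each sentence's meaning, which is the same bet the search engine makes about sites.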
serbrechabout 13 years ago
The results feel actually fresh. It's removing the consumerism layer of bullshit that google serves us everyday. Also wondering if that has something to do with the "bubble" that google creates around us based on our search history and social network information.

Thanks for that.
hafabnewabout 13 years ago
Searching for 'python global interpreter lock' yields some interesting blog articles describing the problems, also some related articles about approaches to the C10k problem with python (preforking, worker processes, etc.)

A++ would search again.
heydonovanabout 13 years ago
I'm really liking this. Instead of being bombarded with content that's just blasted with keywords, I get relevant well-written articles. Not only that, but no more W3Schools in my SERPs. The chance to read an article that's written with humans in mind, instead of Google, is more than enough reason to spend some more time using this.
halle_lu_jahabout 13 years ago
You think that top results in Google and other commercial search engines are always ranked based on "popularity"?

It would be harsh to call this naive, but it shows a serious lack of SEM and SEO knowledge. Ever heard of "paid placement"?

Many years ago when Digital's AltaVista was our main search engine, it was becoming loaded down with paid placement.

The results were polluted.

Google eventually became the "clean" solution.

But now it's Google that is loaded down with all sorts of commercial crud, much of it pointing to Google acquisitions.

And paid placement, among numerous other strategies, new and old, still exists.

The simplicity of millionshort is brilliant.

Filter out the crap.
unimpressiveabout 13 years ago
Add a way for me to put this as my search engine in my firefox search bar.

Please.

EDIT: In trying to accomplish this task I found an add-on that lets you do this for anything.

(https://addons.mozilla.org/en-US/firefox/addon/add-to-search-bar/)
halle_lu_jahabout 13 years ago
Here's what Dropbox thinks about power users:

WSJ: What's next?

Mr. Ferdowsi: We continue to focus on actually solving problems that real people have and not being distracted by what power users want.

Google has made clear what they think about power users:

No + operators in search.

No web-based code search.

No Google Labs for the public.

etc.

Plenty of wood behind the Google arrows, but all the cool ones have been cast out of the quiver.

Just what kind of targets is Google aiming at nowadays?

Millionshort I give you +999,999.

I would give you +1M if you took out the AdSense and PlusOne javascript.

This has been a long time coming.

Alas, DDG and other alternatives are all about _money_.

Search is about _discovery_.
__alexsabout 13 years ago
Well that's one way to break out of the filter-bubble/echo-chamber I suppose. If only our best search technology was based on something better than a popularity contest :(
tnorthcuttabout 13 years ago
And of course, W3Schools still manages to show up, thanks to their multiple crazy subdomains: http://cl.ly/GFup
yayadarshabout 13 years ago
After a few test searches, this is surprisingly effective for things which I had resigned myself to being "un-findable" because of poor Google results. This is most apparent on non-technical things, in this case specific Jazz chord fingerings for a guitar class I am taking.

I am very interested as to what comes of this, or rather what is influenced by its implications.
erichoceanabout 13 years ago
Man, I love this thing. I've already found a bunch of interesting links on path tracing. Bookmark'd.
wonderwhyabout 13 years ago
What is the ranking used for the top million sites? A search result for "Australia" returns as the top result http://australia.gov.au, which Alexa ranks as 20,615 globally. Actually, a lot of the queries I tried returned Australian sites.

http://millionshort.com/search.php?q=australia&remove=1000k

http://millionshort.com/search.php?q=somalia&remove=1000k -- another Australian site.
bo1024about 13 years ago
I'd really like to see randomization instead. Return results picked randomly from within the top 10 million or something.
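A minimal sketch of that randomization idea, assuming the engine already has a ranked list of candidate results to draw from. The names `random_page`, `pool_size` and `page_size` are hypothetical, and the optional `seed` is only there to make a page reproducible.

```python
import random

def random_page(ranked_results, pool_size, page_size, seed=None):
    """Pick page_size distinct results at random from the top pool_size candidates."""
    rng = random.Random(seed)
    pool = ranked_results[:pool_size]
    # sample() never repeats an item; cap the size so small pools do not raise.
    return rng.sample(pool, min(page_size, len(pool)))
```

Unlike a hard cutoff, this keeps popular sites in play but gives every candidate in the pool the same chance of appearing.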
scootabout 13 years ago
It's amusing to see all the SEO "experts" that don't make it into the top million:

http://millionshort.com/search.php?q=seo&remove=1000k
wazooxabout 13 years ago
A real serendipity engine. Absolutely great, thank you. I'm finding tons of interesting products and ideas by searching the most banal things :)
pfarrellabout 13 years ago
I would prefer if my previous search was populated in the search box after completing a search (since I might want to try the search with a different filter).

It appears you have an "off by one issue" in the sidebar. There's always a blank entry in the list of ignored sites.

Filtering does not seem to be working (or I don't understand it). Searching on "chicken" produced the same results with 1 million or 100k removed.
twelvechairsabout 13 years ago
Thank you! I can see this being something I regularly use.

It may be a simple idea, but it's something nobody else has done before, and I think the creators deserve a lot of credit for coming up with and implementing it. I hope they manage to get something from it. I can see that if the site becomes popular it will just get copied by other search sites.
pessimizerabout 13 years ago
This is pretty amazing. I didn't know that the old internet was still there! This may become my new favorite search engine.
halle_lu_jahabout 13 years ago
"Quality" is subjective.

More relevant is _accuracy_, i.e., you get what you specify via search operators, and results are not influenced by all of Google's silly "factors". You know what you're looking for and how to frame the query. But Google assumes you're dumb and thinks it should decide for you.

Alexa Top 1M is a nice filter because the data comes from the Alexa Toolbar which only the most braindead web users would have installed. So you are in effect avoiding sites that the web's most braindead users would often visit.

Ranking sites based on "popularity" is great until you reach the point where the majority of users are not very intelligent. (cf. search engine users in 2004 with search engine users today.) When you reach that point, you get results where "quality" is determined by idiots (and SEO hats), not a group of intelligent peers.
charlieokabout 13 years ago
It's like a hipster search engine. It's only interested in things before those things are cool.
martinaglvabout 13 years ago
Is it safe to assume that this is how Google's search results would look if nobody did SEO?
pjinabout 13 years ago
Funny, on my first query I found an obscure HN scraper:

http://tazod.com/
vidossabout 13 years ago
Exactly what I was thinking in 2001 => http://www.halfbakery.com/idea/The_20Other_20search_20engine

Glad to see someone did it now...
g_linedabout 13 years ago
This is a great idea and I see myself coming back to this. It's a shame that a little blog on tumblr or blogspot gets taken out because it's under a big name domain - but this has spam related benefits too.

Great work!
jtchangabout 13 years ago
Wow, some of the content there is great. Forget about searching the deep web. For me, "deep" is the real gems buried under the first 100 or so results, where stuff actually gets interesting!
chrislomaxabout 13 years ago
Only a little thing, could do with maintaining query strings between pages. It lost my query string and returned no results when I changed the drop down without me noticing.
Sunchoabout 13 years ago
I have been wanting something like this for a while. It's even on my todo list. Thanks for saving me the work. I'll be using it all the time!
felixchanabout 13 years ago
How did you build this? Are you indexing the entire web yourself? Or are you using Google's index/removing the top 1 million based on domain?
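Nothing in the thread confirms how Million Short is built, but the second option in the question above (take an existing index and drop results by domain) could look roughly like this. The function `remove_top_sites`, the `ranked_domains` list and the `remove_top` parameter are assumptions for illustration; a real ranking would come from something like the Alexa list mentioned elsewhere in the thread.

```python
from urllib.parse import urlparse

def remove_top_sites(result_urls, ranked_domains, remove_top):
    """Drop results whose host, or any parent domain of it, is in the top remove_top."""
    blocked = set(ranked_domains[:remove_top])
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        labels = host.split(".")
        # For a.b.example.com this checks a.b.example.com, b.example.com and example.com.
        candidates = {".".join(labels[i:]) for i in range(len(labels) - 1)}
        if not candidates & blocked:
            kept.append(url)
    return kept
```

Matching parent domains is a design choice: it stops subdomains of a popular site from slipping through, at the cost of also removing small blogs hosted under a big-name domain.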
mswenabout 13 years ago
I just tried it with a search for some competitive intelligence. I used the 100K removal option. I found a competitor in another country that had not made the top 2 pages on Google. It confirms that others are launching something similar to what I am building... but also the fact that it doesn't bubble to the top on Google means that the market space is not dominated yet.
garraethabout 13 years ago
I love this! And am totally going to use it. Removing the top "thousand sites" removes pretty much all the sites I WISH I could have filtered from my Google results anyhow (ehow, w3schools, etc).

One request: please keep the search text in the form field after clicking "search". Just so users can search the same thing multiple times with different values in the "Remove the top" drop-down.
cnbeuiwxabout 13 years ago
Thank you - very refreshing! DuckDuckGo should implement something like this just for the spirit of it.

The web just got more interesting. :)
fibberyabout 13 years ago
Doesn't make a dent in the travel site spam, unfortunately, though I might use this just to permanently remove About.com...
83457about 13 years ago
Just re-found a site I was looking for but couldn't find with google the other day. This could definitely be helpful.
selectoutabout 13 years ago
Not sure if I found an anomaly or what, but a simple search of "Privacy" returns results from thesaurus.com, merriam-webster.com, truste.com, kelloggcompany.com, and many more that are all in the top few thousand according to QuantCast and Compete.

Great idea though, will definitely try this out some more.
dclowd9901about 13 years ago
Is it just me, or are the results fairly congruent with standard results from a search engine?
duckabout 13 years ago
Great idea, although I think if you could explain it a bit better you could avoid the confusion several of these comments are showing. I like how my Hacker Newsletter project shows up #2 when searching for Hacker News. :)
tocommentabout 13 years ago
I think if you removed results with my search term in the domain name, this would be perfect!

For example, I searched for how to start a garden, and I can guarantee that startagarden.com is junk. But I'd see some useful advice from small blogs, etc.
doktrinabout 13 years ago
I'm getting odd results with the following query:

Search String: Ruby

Remove From Top: 1000 & 10000

In both instances, the top hit is http://www.ruby-lang.org, which is also the top hit from both Google and DDG.

Am I missing something?

edit: formatting
hybrid11about 13 years ago
This is a cool search engine for discovery, but it defeats the purpose when you are looking for a location. Do a search for "facebook" and you will not get any result that links you to facebook.com.
brudgersabout 13 years ago
I would like to see the search engine adhere more strictly to quoted search terms. It seems that they are partially ignored, which gives it some of the same problems that the major engines have.
radleyabout 13 years ago
Please add a favicon so I can see it in my (icon only) Bookmarks Toolbar.
ChristianMarksabout 13 years ago
Good idea. It's about time that search engines route around the power-law distributions of popular sites, popular bloggers and personalities to find the gems otherwise buried in the noise.
zechoabout 13 years ago
> Imagine a search engine that simply removed the top 1 million most popular web sites from its index. What would you discover?

A lot of my competitors who are still on the first page of Google results.
mserdarsanliabout 13 years ago
Wow, that is pretty awesome. I reached some results I wanted that I could not find via popular search engines with hours of searching. Believe it or not, this engine is changing my life.
thorin_2about 13 years ago
Very interesting. I was pleased with the results and have already added this site to my Chrome bookmark bar, right between my Google search and Hacker News icons.
Tossrockabout 13 years ago
Well, in a rather meta turn of events, searching for my username on this returned a link to hackerbra.in, which appears to be some kind of HN mirror.
carlosaguayoabout 13 years ago
If you search for "google" and remove the top million results, you still get google main page (in this case, the one for australia and india...)
grampajoeabout 13 years ago
The site's way too wide on my netbook, 1024x600. Also, the list of domains removed from the results covers up part of the results themselves.
dude123122about 13 years ago
Another Cool feature could be to exclude sites that use Adwords or Paid Search from the list too. Then it would really just be legit sites.
rogerbinnsabout 13 years ago
It doesn't appear to work. I did a search for aspirin and the top match returned by this is #5 doing the same search with Google.
matheticabout 13 years ago
If this becomes popular, at some point results would disappear since unpopular sites will be pushed into the first million.
taxonomymanabout 13 years ago
8 hours later, we just launched our first re-design. Thanks for all the great feedback and support. More to come.
itabout 13 years ago
I really like this. Right away I found some new sites that I hadn't seen before with interesting content.
hsparikhabout 13 years ago
As someone learning web development, I'd love to get some insights into how one could build this.
pnathanabout 13 years ago
Thanks.

Google has really killed the discoverability of the internet for me. I will be experimenting with this.

Best of luck.
wyckabout 13 years ago
This is an incredible breath of fresh air, what an odd thing to say.
pazimzadehabout 13 years ago
Removing Wikipedia might be a mistake. Otherwise, it's great.
Ben_Burkeabout 13 years ago
i searched for my site and in teh goog i get first page....here i found nothing....so for me this = no good. I understand the base but i dont understand the result
tomeldersabout 13 years ago
Should I be depressed that I'm the top hit for my own name?
stcredzeroabout 13 years ago
I wonder if there's an analogous hack for social news?
qwertyzabout 13 years ago
Now if I could add it to firefox's search bar...
cleverjakeabout 13 years ago
very interesting hack. thanks for doing it
thar2012about 13 years ago
non-popular websites will start seeing some good traffic suddenly. It would be confusing for them :)
tinyjoeabout 13 years ago
my website ranked 1st? guess i need to work harder T_T
tsunamifuryabout 13 years ago
Searched my name... Got my website

:(
taskstrikeabout 13 years ago
you should somehow incorporate hipster into the search site's name.
digitallimitabout 13 years ago
I searched for "Hero Academy" and the first result was Google's 5th result, a site called Hero Academy with the url "hero-academy.com". That's not very "million short", IMHO.