Ask HN: Is there a search engine which excludes the world's biggest websites?

578 points by cJ0th about 5 years ago

Discovering unknown paths of the web seems almost impossible with Google et al. Are there any search engines which exclude, or at least penalize, results from, say, the top 500 websites?

79 comments

noad about 5 years ago

This is a great question. I also want a way to search the internet while excluding all major media domains, as well as any company over a certain size. I just want to search through old blogs, SO, non-corporate social media, weird forums, etc.

There are so many cool things I remember reading on the web 10-20 years ago that still exist but are now buried so deep on Google that they might as well not exist. Nowadays, searching any topic always seems to lead you to CNN, Microsoft, Facebook and other huge corporations. Search results are becoming more sanitized, beige and meaningless every day.

sanqui about 5 years ago

There is a search engine with this exact goal: <a href="https://millionshort.com/" rel="nofollow">https://millionshort.com/</a>. I haven't had great results with it myself, though.

erikbye about 5 years ago

For Google you can use <a href="https://addons.mozilla.org/en-US/firefox/addon/g-search-filter/" rel="nofollow">https://addons.mozilla.org/en-US/firefox/addon/g-search-filt...</a>. Just drop in your list of those 500 URLs once you've decided what the top 500 is.

For other engines you can use <a href="https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/" rel="nofollow">https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/</a> with this script: <a href="https://greasyfork.org/en/scripts/1682-google-hit-hider-by-domain-search-filter-block-sites" rel="nofollow">https://greasyfork.org/en/scripts/1682-google-hit-hider-by-d...</a>

thekyle about 5 years ago

There is Million Short, which allows you to search without the top 100, 1k, 10k, 100k, or 1m sites. Personally, what I'd like to see is a search engine that only indexes webpages without ads, since that should eliminate lots of the SEOd garbage. It would also be nice to use the text-to-code ratio to derank JS-heavy sites.

<a href="https://millionshort.com/" rel="nofollow">https://millionshort.com/</a>
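A minimal sketch of the text-to-code ratio idea, in Python with only the standard library. The definition used here (visible text characters divided by total HTML characters, with `<script>` and `<style>` contents not counted as text) is an assumption for illustration; the comment above does not specify one, and `text_to_code_ratio` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_to_code_ratio(html: str) -> float:
    """Return visible-text characters divided by total characters in the page."""
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so indentation does not inflate the score.
    visible = " ".join("".join(parser.chunks).split())
    return len(visible) / len(html)
```

A ranker could multiply its normal relevance score by this ratio, so a page that is mostly inline script scores lower than a page that is mostly prose. It only sees inline code, so externally loaded scripts would need to be fetched and counted separately.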

tlarkworthy about 5 years ago

I made a script on ObservableHQ to surf YouTube pseudo-randomly: <a href="https://observablehq.com/@tomlarkworthy/random-place-on-youtube" rel="nofollow">https://observablehq.com/@tomlarkworthy/random-place-on-yout...</a>

I do a random city + "documentary" as the search term. It's taken me all over the world and I've seen some very strange things.

One of my favourites was Aarhus, which had a Danish-language rapper proclaiming he was putting Aarhus on the global map (I had never heard of the city of Aarhus). <a href="https://youtu.be/WSZxuzgImLo" rel="nofollow">https://youtu.be/WSZxuzgImLo</a> They dis Copenhagen a lot too, lol. You get a more intimate YouTube experience with the low-view videos.

But I have also seen amazing religious rituals. An excellent documentary on Karachi.

Because it's ObservableHQ, you can fork it and figure out your own algorithm for biasing the random.

totemandtoken about 5 years ago

Reminds me of this classic pg essay: <a href="http://www.paulgraham.com/ambitious.html" rel="nofollow">http://www.paulgraham.com/ambitious.html</a>

Specifically this quote: "The way to win here is to build the search engine all the hackers use. A search engine whose users consisted of the top 10,000 hackers and no one else would be in a very powerful position despite its small size, just as Google was when it was that search engine."

There has been a lot of grumbling about the state of search these days. Maybe the time is nigh for a new search engine?

igammarays about 5 years ago

DEVONagent is a highly configurable search utility which can be used to combine and de-duplicate results from multiple search engines at once, exclude sites or keywords from a blacklist, follow deep links within search pages, and perform some filtering logic on the text of results.

Before I knew about DEVONagent I would often just search multiple engines and sources trying to find something particular (e.g. a particular PDF) or unique results.

<a href="https://www.devontechnologies.com/apps/devonagent" rel="nofollow">https://www.devontechnologies.com/apps/devonagent</a>

pavelmark about 5 years ago

Simply removing Pinterest would be a huge step in the right direction.

chaos_a about 5 years ago

<a href="https://wiby.me/" rel="nofollow">https://wiby.me/</a> exists to solve this exact problem. I've found some pretty neat/odd websites on it in the past.

mikekchar about 5 years ago

Here is an idea that I've always wanted to do, but will never have time for: a curated search engine.

Basically the idea is to have people band together and "recommend" links. You then do your normal spidering of the websites to create a search engine (or even just call through to a number of existing search engines). However, the ranking of the results is based on the weighting of the recommendations.

It's essentially a whitelist based on your own personal bubble. Of course this won't work in general, because you will always get SEO creeps spamming recommendations. However, it gives you tools for working around those creeps. The average person probably won't be able to manage it, but power users probably will.

By not trying to solve the problem for everybody, it makes it easier to solve the problem for some people. Or at least that's my thesis :-) I might be wrong.
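One way the "ranking based on the weighting of the recommendations" could look, as a rough Python sketch. The data model is entirely assumed: `recommendations` maps a user to the domains they recommend, and `trust` maps a user to how much weight you personally give them. Neither structure comes from the comment above.

```python
from urllib.parse import urlparse


def rank_by_recommendations(results, recommendations, trust):
    """Order result URLs by the summed trust of the users recommending their domain.

    results: list of URLs from any underlying engine, in engine order.
    recommendations: dict mapping user -> set of recommended domains (assumed shape).
    trust: dict mapping user -> weight in *your* bubble; unknown users count as 0.
    """
    def score(url):
        domain = urlparse(url).netloc.lower()
        return sum(
            trust.get(user, 0.0)
            for user, domains in recommendations.items()
            if domain in domains
        )

    # sorted() is stable, so ties keep the underlying engine's order.
    return sorted(results, key=score, reverse=True)
```

Because untrusted users default to a weight of zero, a spammer recommending their own site changes nothing for you unless you chose to trust them, which is the "tools for working around those creeps" part.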

netsectoday about 5 years ago

You can boot up your own custom search engine in a few minutes with YaCy (Ya See!), an open-source, P2P, Dockerized crawler and search engine built on top of Solr.

<a href="https://yacy.net/" rel="nofollow">https://yacy.net/</a>

If you're generous, you can make your index available to other P2P instances.

I wanted to run an API search the other week and was blown away by how quickly I could prop up my own custom search portal (I didn't want to pay for API access to other search engines, and YaCy comes with JSON and Solr endpoints).

I ran it locally to test my crawl filters, then pushed a private instance out to Digital Ocean to turn up the heat with the crawling. The only issue I had was that the crawler would hit the max memory threshold on long crawls and the container would restart, but that was fixed by scaling up the box.

crawlcrawler about 5 years ago

I built a search engine for this and other, similar purposes. With Crawl Crawler you start out by searching the metadata of a Common Crawl ("CC") crawl. Then you define a subsection of that data collection by designing a query whose search results include your favorite sites. Then you enrich that subsection by linking those metadata documents (which come from CC's WAT repo) to full-text extracts or HTML from CC's WET repo or the WWW. Then you set it to refresh that section on a recurring basis. Voila! You have created a search index that includes your preferred sites. <a href="https://crawlcrawler.com" rel="nofollow">https://crawlcrawler.com</a>

allwynpfr about 5 years ago

You should try Million Short. As the name suggests, it takes out the first 100, 1k, or up to a million results, so you're left with those that aren't all that popular. That seems to be what you're looking for. <a href="https://millionshort.com/" rel="nofollow">https://millionshort.com/</a>

nic-waller about 5 years ago

My hobby project is <a href="https://random.surf" rel="nofollow">https://random.surf</a> (works better on desktop than mobile).

I share that same desire to visit the web less travelled. I want to discover interesting sites that deserve to be bookmarked because they will never show up in a search engine.

dangoljames about 5 years ago

There used to be a Java applet embedded in altavista.com's website that could be run against search results. It would do semantic processing on the results and present a list of generated terms, each with a checkbox. Checking a box would pull any returns which contained the topic from the remaining search results.

This was fire. If a topic were being discussed on the web, you could find it with this tool. Unfortunately, it did not fit the vision of the parasitic overlords who bred us to produce and consume for their benefit.

dennisy about 5 years ago

I think you could get good results if you just penalise sites for the number of third-party JS scripts they load, which is a proxy for a more established site/corp.

You could add a bunch of heuristics such as size, number of links, etc.

Maybe even train a classifier to select the “smaller” part of the web.
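A rough Python sketch of the third-party JS penalty, using only the standard library. "Third party" is assumed here to mean any `<script src>` whose host differs from the page's own host (so a site's own CDN subdomain would count too), and the `penalty` weight in `penalised_score` is an arbitrary placeholder, not a tuned value.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _ScriptCollector(HTMLParser):
    """Records the src attribute of every external <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_script_count(page_url, html):
    """Count scripts served from a host other than the page's own host."""
    page_host = urlparse(page_url).netloc.lower()
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    count = 0
    for src in collector.sources:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).netloc.lower()
        if host and host != page_host:
            count += 1
    return count


def penalised_score(base_score, script_count, penalty=0.1):
    """Shrink a relevance score as the third-party script count grows."""
    return base_score / (1.0 + penalty * script_count)
```

The other heuristics mentioned (page size, number of links) could be added as further terms in the denominator, or all of them could become features for the classifier.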

inopinatus about 5 years ago

I would pay real subscription money for a search engine that focused on knowledge-oriented results rather than retail and commercial results.

When I type “shoes”, it would give me: links for the functional and creative history of footwear, the taxonomy of shoes, methods of construction, current and historical footwear industry data, synonyms and antonyms, related terms and professions, the dictionary definition, and similar links related to secondary meanings (such as any protective covering at the base of an object, horseshoes etc). I’d also hope for a comedy link to a biography of Cordwainer Smith.

What I actually get, which I don’t want at all: pages and pages of shoe shopping.

The various means to exclude “top X sites” are the roughest possible heuristic in that direction, and throw out the baby with the bathwater (for example, a long-established manufacturer may well have an informational online exhibit).

Google has essentially failed me in its primary mission. Bing at least has the grace to admit they are here to “connect you to brands”. And sadly, right now, every other option is an also-ran.

In practice I use DDG, directed by !bangs towards known encyclopaedic or domain-specific sources. I am certain that I’m missing out.

text_exch about 5 years ago

I've long wanted to build a search engine of only personal blogs. I am less familiar with the field of information retrieval so I haven't gotten started yet, but it's always been a dream of mine, and if anyone is interested please contact me at threemillionthflower [at] the world's largest email provider.

Discovering unknown parts and blogs on the internet is one of the enduring goals of a newsletter that I run [1], which provides a single link to an interesting article every day, usually by lesser-known authors and blogs across the internet.

[1] www.thinking-about-things.com

011-video about 5 years ago

You are your own best search engine!

On a daily basis your brain uses shortcuts to get to the point. Open Firefox (of course) and press ALT+B. Then add a new bookmark, for instance:

Name: Stack Overflow
Location: <a href="https://stackoverflow.com/search?q=%s" rel="nofollow">https://stackoverflow.com/search?q=%s</a>
Tags:
Keyword: st

Now if you want to search for "javascript timer", just type: st javascript timer

Add "%s" to the search URLs of all your favorite websites. Example: <a href="https://en.wikipedia.org/wiki/%s" rel="nofollow">https://en.wikipedia.org/wiki/%s</a>

To discover new website content, apply the same trick to Hacker News, Reddit or any RSS river.

Voila, bye bye GG.
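For anyone curious what the keyword bookmark is doing, here is a small Python sketch of the `%s` substitution. It mimics the behaviour described above rather than Firefox's actual implementation (the browser's own escaping rules may differ), and the `w` keyword for Wikipedia is made up for the example; only `st` appears in the comment.

```python
from urllib.parse import quote

# Keyword -> URL template, mirroring the bookmark's "Keyword" and "Location" fields.
BOOKMARKS = {
    "st": "https://stackoverflow.com/search?q=%s",
    "w": "https://en.wikipedia.org/wiki/%s",  # hypothetical keyword
}


def expand(typed, bookmarks=BOOKMARKS):
    """Turn 'st javascript timer' into the bookmarked search URL, or None."""
    keyword, _, terms = typed.partition(" ")
    template = bookmarks.get(keyword)
    if template is None or not terms:
        return None
    # Percent-encode the search terms before substituting them for %s.
    return template.replace("%s", quote(terms, safe=""))
```

So `expand("st javascript timer")` yields a Stack Overflow search URL with the terms percent-encoded, and anything without a known keyword falls through to `None`, where a browser would fall back to its default search engine.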

NateEag about 5 years ago

For Google, you can ignore specific sites by adding "-example.com".

See this example of filtering Stack Overflow out of search results:

<a href="https://www.google.com/search?q=loop+over+array+items+in+javascript+-stackoverflow.com" rel="nofollow">https://www.google.com/search?q=loop+over+array+items+in+jav...</a>

brentis about 5 years ago

Imagine if search results had table filters and sorting: popularity, relevance, age, type, etc. Type could be blog, forum, site, or video. Or like it used to be.

sneeuwpopsneeuw about 5 years ago

I personally use Google Chrome with the DuckDuckGo search engine. DuckDuckGo is not perfect: very in-depth searches (such as "gameboy advance memory layout") only return junk, while Google knows you are searching for a niche. But on your average search it is as good as Google, sometimes better, because it is more factual and promotes fewer webstores. When it does not give me what I'm looking for, I can add !g anywhere in the query and the same search is done using Google.

Then I use Violentmonkey, an open source JS/CSS injector, to inject this user script: <a href="https://greasyfork.org/nl/scripts/1682-google-hit-hider-by-domain-search-filter-block-sites" rel="nofollow">https://greasyfork.org/nl/scripts/1682-google-hit-hider-by-d...</a> This will block specific domains for you in Google, Yahoo, DuckDuckGo, etc. I use it to block domains like Quora, SourceForge, CNET and Softonic.

The nice thing about this script is that you can permaban domains you know are junk, and they will be removed completely, or you can ban a domain such as a commercial website. When you ban something it is not removed from Google or DuckDuckGo; it only shows the title in light gray. I'm currently experimenting with this on some major webstores, so I cannot really say whether it will help you, but it can be a good start.

(edit) I saw some people ask why this was not possible before. Google allowed you to block domains and websites a few years ago, but they removed this feature. DuckDuckGo never allowed it, because that would mean having a cookie that remembers your preferences, and that is against their principles.

greglindahl about 5 years ago

If the question is "Is there a commercially-viable search engine that supports this feature", then the answer is "probably not".

Implementing this properly involves having your own search index. And that's pretty expensive.

bamboozled about 5 years ago

I think on DDG you can do !mil, which excludes the first million top-ranking sites.

Edit: Maybe it’s the first million results? I use it to find obscure things sometimes.

mmsimanga about 5 years ago

When researching a topic I have had great success searching HN and reading through the comments. If I want to find alternatives to a software tool I am using, the comments on HN are best. Searching through subreddits also yields better results than Google.

DavidPiper about 5 years ago

It feels like there could be a (partial) meta solution here: a search engine that returns results whose pages weigh in under a certain size.

From the comments it seems most of the "cruft" filling up Google results is newer web apps, generally JS-heavy and advertising-heavy, etc.

If you had a filter for pages with (e.g.) < ABC kb of JS and < XYZ external links (excluding img tags), I feel like there'd be a good chance that the "old" web and the "unknown" web would bubble to the top.

There are plenty of false positives (particularly for "small" forums built with modern JS apps, etc), but it could be one of many filtering tools to achieve better search results.
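A tiny Python sketch of that filter, assuming a crawler has already measured each page. `PageStats`, its fields, and the default thresholds (50 kb of JS, 30 external links) are all stand-ins for the "ABC" and "XYZ" placeholders above, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Per-page measurements a crawler would need to record (assumed shape)."""
    url: str
    js_bytes: int
    external_links: int


def lightweight_pages(pages, max_js_kb=50, max_external_links=30):
    """Keep only pages under both the JS-size and external-link thresholds."""
    limit = max_js_kb * 1024
    return [
        page for page in pages
        if page.js_bytes < limit and page.external_links < max_external_links
    ]
```

Applied as a hard filter it will drop the false positives mentioned above, so a softer variant could turn each threshold into a ranking penalty instead of a cutoff.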

turnipla about 5 years ago

Google used to let you blacklist websites many moons ago; that would go a long way already.

Now there are a few extensions that do that, but obviously they only hide the results from each page, so sometimes you will see pages with 2 results, if any at all.
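The near-empty pages happen because hiding is applied after pagination. A client that controls fetching could keep pulling pages until it has a full screen of unblocked results. A Python sketch, where `fetch_page` is a hypothetical callable (page number in, list of result URLs out) standing in for whatever search backend is used:

```python
from urllib.parse import urlparse


def filtered_page(fetch_page, blocked_domains, wanted=10, max_pages=20):
    """Collect `wanted` unblocked results, fetching extra pages as needed.

    fetch_page: hypothetical callable, page_number -> list of result URLs
    (an empty list means there are no more results).
    """
    kept = []
    if wanted <= 0:
        return kept
    for page_number in range(max_pages):
        batch = fetch_page(page_number)
        if not batch:
            break
        for url in batch:
            if urlparse(url).netloc.lower() not in blocked_domains:
                kept.append(url)
                if len(kept) == wanted:
                    return kept
    return kept
```

The `max_pages` cap is there so a very aggressive blocklist cannot turn one search into an unbounded number of requests.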

methou about 5 years ago

I used a Google Custom Search Engine (CSE) to remove results from Softonic and the like. It works well, but it's still very Google.

dexen about 5 years ago

There is a similar problem where YouTube's recommendations and auto-play are mostly big name brands, to the exclusion of individual reporters, commentators, and other content producers. Since recently, a "De-Mainstream Youtube" plugin [1] is available for Firefox and Chrome, fixing that to some extent.

[1] <a href="https://demainstream.com/" rel="nofollow">https://demainstream.com/</a>

bmd3991 about 5 years ago

What I’d like to see is a search that excludes any page with ads, and any page with affiliate links. That alone would get rid of 90% of the garbage.

peel40 about 5 years ago

I think there's a simple Google way. Just add `-bigwebsite.com` to your query.

[search term] -google -youtube -facebook ... -top100website

and it should work.

I found a list of the top 1m Alexa websites here:

<a href="http://s3.amazonaws.com/alexa-static/top-1m.csv.zip" rel="nofollow">http://s3.amazonaws.com/alexa-static/top-1m.csv.zip</a>

An add-on with that list should do the work.
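A quick Python sketch of building such a query from the list. It assumes the unzipped CSV has one `rank,domain` pair per line with no header, and it uses the `-site:` operator mentioned elsewhere in this thread instead of a bare `-domain`. Search engines limit how long a query can be, so in practice this probably only works for a short slice of the list, not all 500.

```python
import csv
import io


def top_domains(csv_text, n):
    """Return the first n domains from 'rank,domain' CSV text (assumed layout)."""
    domains = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) >= n:
            break
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domains.append(row[1].strip())
    return domains


def exclusion_query(terms, domains):
    """Append one -site: exclusion per domain to the search terms."""
    return " ".join([terms] + [f"-site:{domain}" for domain in domains])
```

An add-on could do the same thing client-side: keep the list locally, and either rewrite the query like this for a handful of domains or hide matching results after the page loads for the long tail.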

bhartzer about 5 years ago

There is a custom search engine called Newgle.xyz that only shows results from the 1000 or so new gTLDs (new top-level domains).

It’s custom Google search results, but since it excludes .com, .net, .org, etc., you probably won’t see any of the large sites there.

It’s also interesting to see which sites have been built in the last few years, as the new gTLDs haven’t been around that long.

rkagerer about 5 years ago

I would like one that punishes sites with too high an ad-to-content ratio.

loosetypes about 5 years ago

What are folks’ non-commoditized heuristics for finding new things online?

I was intrigued by how dorkweed’s approach has changed over time, as described in a reply to a sibling comment.

As general search results get watered down, and Rotten Tomatoes inflation maybe trends towards reflecting company interests rather than my interest level, maybe it’s worth re-evaluating the vetting avenues we take as users.

Here’s mine: for games and shows I’ve recently found myself using the quantity of fan videos on YouTube as a proxy for quality. So far it’s been a decent means to find cult followings for something I otherwise wouldn’t necessarily hear about.

Obviously this approach has its flaws, and is subject to financial perversions to an extent, but I figure if enough people genuinely want to pay tribute to a work, it might be worth checking out.

ChrisMarshallNY about 5 years ago

Remember sites like StumbleUpon?

I find that the YouTube sidebar is useful for me to find interesting music. I have eclectic tastes, and Google seems to have figured that out. I don't mind.

I suspect that it would be possible to create a custom API query to Google that would have a "blacklist."

smsm42 about 5 years ago

There's Million Short: <a href="https://millionshort.com/" rel="nofollow">https://millionshort.com/</a>

I think they try to do exactly what you ask, but I haven't used them extensively, so I don't know how good they are.

abarrettwilsdon about 5 years ago

For more queries, you can add modifiers to a Google search to get the results you want.

Seeing folks mention the NOT operator (-). It's quite powerful! For example, you can do:

intext:"Powered by intercom" -site:intercom.com

which will find all the sites that use the Intercom widget, or

~blog bread baking -inurl:checkout -intext:checkout

which will find bread blogs (or similar) without commercial intent.

I put together a list of the two dozen or so most useful templates of this, for folks who are interested: <a href="https://www.alec.fyi/dorking-how-to-find-anything-on-the-internet.html" rel="nofollow">https://www.alec.fyi/dorking-how-to-find-anything-on-the-int...</a>

dhbradshaw about 5 years ago

I've wondered too about something similar to that. Basically, I'd like sessions for searching.

Each session would have an updatable list of sites that are favored, whitelisted or blacklisted for a particular class of search.

maayank about 5 years ago

I'm intrigued by actual use cases for it other than exploring, i.e. where it would give better results for a query than the common search engines.

Anyone reading this, please post if you find any.

chasd00 about 5 years ago

In the same vein, it would be awesome to search for a product to buy with the results being ecomm websites owned by people in my area. A way to "shop local" online.

amelius about 5 years ago

If only Google allowed us to omit websites from search results.

Google says they need our information to "improve our experience", but we can't tell them what to omit ...

fedede about 5 years ago

Hey! I actually liked this idea and I'm considering starting a learning project on it. I've seen a lot of interest and ideas in the comments, and decided to create a very short Google form to start gathering all the interested people so we can organise something interesting. Is anybody in? :)

<a href="https://forms.gle/5KuTYVdYaMzRD2n78" rel="nofollow">https://forms.gle/5KuTYVdYaMzRD2n78</a>

jsgo about 5 years ago

I don't know that I'd want a search engine to specifically exclude or limit the results of specific sites of their choosing (even if top 500 as the example is fairly unbiased), but I think I'd really like the ability to say "move these specific domains a few pages back. Don't eliminate them outright, but I have felt dumber having read them previously."
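"Move these domains a few pages back" can be done as a re-rank rather than a filter. A Python sketch, assuming you already have the ordered result URLs; `pages_back` and `page_size` are illustrative knobs, not anything a real engine exposes.

```python
from urllib.parse import urlparse


def demote(results, demoted_domains, pages_back=3, page_size=10):
    """Push results from demoted domains roughly `pages_back` pages later.

    Nothing is removed: every input URL appears exactly once in the output.
    """
    shift = pages_back * page_size
    keyed = []
    for position, url in enumerate(results):
        host = urlparse(url).netloc.lower()
        penalty = shift if host in demoted_domains else 0
        # Sort by shifted position; on ties, non-demoted results come first.
        keyed.append((position + penalty, penalty > 0, position, url))
    keyed.sort()
    return [url for _, _, _, url in keyed]
```

If there are fewer results than the shift, the demoted entries simply land at the end, which still matches the "don't eliminate them outright" requirement.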

pengstrom about 5 years ago

What I want is to filter out commercial results. When I'm searching for a product I don't want shills, I want real opinions.

21xhipster about 5 years ago

<a href="https://cyber.page" rel="nofollow">https://cyber.page</a>

It's kinda new, so it excludes kinda everything :-) But you can make it work better :-)

<a href="https://ipfs.io/ipfs/QmQ1Vong13MDNxixDyUdjniqqEj8sjuNEBYMyhQU4gQgq3" rel="nofollow">https://ipfs.io/ipfs/QmQ1Vong13MDNxixDyUdjniqqEj8sjuNEBYMyhQ...</a>

freefriedrice about 5 years ago

Why exclude the biggest websites?

The problem I see on DDG & Google is having to scroll through 5-10 pages of utter SEO nonsense.

"Do you have a question about ____? Many ask about _______. ____ is a common question, here the are we some answer. [sic]"

Just utter garbage pages.

It used to be just with recipes or medical questions, but now it feels like it's almost every general query.

piusp about 5 years ago

I have used copernik in the past. It was a collection of search engines, listing more than 140 of them. It combined the search results and sorted them by the percentage of keywords matched. It also had a lot of built-in tools for validating the links, copying the selection, sorting, and sharing the results. Simply amazing results.

wyck about 5 years ago

Google search is so sad these days. All the results are media conglomerates; it's completely counter to the core reason why the internet existed. I really hope that by catering to these mega corps they are completely undermining their brand, and someone else comes along and pulls the rug out from underneath them.

If anyone noticed, during the first couple of days of covid Google search was free from large media results. The algorithm reverted back to how it was years ago, and it was such a breath of fresh air. Of course they fixed the algo immediately, and it went back to only showing curated media results. There was an anon Google employee who posted why this occurred.

pkamb about 5 years ago

I'd love a search engine that mainly searched Stack Exchange, (old.)Reddit, and some subset of blogs or single-author websites.

Especially removing Quora, Pinterest, and aggregation/reposting/SEO/affiliate blogs.

And all "product" images with a white background. Only show real photographs.

Cyclone_ about 5 years ago

Seems like a browser plugin might be a quick and dirty way of just filtering results to achieve the goal.

social_quotient about 5 years ago

Are mainstream search engines kind of like a big market equity ETF or index? There are a ton of benefits, but as a negative they make price discovery difficult and give monetary allocation to companies that probably shouldn’t have it.

Just a thought experiment, curious what others think.

wmnwmn about 5 years ago

Maybe what we need is a return to the very beginning, namely human-curated web catalogs, aka Yahoo.

dluan about 5 years ago

I mainly use Google as a Reddit search engine these days. "tiki cocktail pineapple juice reddit" gives me way more than Google's algorithm alone, and it's kind of like human-powered SEO, where genuinely useful links will likely have some discussion.

rdtwo about 5 years ago

So I figured I’d try a few of these with “Seattle vegetable garden blog” as the keyword. Either there aren’t a lot of blogs on the topic or most search engines miss them because results are sparse and they really shouldn’t be.

ErikAugust about 5 years ago

A curated, searchable web directory might be a concept that could come to be these days. It would share some of its DNA with the old school web directory but also share some with a search engine.

tokyokawasemi about 5 years ago

I sometimes use "inurl:wordpress" when searching for travel info. This ensures more first-person blog accounts, rather than all the tripadvisor junk that's at the top.

known about 5 years ago

<a href="https://twitter.com/search?q=twitter&src=typed_query" rel="nofollow">https://twitter.com/search?q=twitter&src=typed_query</a>

moreWeed about 5 years ago

Man, you read my mind, I just started thinking about this. From a search censorship perspective, the BBSs we were building in '93 would be better than what we have now.

Nevada-Smith about 5 years ago

Depending on what you're looking for, try Google Scholar [1]

[1] <a href="https://scholar.google.com/" rel="nofollow">https://scholar.google.com/</a>

blondin about 5 years ago

omg yes please.

Can Google allow us to exclude certain sites? I was surprised to see w3school showing up above the official documentation for pandas and numpy. This is simply ridiculous!!

badrabbit about 5 years ago

It wouldn't be hard to remove such results using a browser extension, but you will be scrolling a lot. Maybe DuckDuckGo should support it. Feature request?

saadalem about 5 years ago

OK, here is an additional idea for fun: a search engine that shows only URLs that are not indexed by Google, or another one that gives you the websites with lower PageRank.

jungletime about 5 years ago

Is there an option to filter out news articles?

"If you don’t read the newspaper you are uninformed; if you do read the newspaper you are misinformed." Mark Twain

dangoljames about 5 years ago

Is there a search engine that actually combines selective search logic with reductive logic, so that it can be used to actually search topically?

corndoge about 5 years ago

Similarly, I have always wanted a YouTube search filter for "least views", since that content is invariably way higher quality.

thoughtstheseus about 5 years ago

I think one of the underlying problems in search is that most search engines are more like recommendation systems.

runawaybottle about 5 years ago

Filter Google against Alexa rankings?

egberts1 about 5 years ago

What we all really need is a long-tail Bloom filter search engine on the search engines themselves.

coronadisaster about 5 years ago

If Google would switch back to the engine code it used right before they modified the "+" operator for Google Plus, it would be a lot better... i.e., bring back the + operator, please (the quotes don't work the same way).

citizenpaul about 5 years ago

RSS used to be this. Google has done its best to kill it.

aiisjustanif about 5 years ago

RIP StumbleUpon. The randomized search engine I want back.

starfallg about 5 years ago

The elephant in the room is Baidu.

wojtczyk about 5 years ago

That’s what Hacker News is for ;)

Upvoter33 about 5 years ago

Google: search terms -site:cnn.com -site:wikipedia.org -site:...

aiisjustanif about 5 years ago

RIP StumbleUpon.

martin-adams about 5 years ago

I like this question. I’ve often wanted a search engine which gives you the choice to find sites that don’t contain a paywall, tracking or advertising.

graycat about 5 years ago

For

> Ask HN: Is there a search engine which excludes the world's biggest websites?

> Discovering unknown paths of the web seems almost impossible with google et al..

> Are there any search engines which exclude or at least penalize results from, say, top 500 websites?

Let's back up a little and then try for an answer.

Some points:

(1) For some qualitative exclamation, there is a LOT of content on the Internet.

(2) There are in principle, and no doubt so far significantly in practice, a LOT of searches people want to do. The search in the OP is an example.

(3) Much like in an old library card catalog subject index, the most popular search engines are based heavily on key words and then whatever else, e.g., page rank, date, etc.

So (1) through (3) represent some challenges so far not very well met. In particular, we can't expect that the key words, etc. of (3) will do very well on all or nearly all the searches in (2) for much of the content in (1).

And the search in the OP is an example of a challenge so far not well met.

Moreover, the search in the OP is no doubt just one of many searches with challenges so far not well met.

Long ago, Dad had a friend who worked at Battelle, and IIRC they did a review of information retrieval that concluded that keyword search covers only a fraction, maybe ballpark only 1/3rd, of the need for effective searching. And the search in the OP is an example of what is not covered, because the library card catalog did not index the size of the book or Web site! :-)!

Seeing this situation, my rough, ballpark estimate has been that the currently popular Internet search engines do well on only about 1/3rd of the content on the Internet, the searches people want to do, and the results they want to find.

So, I decided to see what could be done for the other 2/3rds.

I started with some not very well known or appreciated advanced pure math. It looks like useless, generalized abstract nonsense, but if you calm down, stare at it, and think about it, you can see a path to a solution.

Although I never thought about the search in the OP until now, in principle the solution should work also for that search. Or, the math is a bit abstract and general, which can translate in practice to doing well on something as varied as the 2/3rds.

Then for the computing, I did some original applied math research.

Using TeX, I wrote it all up with theorems and proofs.

So, the project is to be a Web site. While in my career I've been programming for decades, this was my first Web site. I selected Windows and .NET, and typed in 100,000 lines of text with 24,000 statements in Visual Basic .NET (apparently equivalent in semantics to C# but with syntactic sugar I prefer).

The software appears to run as intended and well enough for significant production.

I was slowed down by one interruption after another, none related to the work.

But, roughly, ballpark, the Web site should be good, or by a lot the best so far, for the 2/3rds and in particular for the search in the OP.

So, for

> Ask HN: Is there a search engine which excludes the world's biggest websites?

there's one coded and running and on the way to going live!

I intend to announce an alpha test here at HN.

notaphilosopher about 5 years ago

I'd like:

- health search that excludes sellers, wellness and snake-oil websites
- news search that excludes conspiracy theories, magical thinking, political operatives, and paid bloggers
- image search by similarity, similarity to an uploaded picture/s, words, or description
- media and warez search engine that excludes link-spam and malware sites
- complex queries search because none of them do it well
- anonymity
- shopping search that kicks out disreputable sellers and phony store-fronts
- mapping like OSM but fast, practical with an app, and detail-accessible
- monetize using affiliate links that don't affect ranking
- semi-curated results (domain reputation-ranked voting)
- related pages
- inbound/outbound links search
- archive.org integration &| history page caching
- documented query syntax
- query within results
- quick query history results navigation
- keyword alerts
- keyboard shortcuts that always work

burmer about 5 years ago

DuckDuckGo /s

pavelmark about 5 years ago

Simply removing Pinterest would be a huge step in the right direction.

chaos_a about 5 years ago

https://wiby.me/ exists to solve this exact problem. I've found some pretty neat/odd websites on it in the past.

brentis about 5 years ago

Imagine if search results had table filters and sorting: Popularity, Relevance, Age, Type, etc. Type could be blog, forum, site, or video. Or like it used to be.

bamboozled about 5 years ago

I think on DDG you can do !mil which excludes the first million top ranking sites.

Edit: Maybe it’s the first million results? I use it to find obscure things sometimes.

methou about 5 years ago

I used a Google Custom Search Engine (CSE) to remove results from Softonic and the like. It works well, but it's still very Google.

bmd3991 about 5 years ago

What I’d like to see is a search that excludes any page with ads, and any page with affiliate links. That alone would get rid of 90% of the garbage.

rkagerer about 5 years ago

I would like one that punishes sites with too high an ad-to-content ratio.

maayank about 5 years ago

I'm intrigued by actual use cases for it other than exploring, i.e. where it would give better results for a query than the common search engines.

Anyone reading this, please post if you find any.

chasd00 about 5 years ago

In the same vein, it would be awesome to search for a product to buy with the results being ecomm websites owned by people in my area. A way to "shop local" online.

amelius about 5 years ago

If only Google allowed us to omit websites from search results.

Google says they need our information to "improve our experience", but we can't tell them what to omit ...

pengstrom about 5 years ago

What I want is to filter out commercial results. When I'm searching for a product I don't want shills, I want real opinions.

Cyclone_ about 5 years ago

Seems like a browser plugin might be a quick and dirty way of just filtering results to achieve the goal.

wmnwmn about 5 years ago

Maybe what we need is a return to the very beginning, namely human-curated web catalogs, aka Yahoo.

ErikAugust about 5 years ago

A curated, searchable web directory might be a concept that could come to be these days. It would share some of its DNA with the old school web directory but also share some with a search engine.

tokyokawasemi about 5 years ago

I sometimes use "inurl:wordpress" when searching for travel info. This surfaces more first-person blog accounts, rather than all the TripAdvisor junk that's at the top.

known about 5 years ago

https://twitter.com/search?q=twitter&src=typed_query

moreWeed about 5 years ago

Man, you read my mind, I just started thinking about this. From a search censorship perspective, the BBSes we were building in '93 would be better than what we have now.

Nevada-Smith about 5 years ago

Depending on what you're looking for, try Google Scholar [1].

[1] https://scholar.google.com/

blondin about 5 years ago

omg yes please.

Can Google allow us to exclude certain sites? I was surprised to see W3Schools showing up above the official documentation for pandas and numpy. This is simply ridiculous!!

badrabbit about 5 years ago

It wouldn't be hard to remove such results using a browser extension, but you will be scrolling a lot. Maybe DuckDuckGo should support it; feature request?

saadalem about 5 years ago

OK, here is an additional idea for fun: a search engine that shows only URLs that are not indexed by Google, or another one that gives you the websites with lower PageRank.

jungletime about 5 years ago

Is there an option to filter out news articles?

"If you don’t read the newspaper you are uninformed; if you do read the newspaper you are misinformed." Mark Twain

dangoljames about 5 years ago

Is there a search engine that actually combines selective search logic with reductive logic, so that it can be used to actually search topically?

corndoge about 5 years ago

Similarly, I have always wanted a YouTube search filter for "least views", since that content is invariably way higher quality.

thoughtstheseus about 5 years ago

I think one of the underlying problems in search is that most search engines are more like recommendation systems.

runawaybottle about 5 years ago

Filter Google against Alexa rankings?
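
A rough sketch of what that kind of filtering could look like once you already have a list of result URLs and a list of top-ranked domains. Both lists, and the helper names `domain_of` and `drop_top_sites`, are assumptions for illustration; nothing here talks to Google or to any ranking service.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the hostname of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def drop_top_sites(result_urls, top_domains):
    """Keep only results whose host is not on (or under) a top-ranked domain."""
    top = set(top_domains)
    kept = []
    for url in result_urls:
        host = domain_of(url)
        # Subdomains (e.g. en.wikipedia.org) count as the listed domain.
        if any(host == d or host.endswith("." + d) for d in top):
            continue
        kept.append(url)
    return kept
```

With a hypothetical top list of `["cnn.com", "wikipedia.org"]`, a results page containing CNN, Wikipedia, and wiby.me links would be reduced to the wiby.me link. As another commenter notes, filtering after the fact means paging through a lot of results.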

egberts1 about 5 years ago

What we all really need is a long-tail Bloom filter search engine on the search engines themselves.

citizenpaul about 5 years ago

RSS used to be this. Google has done its best to kill it.

aiisjustanif about 5 years ago

RIP StumbleUpon. The randomized search engine I want back.

starfallg about 5 years ago

The elephant in the room is Baidu.

wojtczyk about 5 years ago

That’s what Hacker News is for ;)

Upvoter33 about 5 years ago

google: search terms -site:cnn.com -site:wikipedia.org -site:...
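
A minimal sketch of building that kind of query from a blocklist, in case you don't want to type the exclusions by hand. The helper names `build_query` and `google_url` and the example domains are assumptions for illustration; only the `-site:` operator comes from the comment above.

```python
from urllib.parse import quote_plus


def build_query(terms, blocked_domains):
    """Append one -site: exclusion per blocked domain to the search terms."""
    exclusions = " ".join(f"-site:{d}" for d in blocked_domains)
    return f"{terms} {exclusions}".strip()


def google_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    q = build_query("search terms", ["cnn.com", "wikipedia.org"])
    print(q)
    print(google_url(q))
```

This probably only works for a short blocklist: search engines limit query length, so excluding something like the top 500 sites this way is unlikely to be practical.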

aiisjustanif about 5 years ago

RIP StumbleUpon.

martin-adams about 5 years ago

I like this question. I’ve often wanted a search engine which gives you the choice to find sites that don’t contain a paywall, tracking or advertising.
