Show HN: A blocklist to remove spam and bad websites from search results

225 points by popcar2 4 months ago
Hi HN!

I've been so fed up with search results that I decided to make a giant blocklist to remove garbage links by using uBlacklist.

I browsed other blocklists and wasn't very satisfied with what exists now; the goal of this one is to be super organized and transparent, explaining why each site was blocked via issues. Contributions welcome!

Even though only around 100 domains are blocked so far, I've already noticed a big improvement in casual searches. You'd be surprised how some AI-generated websites can dominate the first page on DuckDuckGo.

33 comments

cormorant 4 months ago
I'm fed up too. Spammy, AI-looking sites are showing up more and more. For some reason, many of them use the same WordPress theme with a light gray table of contents. They look like this: https://imgur.com/a/totally-not-ai-generated-efsumgZ

The problem seems worse on "alternative" search engines, e.g. DuckDuckGo and Kagi, which both use Bing. It's been driving me back to Google.

A blocklist seems like a losing proposition, unless, like adblock filter lists, it balloons to tens of thousands of entries and gets updated constantly.

Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.
Ringz 4 months ago
Installed! This should not be a function of the search engine nor a plugin. This should be integrated in the browser.

Another great function (not for this plugin) should be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry. I hate going through lists and pages of apple/google/synology/sonos/crab URLs when I already know that I have to search somewhere else.
LeoPanthera 4 months ago
It's not going to be long before we need to move to a whitelist model, rather than a blacklist model.

It ironically makes me think of the Yahoo Web Directory in the 90s.

Time is a flat circle.
antithesis-nl 4 months ago
So, if you already run uBlock Origin (and of course you are), you can use this list without installing any additional extensions by going to 'Filter lists' in the uBlock settings, then Import, then enter https://raw.githubusercontent.com/popcar2/BadWebsiteBlocklist/refs/heads/main/uBlacklist.txt as the URL.

Not saying you *should*, just that you *could*...
gtfiorentino 4 months ago
Hi @popcar2, how are you sourcing the domains for the blocklist? We'd like to evaluate those domains and consider whether they should be removed from DuckDuckGo as spam. You can also report a site directly in the search results by clicking the three-dot menu next to the link and selecting "Share Feedback about this Site".
james-bcn 4 months ago
The Kagi search engine has a way in the settings to bulk-upload lists of domains to block (or upvote). Has anyone uploaded a list like this to it?

I may do that.
shortformblog 4 months ago
The problem with a list like this is that a “bad website” is in the eye of the beholder. I’m not saying that there’s anything wrong with you personally not liking the Shopify or the Semrush blog. But I think that everyone else has their own calculus.

It’s the same reason why social media blocklists can be problematic: everyone’s calculus is different.

My suggestion is that you promote it as a starter and suggest that users fork it for their own needs.
edm0nd 4 months ago
I recently started a crypto scam/phishing blocklist if you wanna roll these into your list as well.

It also works well with Pi-hole and other platforms.

https://github.com/spmedia/Crypto-Scam-and-Crypto-Phishing-Threat-Intel-Feed
the_snooze 4 months ago
This is one of those features a proper search engine (i.e., not a thinly-veiled advertising network) should have. If users can customize their search results and share their sorting/filtering methods, then that presents a large number of constantly-moving targets that greatly drives up the cost of SEO. There's no "making the Google algorithm happy." Instead, it becomes more "making the users happy."
Kuinox 4 months ago
I don't understand why so many corporate blogs are blocked. Most of them are about their product, or about the industry in general.

- For example, the Kaspersky blog doesn't look bad.

- The CCleaner blog is just a list of updates.
MortyWaves 4 months ago
Who’s going to be the first to make the PR for Medium and “dev.to”?
nayuki 4 months ago
Related: Freya Holmér - "Generative AI is a Parasitic Cancer" https://www.youtube.com/watch?v=-opBifFfsMY (1h19m54s) [2025-01-02].

She talks at length about how pages of AI-generated nonsense text are cluttering search results on Google and all other search engines.
ColdTakes 4 months ago
DuckDuckGo and Kagi allow you to remove entire sites from search results and it is the best feature of these websites.
Night_Thastus 4 months ago
I've been using GoogleHitHider, which also works on other search engines like DDG. It has worked well for many years. It's a list I curated myself for personal use, though; I definitely wouldn't mind seeing what other people have.
mrweasel 4 months ago
I love that it just includes all of msn.com.
troyvit 4 months ago
This is cool. It would be pretty easy to add the domains from this list to Kagi's blocked domain list and have it integrated in the search without a plugin. The downside obviously is having to update that list from the repo, but still, as OP says, even with just a hundred domains blocked it's already a big improvement.
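A minimal sketch of the conversion step this would need, assuming the list is made of match-pattern lines such as `*://*.example.com/*` plus `#` comments. The function name `extract_domains` is hypothetical, and the exact format Kagi's upload expects is not confirmed here; this only reduces pattern lines to bare hostnames.

```python
def extract_domains(blocklist_text: str) -> list[str]:
    """Collect unique hostnames from simple match-pattern lines.

    Assumption: each rule looks like "*://*.example.com/*" or
    "https://example.com/*". Anything else (comments, regex rules,
    other rule types) is skipped rather than guessed at.
    """
    domains: list[str] = []
    seen: set[str] = set()
    for raw in blocklist_text.splitlines():
        line = raw.strip()
        # Only accept lines that begin with a scheme pattern.
        if not line.startswith(("*://", "http://", "https://")):
            continue
        # Host is everything between "://" and the next "/".
        host = line.split("://", 1)[1].split("/", 1)[0].lower()
        # Drop a leading subdomain wildcard such as "*.example.com".
        if host.startswith("*."):
            host = host[2:]
        # Skip empty hosts and hosts with wildcards in other positions.
        if not host or "*" in host:
            continue
        if host not in seen:
            seen.add(host)
            domains.append(host)
    return domains
```

Re-running this against a fresh copy of the list would be the manual "update from the repo" step mentioned above.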
lambdaone 4 months ago
I think there's big potential in using DNS blacklists for this: they have the advantage of being massively scalable and simple to maintain, and configuring clients to use them is also easy.

The scalability comes from the caching inherent in DNS; instead of having to have millions of people downloading text files from a website over HTTP on a regular basis, the data is in effect lazy-uploaded into the cloud of caching DNS resolvers, with no administration cost on behalf of the DNSBL operator.

Reputation whitelists (or other scoring services) would also be just as easy to implement.
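For anyone unfamiliar with how a domain-based DNSBL lookup works, here is a rough sketch under the usual convention: the client resolves `<domain>.<zone>`, and an answer in 127.0.0.0/8 means "listed" while a failed lookup means "not listed". The zone `blocklist.example` and both function names are made up for illustration; no such service is implied to exist.

```python
import socket
from typing import Callable


def dnsbl_query_name(domain: str, zone: str) -> str:
    """Build the name to resolve: the checked domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(
    domain: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the (hypothetical) DNSBL zone lists the domain.

    The resolver is injectable so the logic can be exercised offline.
    """
    try:
        answer = resolve(dnsbl_query_name(domain, zone))
    except socket.gaierror:
        # Treated as "not listed". Note this also swallows other
        # resolution failures, not only NXDOMAIN.
        return False
    # By convention, listings are signalled with a 127.x.x.x address.
    return answer.startswith("127.")
```

The caching benefit described above comes for free: repeated lookups of the same name are answered by whatever caching resolver the client already uses.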
noleary 4 months ago
This is cool! Not entirely sure whether I think it's a good idea, but I wonder if it'd be useful to come up with a way to tranche websites.

Some sites are complete garbage and should be blocked, of course. Others (e.g., in my experience, Quora) are sometimes quite good and sometimes quite bad. Wouldn't be my first choice, but I've found them useful at times.

For a given search, maybe you try with the most aggressive blocking / filtering. If you fail to find what you're looking for, maybe soften the restriction a bit.

Maybe this is overwrought...
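The tranche idea could be sketched roughly like this. Everything here is hypothetical: the function name, the shape of `results` as `(hostname, title)` pairs, and the `min_results` threshold standing in for "failed to find what you're looking for".

```python
def filter_with_tiers(results, tiers, min_results=3):
    """Apply the strictest blocking first, then relax one tier at a time.

    results: list of (hostname, title) tuples.
    tiers:   list of hostname sets, ordered from "always garbage"
             (index 0) to "sometimes useful" (last index).
    Returns (kept_results, tiers_applied).
    """
    for level in range(len(tiers), -1, -1):
        # Block every host in the first `level` tiers.
        blocked = set().union(*tiers[:level]) if level else set()
        # Exact hostname match only; subdomains are not handled here.
        kept = [r for r in results if r[0] not in blocked]
        # Accept this level if enough survives, or nothing is left to relax.
        if len(kept) >= min_results or level == 0:
            return kept, level
    return list(results), 0
```

With a "garbage" tier and a "mixed quality" tier, a query that leaves too few results under full blocking falls back to blocking only the garbage tier.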
QuadrupleA 4 months ago
One enraging thing, if some guy on GitHub can do this, why the F** can't billion-dollar search giants put in a little human effort to do it too, right in their search engines?

SEO spam and AI slop are easily spotted on the human level. Google has hundreds of thousands of employees. Just put ONE of them on this f**ing job!

It's criminal what these companies have let happen to the web.
ge96 4 months ago
Tangent: I may laughably use Malwarebytes, but when I'm image searching on Google and it stops me from opening a picture with an adware alert, I'm like "oh damn"... I use an adblocker and generally don't do anything sus on my main OS, but yeah, I'm still unsure: am I safe? (paranoia ensues)

I use a VM in other scenarios, but even then, is it properly separated?
theoreticalmal 4 months ago
What on earth are people still searching for using search engines? I’ve found ChatGPT to be significantly better at answering questions I have than Google or DDG or any other search engine. It’s still AI slop, but at least it’s a bit more succinct, and I can ask follow-up questions.
miyuru 4 months ago
Brave has goggles that do exactly this. You can even share the list with others.

https://search.brave.com/goggles/discover
loa_observer 4 months ago
Every time I search for content about Supabase, some trashy AI-generated content website like Restack shows up and wastes my time. I am not saying Restack is bad, but a customizable blocker that blocks a site for specific topics would be good for me.
dmix 4 months ago
Does the msn.com one block their news site?
swayvil 4 months ago
How do you ensure good contributors and good contributions?

Do you have a forum where you discuss prospective contributions etc?
batata_frita 4 months ago
Does anybody know if it is possible to apply a similar configuration in a SearXNG instance?
renegat0x0 4 months ago
I think it could also be accomplished using SearXNG, by blocking the domains there.
mediumsmart 4 months ago
A hosts file with tens of thousands of entries, Kagi for search, and recipes from the spammer godsend LLM in LibreWolf is still an option, but no idea for how long.
lubujackson 4 months ago
download.cnet.com serves up spam nowadays? How far the mighty have fallen.
Animats 4 months ago
Does Google still allow that in an add-on?
qiine 4 months ago
Thank you for your service
purpleinfs 4 months ago
Nice work
sandropuppo 4 months ago
What about just using Perplexity? It's already doing that, I think.