Show HN: A blocklist to remove spam and bad websites from search results

225 points by popcar2, 4 months ago
Hi HN!

I've been fed up with search results so much that I decided to make a giant blocklist to remove garbage links by using uBlacklist.

I browsed other blocklists and wasn't very satisfied with what exists now; the goal of this one is to be super organized and transparent, explaining why each site was blocked via issues. Contributions welcome!

Even though only around 100 domains are blocked so far, I already noticed a big improvement in casual searches. You'd be surprised how some AI-generated websites can dominate the first page on DuckDuckGo.

33 comments

cormorant, 4 months ago
I'm fed up too. Spammy, AI-looking sites are showing up more and more. For some reason, many of them use the same WordPress theme with a light gray table of contents. They look like this: https://imgur.com/a/totally-not-ai-generated-efsumgZ

The problem seems worse on "alternative" search engines, e.g. DuckDuckGo and Kagi, which both use Bing. It's been driving me back to Google.

A blocklist seems like a losing proposition, unless, like adblock filter lists, it balloons to tens of thousands of entries and gets updated constantly.

Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.
Ringz, 4 months ago
Installed! This should not be a function of the search engine nor a plugin. This should be integrated in the browser.

Another great function (not for this plugin) would be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry. I hate going through lists and pages of apple/google/synology/sonos/crab URLs when I already know that I have to search somewhere else.
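A minimal sketch of the "bundle by domain" idea, assuming the search results are available as a plain list of URLs. The function name `bundle_by_domain` is hypothetical, and grouping is done on the hostname with a leading `www.` stripped; grouping by registrable domain would need the Public Suffix List, which this sketch does not use.

```python
from urllib.parse import urlsplit


def bundle_by_domain(urls):
    """Group result URLs by hostname, keeping first-seen order of hosts."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        groups.setdefault(host, []).append(url)
    return groups


if __name__ == "__main__":
    results = [
        "https://www.apple.com/a",
        "https://example.org/x",
        "https://apple.com/b",
    ]
    for host, bundled in bundle_by_domain(results).items():
        print(f"{host} ({len(bundled)} results)")
```

Each key could then be rendered as one collapsible entry holding its list of URLs.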
LeoPanthera, 4 months ago
It's not going to be long before we need to move to a whitelist model, rather than a blacklist model.

It ironically makes me think of the Yahoo Web Directory in the 90s.

Time is a flat circle.
antithesis-nl, 4 months ago
So, if you already run uBlock Origin (and of course you do), you can use this list without installing any additional extensions by going to 'Filter lists' in the uBlock settings, then Import, then entering https://raw.githubusercontent.com/popcar2/BadWebsiteBlocklist/refs/heads/main/uBlacklist.txt as the URL.

Not saying you *should*, just that you *could*...
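To make the list format concrete, here is a rough sketch of how a match-pattern blocklist can be applied to URLs. It assumes the file contains one rule per line in the `*://*.example.com/*` match-pattern style with `#` comments; regex rules and other syntax are ignored. The function names and the `example.com` domains are illustrative only, and this is not how uBlacklist or uBlock Origin is actually implemented.

```python
import re
from urllib.parse import urlsplit


def parse_rules(text):
    """Return non-empty, non-comment lines of a blocklist file."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def matches(pattern, url):
    """Check a URL against one scheme://host/path match pattern."""
    m = re.fullmatch(r"(\*|https?)://([^/]+)(/.*)", pattern)
    if m is None:
        return False  # not a match pattern (e.g. a regex rule): skipped here
    scheme_pat, host_pat, path_pat = m.groups()
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    if scheme_pat != "*" and scheme_pat != parts.scheme:
        return False
    host = (parts.hostname or "").lower()
    if host_pat.startswith("*."):
        base = host_pat[2:].lower()
        # "*.example.com" covers example.com and all of its subdomains
        if host != base and not host.endswith("." + base):
            return False
    elif host_pat != "*" and host != host_pat.lower():
        return False
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Turn the path glob into a regex: "*" matches any run of characters
    path_re = ".*".join(re.escape(piece) for piece in path_pat.split("*"))
    return re.fullmatch(path_re, path) is not None


def is_blocked(url, rules):
    return any(matches(rule, url) for rule in rules)
```

With `rules = parse_rules("*://*.example.com/*")`, a call such as `is_blocked("https://www.example.com/page", rules)` checks whether that result would be hidden.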
gtfiorentino, 4 months ago
Hi @popcar2, how are you sourcing the domains for the blocklist? We'd like to evaluate those domains and consider whether they should be removed from DuckDuckGo as spam. You can also report a site directly in the search results by clicking the three-dot menu next to the link and selecting "Share Feedback about this Site".
james-bcn, 4 months ago
The Kagi search engine has a way in the settings to bulk-upload lists of domains to block (or upvote). Has anyone uploaded a list like this to it?

I may do that.
shortformblog, 4 months ago
The problem with a list like this is that a “bad website” is in the eye of the beholder. I’m not saying that there’s anything wrong with you personally not liking the Shopify or the Semrush blog. But I think that everyone else has their own calculus.

It’s the same reason why social media blocklists can be problematic: everyone’s calculus is different.

My suggestion is that you promote it as a starter and suggest that users fork it for their own needs.
edm0nd, 4 months ago
I recently started a crypto scam/phishing blocklist if you wanna roll these into your list as well.

It also works well with Pi-hole and other platforms.

https://github.com/spmedia/Crypto-Scam-and-Crypto-Phishing-Threat-Intel-Feed
the_snooze, 4 months ago
This is one of those features a proper search engine (i.e., not a thinly-veiled advertising network) should have. If users can customize their search results and share their sorting/filtering methods, then that presents a large number of constantly-moving targets that greatly drives up the cost of SEO. There's no "making the Google algorithm happy." Instead, it becomes more "making the users happy."
Kuinox, 4 months ago
I don't understand why so many corporate blogs are blocked. Most of them are about their product, or about the industry in general.

- For example, the Kaspersky blog doesn't look bad.
- The CCleaner blog is just a list of updates.
MortyWaves, 4 months ago
Who’s going to be the first to make the PR for Medium and “dev.to”?
nayuki, 4 months ago
Related: Freya Holmér - "Generative AI is a Parasitic Cancer" https://www.youtube.com/watch?v=-opBifFfsMY (1h19m54s) [2025-01-02].

She talks at length about how pages of AI-generated nonsense text are cluttering search results on Google and all other search engines.
ColdTakes, 4 months ago
DuckDuckGo and Kagi allow you to remove entire sites from search results and it is the best feature of these websites.
Night_Thastus, 4 months ago
I've been using GoogleHitHider, which also works on other search engines like DDG. It has worked well for many years. It's a list I curated myself for personal use, though; I definitely wouldn't mind seeing what other people have.
mrweasel, 4 months ago
I love that it just includes all of msn.com.
troyvit, 4 months ago
This is cool. It would be pretty easy to add the domains from this list to Kagi's blocked domain list and have it integrated in the search without a plugin. The downside obviously is having to update that list from the repo, but still, as OP says, even with just a hundred domains blocked it's already a big improvement.
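As a rough illustration of that conversion step, the sketch below pulls bare domains out of match-pattern rules. It assumes the source list uses whole-site patterns such as `*://*.example.com/*` and that the target accepts one bare domain per line; both are assumptions, not a confirmed Kagi import format, and `domains_from_patterns` is a hypothetical helper name.

```python
import re

# Matches whole-site patterns like "*://*.example.com/*" or "*://example.org/*"
WHOLE_SITE = re.compile(r"\*://(?:\*\.)?([^/*]+)/\*")


def domains_from_patterns(text):
    """Extract unique bare domains, in file order, from match-pattern rules."""
    domains = []
    seen = set()
    for line in text.splitlines():
        m = WHOLE_SITE.fullmatch(line.strip())
        if m is None:
            continue  # comments, regex rules and path-specific rules are skipped
        domain = m.group(1).lower()
        if domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


if __name__ == "__main__":
    sample = "# spam\n*://*.example.com/*\n*://example.org/*\n"
    print("\n".join(domains_from_patterns(sample)))
```

Re-running the conversion whenever the repository changes would cover the update step mentioned above.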
lambdaone, 4 months ago
I think there's big potential in using DNS blacklists for this: they have the advantage of being massively scalable and simple to maintain, and configuring clients to use them is also easy.

The scalability comes from the caching inherent in DNS; instead of having to have millions of people downloading text files from a website over HTTP on a regular basis, the data is in effect lazy-uploaded into the cloud of caching DNS resolvers, with no administration cost on behalf of the DNSBL operator.

Reputation whitelists (or other scoring services) would also be just as easy to implement.
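A minimal sketch of what a client-side lookup against a domain-based DNSBL could look like, following the usual convention that a listed name resolves to an address in 127.0.0.0/8 and an unlisted one returns NXDOMAIN. The zone `bl.example.org` is a placeholder, not a real service, and the function names are hypothetical.

```python
import ipaddress
import socket

LISTED_RANGE = ipaddress.ip_network("127.0.0.0/8")


def dnsbl_query_name(domain, zone):
    """Build the name to look up: the checked domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed_answer(address):
    """DNSBLs conventionally signal a listing with a 127.0.0.0/8 address."""
    return ipaddress.ip_address(address) in LISTED_RANGE


def is_blocked(domain, zone="bl.example.org"):
    """Resolve the query name; a failed lookup is treated as 'not listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(domain, zone))
    except socket.gaierror:
        return False
    return is_listed_answer(answer)
```

Because the answer is an ordinary DNS record, any caching resolver between the client and the list operator can serve repeat lookups, which is the scalability argument made above.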
noleary, 4 months ago
This is cool! Not entirely sure whether I think it's a good idea, but I wonder if it'd be useful to come up with a way to tranche websites.

Some sites are complete garbage and should be blocked, of course. Others (e.g., in my experience, Quora) are sometimes quite good and sometimes quite bad. Wouldn't be my first choice, but I've found them useful at times.

For a given search, maybe you try with the most aggressive blocking / filtering. If you fail to find what you're looking for, maybe soften the restriction a bit.

Maybe this is overwrought...
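One way the tranche idea could be sketched, assuming the blocklist is split into ordered tiers of hostnames (tier 0 is always blocked, later tiers are applied only in more aggressive modes) and that "failing to find what you're looking for" is approximated by a minimum result count. The tier contents, `min_results` threshold and function names are all hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    return (urlsplit(url).hostname or "").lower()


def filter_with_fallback(urls, tiers, min_results):
    """Filter with every tier active, then relax one tier at a time.

    Returns (kept_urls, number_of_tiers_applied). Tier 0 is never relaxed.
    """
    if not tiers:
        return list(urls), 0
    kept = []
    for level in range(len(tiers), 0, -1):
        blocked = set().union(*tiers[:level])
        kept = [u for u in urls if host_of(u) not in blocked]
        if len(kept) >= min_results:
            return kept, level
    return kept, 1


if __name__ == "__main__":
    tiers = [{"spam.example"}, {"quora.com"}]
    results = [
        "https://spam.example/a",
        "https://quora.com/q",
        "https://good.example/x",
    ]
    kept, level = filter_with_fallback(results, tiers, min_results=2)
    print(level, kept)
```

A real implementation would probably let the user trigger the relaxation manually rather than rely on a result count.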
QuadrupleA, 4 months ago
One enraging thing, if some guy on GitHub can do this, why the F** can't billion-dollar search giants put in a little human effort to do it too, right in their search engines?

SEO spam and AI slop are easily spotted on the human level. Google has hundreds of thousands of employees. Just put ONE of them on this f**ing job!

It's criminal what these companies have let happen to the web.
ge96, 4 months ago
Tangent: I may laughably use Malwarebytes, but when I'm image searching on Google and it stops me from opening a picture with an adware alert, I'm like "oh damn"... I use an adblocker and generally don't do anything sus on my main OS, but yeah, I'm still unsure: am I safe? (paranoia ensues)

I use a VM in other scenarios, but is even that properly separated?
theoreticalmal, 4 months ago
What on earth are people still searching for using search engines? I’ve found ChatGPT to be significantly better at answering questions I have than Google or DDG or any other search engine. It’s still AI slop, but at least it’s a bit more succinct, and I can ask follow-up questions.
miyuru, 4 months ago
Brave has Goggles that do exactly this. You can even share the list with others.

https://search.brave.com/goggles/discover
loa_observer, 4 months ago
Every time I search for content about Supabase, some trash AI-generated content website like Restack shows up and wastes my time. I am not saying Restack is bad, but a customizable blocker that blocks a site for specific topics might be good for me.
dmix, 4 months ago
Does the msn.com one block their news site?
swayvil, 4 months ago
How do you ensure good contributors and good contributions?

Do you have a forum where you discuss prospective contributions etc?
batata_frita, 4 months ago
Does anybody know if it is possible to apply a similar configuration in a SearXNG instance?
renegat0x0, 4 months ago
I think it could also be accomplished using SearXNG, and blocking it there.
mediumsmart, 4 months ago
Hosts with tens of thousands of entries, Kagi for search, and recipes from the spammer godsend LLM in LibreWolf is still an option, but no idea for how long.
lubujackson, 4 months ago
download.cnet.com serves up spam nowadays? How far the mighty have fallen.
Animats, 4 months ago
Does Google still allow that in an add-on?
qiine, 4 months ago
Thank you for your service
purpleinfs, 4 months ago
nice work
sandropuppo, 4 months ago
What about just using Perplexity? It's already doing that I think.