TechEcho

Building a Dark Web Crawler in Go

283 points by aadlani over 5 years ago

13 comments

bureaucrat over 5 years ago
First of all, it's hidden services, not dark web.

Second, to anyone crawling hidden services or crawling over Tor: please run a relay or reduce your hop count. Don't sacrifice others' desperate need for anonymity for your $whatever_purpose_thats_probably_not_important. It may be a fun project for you, but some people rely on Tor to use a free, secure and anonymous Internet.
Hitton over 5 years ago
Disclaimer: I have fairly little experience with Golang and only skimmed the crawler code.

From what I could see, the author made an effort to make the crawler distributed with k8s (which I don't think is needed, considering there are only approximately 75,000 onion addresses) using modern buzzword technology, but the crawler itself is rather simplistic. It doesn't even seem to index/crawl relative URLs, just absolute ones.
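
For readers wondering what handling relative URLs would involve, here is a minimal sketch using Go's standard `net/url` package. It is not taken from the article's crawler; `resolveLink` and the `example.onion` addresses are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink (hypothetical helper) turns an href found on a page into an
// absolute URL, using the URL of the page itself as the base.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// ResolveReference applies the RFC 3986 rules: absolute hrefs are kept
	// as they are, relative ones are resolved against the base.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "http://example.onion/forum/index.html" // placeholder address
	hrefs := []string{"thread/42", "/about", "../faq.html", "http://other.onion/"}
	for _, href := range hrefs {
		abs, err := resolveLink(page, href)
		if err != nil {
			fmt.Println("skipping", href, "because of", err)
			continue
		}
		fmt.Println(href, "->", abs)
	}
}
```

A crawler would call something like this on every extracted `href` before queueing it, so that relative and absolute links end up in the same normalized form.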
jmnicolas over 5 years ago
I'd be concerned that the DB is going to contain some pretty nasty stuff that might be hard to explain in front of a judge.
mschuster91 over 5 years ago
To anyone experimenting with such stuff, *take care* and don't make your services publicly available. The dark web in particular is full of highly illegal content such as child pornography, and in some jurisdictions even "involuntary possession", such as in browser caches, may be enough to convict you.
rolltiide over 5 years ago
I've been pretty surprised at how big hidden services have become.

Dread, the dark net Reddit, is surprisingly vibrant.

I think it's weird that people almost don't *want* to hear positive stories about the dark net.

It'll be funny when news articles and romcoms just start "forgetting" to qualify their plot piece with the "it's scary" trope.
zhdc1 over 5 years ago
Crawlers are fun!

If you're new to the field and want something that's easy to set up & polite, I strongly recommend Apache Storm Crawler (https://github.com/DigitalPebble/storm-crawler).
sbmthakur over 5 years ago
A well-written article with a lot of technical detail. Well done.

However, I'm wondering what a good practical purpose for crawling the dark web would be.
seisvelas over 5 years ago
I did the same in Racket when I made a Tor search engine. Here's the source code of the crawler!

https://github.com/torgle/torgle/blob/master/backend/torgle.rkt
fs111 over 5 years ago
Any http-aware software that supports socks proxies can access information on hidden services, so any crawler can do it. I fail to see what is novel about that, except that it uses k8s and mongo and a catchy blog title.
woodandsteel over 5 years ago
So how well would this thing work? What I am asking is: what percentage of all the Tor hidden service sites out there would it detect?
goatsi over 5 years ago
How well does it handle a gzip bomb? https://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html
Havoc over 5 years ago
Sounds like a recipe to score yourself a free FBI visit
getpolarized over 5 years ago
Go is a horrible language in which to write a crawler. The main problem is that NLP and machine learning code simply isn't as prevalent and robust as it is in Java and Python.