First of all, it’s hidden services, not the dark web.<p>Second, to anyone crawling hidden services or crawling over Tor, please run a relay or reduce your hop count. Don’t sacrifice others’ desperate need for anonymity for your $whatever_purpose_thats_probably_not_important. It might be a fun project for you, but some people rely on Tor for free, secure, and anonymous Internet access.
Disclaimer: I have rather little experience with Go and only skimmed the crawler code.<p>From what I could see, the author made an effort to make the crawler distributed with k8s (which I don't think is needed, considering there are only approximately 75,000 onion addresses) using modern buzzword technology, but the crawler itself is rather simplistic. It doesn't even seem to index/crawl relative URLs, just absolute ones.
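Resolving relative links is a one-liner in Go's standard library, which makes the omission surprising. A minimal sketch (hypothetical URLs, not taken from the crawler's code):<p><pre><code>package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	// Page the link was found on (placeholder onion address).
	base, err := url.Parse("http://exampleonion.onion/dir/index.html")
	if err != nil {
		log.Fatal(err)
	}
	// Relative href extracted from that page.
	ref, err := url.Parse("../about.html")
	if err != nil {
		log.Fatal(err)
	}
	// ResolveReference yields an absolute URL that can be enqueued
	// like any other: http://exampleonion.onion/about.html
	fmt.Println(base.ResolveReference(ref).String())
}
</code></pre>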
To anyone experimenting with such stuff, <i>take care</i> and don't make your services publicly available. The dark web in particular is full of highly illegal content such as child pornography, and in some jurisdictions even "involuntary possession", such as files in browser caches, may be enough to convict you.
I’ve been pretty surprised at how big hidden services have become.<p>Dread, the dark net Reddit, is surprisingly vibrant.<p>I think it’s weird that people almost don't <i>want</i> to hear positive stories about the dark net.<p>It’ll be funny when news articles and romcoms just start “forgetting” to qualify their plot piece with the “it’s scary” trope.
Crawlers are fun!<p>If you're new to the field and want something that's easy to set up & polite, I strongly recommend Apache Storm Crawler (<a href="https://github.com/DigitalPebble/storm-crawler" rel="nofollow">https://github.com/DigitalPebble/storm-crawler</a>).
A well-written article with a lot of technical details. Well done.<p>However, I'm wondering what a good practical purpose of crawling the dark web would be.
I did the same in Racket when I made a Tor search engine. Here's the source code of the crawler!<p><a href="https://github.com/torgle/torgle/blob/master/backend/torgle.rkt" rel="nofollow">https://github.com/torgle/torgle/blob/master/backend/torgle....</a>
Any HTTP-aware software that supports SOCKS proxies can access information on hidden services, so any crawler can do it. I fail to see what is novel about that, except that it uses k8s and mongo and a catchy blog title.
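For illustration, here's a minimal sketch of pointing Go's standard HTTP client at Tor's default SOCKS5 port. It assumes a local Tor daemon listening on 127.0.0.1:9050, and the onion address is a placeholder:<p><pre><code>package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// Tor's default SOCKS5 listener.
	proxyURL, err := url.Parse("socks5://127.0.0.1:9050")
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
	}

	// Placeholder onion address; the hostname is passed to the proxy,
	// so Tor handles the .onion resolution.
	resp, err := client.Get("http://exampleonion.onion/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Status, len(body))
}
</code></pre>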
How well does it handle a gzip bomb? <a href="https://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html" rel="nofollow">https://www.hackerfactor.com/blog/index.php?/archives/762-At...</a>
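One common mitigation (a hedged sketch in Go, not what the posted crawler necessarily does) is to cap how many decompressed bytes you're willing to read per response, so a bomb can waste some bandwidth but not your memory or disk:<p><pre><code>package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetchLimited reads at most maxBytes of the (transparently decompressed)
// response body, regardless of what Content-Length or the archive claims.
func fetchLimited(client *http.Client, url string, maxBytes int64) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// The default transport decompresses gzip automatically; LimitReader
	// caps the decompressed size we actually buffer.
	return io.ReadAll(io.LimitReader(resp.Body, maxBytes))
}

func main() {
	body, err := fetchLimited(http.DefaultClient, "http://example.com/", 1<<20) // 1 MiB cap
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(body))
}
</code></pre>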
Go is a horrible language in which to write a crawler. The main problem is that NLP and machine-learning libraries simply aren't as prevalent or robust as they are in Java and Python.