TechEcho


© 2025 TechEcho. All rights reserved.

Running ArchiveTeam's Warrior in Kubernetes

92 points by gmemstr 3 months ago

4 comments

WildGreenLeave 3 months ago
The first thing I set up when I started managing my own Kubernetes cluster more than a year ago was this Warrior. I completely forgot about it until this post.

It has been active for over a year, steadily working on the recommended project. It downloaded over 3TB in 6 days (there was a node reboot, so the pod was restarted and the stats are not persistent). So a rough extrapolation is about 180TB. Happy to help the good cause of the ArchiveTeam!

Edit: typo
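The extrapolation in the comment above can be checked with a quick sketch. It assumes the observed rate of 3 TB per 6 days stayed constant over exactly 365 days; the comment only says "over a year", so the period and the function name `extrapolate_tb` are illustrative assumptions, not figures from the Warrior itself.

```python
# Rough linear extrapolation of the figures quoted in the comment.
# Assumption: the 3 TB / 6 days rate holds for a full 365-day year.
TB_DOWNLOADED = 3.0    # TB observed since the last pod restart
DAYS_OBSERVED = 6.0    # days over which that volume was measured
DAYS_PER_YEAR = 365.0  # assumed period; the comment says "over a year"


def extrapolate_tb(tb: float, days: float, target_days: float) -> float:
    """Scale an observed download volume linearly to a longer period."""
    return tb / days * target_days


yearly_tb = extrapolate_tb(TB_DOWNLOADED, DAYS_OBSERVED, DAYS_PER_YEAR)
print(f"{yearly_tb:.1f} TB per year")
```

Under those assumptions the result lands close to the "about 180TB" quoted above; the real total would depend on how the rate varied between projects and restarts.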
ch71r22 3 months ago
For anyone else interested in running this, it only took a couple of seconds to launch their docker-compose.yml:

https://github.com/ArchiveTeam/warrior-dockerfile/blob/master/docker-compose.yml
Havoc 3 months ago
Isn't there substantial risk involved in having who knows what scraped from your IP?
badlibrarian 3 months ago
Many of these sites are already captured and archived by proper entities as required by federal law. More is better, I guess, except when it isn't. Duplication of effort is a huge problem in the humanities in general and with archiving in particular.

The whole concept needs to be rethought. Captures from these tools show up under "ArchiveTeam" which is currently pumping thousands of copies of the Google Home Page into the Wayback Machine every week. Or at least trying to.

https://web.archive.org/web/20250122000033/www.google.com

Like so many things about archive.org, when you dig in you start to find wonder and craziness at every turn.