TechEcho

4 comments

The first thing I set up when I started managing my own Kubernetes cluster more than a year ago was this Warrior, and I had completely forgotten about it until this post. It has been active for over a year, steadily working on the recommended project. It downloaded over 3TB in 6 days (the node rebooted, so the pod was restarted, and stats are not persistent). So a rough extrapolation is about 180TB over the year. Happy to help the good cause of the ArchiveTeam!

Edit: typo
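The rough extrapolation above can be checked with a few lines of Python. This is only a sketch: it assumes the 3TB-in-6-days rate held constant over a full 365-day year, which the comment does not confirm, and the function name is made up for illustration.

```python
def extrapolate_tb(observed_tb: float, observed_days: float, total_days: float = 365) -> float:
    """Scale an observed download volume linearly to a longer period.

    Assumes a constant download rate, which is a simplification.
    """
    daily_rate_tb = observed_tb / observed_days
    return daily_rate_tb * total_days


# 3 TB observed over the 6 days since the pod restart, scaled to one year
yearly_tb = extrapolate_tb(3, 6)
print(f"{yearly_tb:.1f} TB")  # prints "182.5 TB"
```

That lands close to the "about 180TB" figure quoted in the comment.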

ch71r22 3 months ago

For anyone else interested in running this, it only took a couple of seconds to launch their docker-compose.yml:

https://github.com/ArchiveTeam/warrior-dockerfile/blob/master/docker-compose.yml

Havoc 3 months ago

Isn't there substantial risk involved in having who knows what scraped from your IP?

badlibrarian 3 months ago

Many of these sites are already captured and archived by proper entities as required by federal law. More is better, I guess, except when it isn't. Duplication of effort is a huge problem in the humanities in general and with archiving in particular.

The whole concept needs to be rethought. Captures from these tools show up under "ArchiveTeam", which is currently pumping thousands of copies of the Google home page into the Wayback Machine every week. Or at least trying to.

https://web.archive.org/web/20250122000033/www.google.com

Like so many things about archive.org, when you dig in you start to find wonder and craziness at every turn.

WildGreenLeave 3 months ago

Running ArchiveTeam's Warrior in Kubernetes
