The first thing I set up when I started managing my own Kubernetes cluster more than a year ago was this Warrior. I completely forgot about it until this post.<p>It has been active for over a year, steadily working the recommended project. It downloaded over 3TB in 6 days (the node rebooted, so the pod was restarted, and the stats are not persistent). So a rough extrapolation is about 180TB for the year. Happy to help the good cause of the ArchiveTeam!<p>Edit: typo
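For anyone checking the math, here is a minimal sketch of that extrapolation. It assumes the download rate stayed constant over the whole year, which is a guess, since the Warrior's stats reset on every pod restart. The function name and parameters are only illustrative.

```python
def extrapolate_tb(observed_tb: float, observed_days: float, total_days: float) -> float:
    """Linearly scale an observed download volume to a longer period.

    Assumes a constant download rate, which is only a rough approximation.
    """
    rate_per_day = observed_tb / observed_days  # TB per day in the observed window
    return rate_per_day * total_days


# 3 TB in 6 days, scaled to 365 days
yearly_tb = extrapolate_tb(3, 6, 365)
print(f"roughly {yearly_tb:.1f} TB per year")
```

That works out to 0.5TB per day, so a bit over 180TB for a full year, in line with the rough figure above.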
For anyone else interested in running this, it only took a couple of seconds to launch their docker-compose.yml:<p><a href="https://github.com/ArchiveTeam/warrior-dockerfile/blob/master/docker-compose.yml">https://github.com/ArchiveTeam/warrior-dockerfile/blob/maste...</a>
Many of these sites are already captured and archived by proper entities as required by federal law. More is better, I guess, except when it isn't. Duplication of effort is a huge problem in the humanities in general and with archiving in particular.<p>The whole concept needs to be rethought. Captures from these tools show up under "ArchiveTeam" which is currently pumping thousands of copies of the Google Home Page into the Wayback Machine every week. Or at least trying to.<p><a href="https://web.archive.org/web/20250122000033/www.google.com" rel="nofollow">https://web.archive.org/web/20250122000033/www.google.com</a><p>Like so many things about archive.org, when you dig in you start to find wonder and craziness at every turn.