Running ArchiveTeam's Warrior in Kubernetes

92 points by gmemstr, 4 months ago

4 comments

WildGreenLeave, 4 months ago
The first thing I set up when I started managing my own Kubernetes cluster more than a year ago was this Warrior. I completely forgot about it until this post.

It has been active for over a year, steadily working on the recommended project. It downloaded over 3TB in 6 days (a node reboot restarted the pod, and stats are not persistent). So a rough extrapolation is about 180TB. Happy to help the good cause of the ArchiveTeam!

Edit: typo
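
The extrapolation in the comment above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "over a year" means 365 days and that the transfer rate seen in the 6-day window held steady the whole time, neither of which the comment states. The function name `extrapolate_tb` is made up for illustration.

```python
# Figures taken from the comment: 3 TB observed over 6 days.
observed_tb = 3.0
observed_days = 6

# Assumption (not stated in the comment): one year = 365 days at a constant rate.
days_active = 365


def extrapolate_tb(observed_tb, observed_days, total_days):
    """Linearly scale an observed transfer volume to a longer period."""
    return observed_tb / observed_days * total_days


estimate_tb = extrapolate_tb(observed_tb, observed_days, days_active)
print(f"{estimate_tb:.1f} TB")  # prints "182.5 TB"
```

Under those assumptions the result lands close to the "about 180TB" quoted in the comment.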
ch71r22, 4 months ago
For anyone else interested in running this, it only took a couple of seconds to launch their docker-compose.yml

https://github.com/ArchiveTeam/warrior-dockerfile/blob/master/docker-compose.yml
Havoc, 4 months ago
Isn't there substantial risk involved in having who knows what scraped from your IP?
badlibrarian, 4 months ago
Many of these sites are already captured and archived by proper entities as required by federal law. More is better, I guess, except when it isn't. Duplication of effort is a huge problem in the humanities in general and with archiving in particular.

The whole concept needs to be rethought. Captures from these tools show up under "ArchiveTeam", which is currently pumping thousands of copies of the Google Home Page into the Wayback Machine every week. Or at least trying to.

https://web.archive.org/web/20250122000033/www.google.com

Like so many things about archive.org, when you dig in you start to find wonder and craziness at every turn.