ArchiveBox: Open-source self-hosted web archiving

250 points by mieubrisse, over 1 year ago

16 comments

nikisweeting, over 1 year ago
Thanks for posting @mieubrisse! I haven't posted it on HN myself in a long time but I just released ArchiveBox v0.7.2 a couple days ago, so it's great timing.

I encourage people to also check out the list of ArchiveBox alternatives we maintain if ArchiveBox doesn't quite fit your needs.

https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#other-archivebox-alternatives

jrm4, over 1 year ago
Love this. That being said, I tried a bunch of these and landed on Shiori; I think my take was that ArchiveBox is great if you definitely want options and want to be comprehensive, but if you're mostly just going for the articles and text and want something simpler, this is it. (I teach at a college and don't want to lose good articles, and it also gives me some nice uniform formatting.)

https://github.com/go-shiori/shiori

kornhole, over 1 year ago
I spun up my own ArchiveBox after archive.org wouldn't let me archive some news stories and I heard about them removing other content. Instead of calling the Internet Archive the Wayback Machine, I now call it the maybe back machine. IA is a centralized service and subject to the government and other powerful pressures any centralized popular service faces. If you want to archive something that people in power might, now or in the future, want erased, you should decentralize it to somewhere like an ArchiveBox. This is especially useful if you are writing a book with many citations.

abound, over 1 year ago
This is uncanny, I just discovered ArchiveBox earlier today and set up a self-hosted instance on some home hardware for a collection of bookmarks of useful guides, tutorials, and references I've collected over the years.

Setting it up on K8s with sonic [1] as the search backend and importing a few hundred URLs only took ~an hour or so, and the cached pages look great for the most part.

[1] https://github.com/valeriansaliou/sonic

tardisx, over 1 year ago
I looked at ArchiveBox and several similar projects a while ago, but realised I didn't want anything so complex. I just wanted bookmarks, with free-text content search so I could find something again based on more than just a title.

So I wrote my own: https://github.com/tardisx/linkwallet

Emphasis on tiny system requirements and dependencies (single binary, no service dependencies). As a consequence the text indexing is very basic (basic HTML scrape). But it's working for me :-)
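
For readers wondering what "basic HTML scrape" indexing could look like, here is a minimal Python sketch using only the standard library. It is not linkwallet's code (that project is written in Go and its internals are not described in this thread); `TextExtractor`, `html_to_words`, and `BookmarkIndex` are hypothetical names chosen for illustration. The idea is to strip tags, lowercase the remaining text, and keep a word-to-URL map so a query returns bookmarks containing every query word.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html):
    """Return the set of lowercase alphanumeric words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


class BookmarkIndex:
    """Hypothetical in-memory inverted index: word -> set of bookmark URLs."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, url, html):
        for word in html_to_words(html):
            self._index[word].add(url)

    def search(self, query):
        """Return URLs whose text contains every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        results = set(self._index.get(words[0], set()))
        for word in words[1:]:
            results &= self._index.get(word, set())
        return results
```

A real tool would persist the index and probably add stemming or ranking; this sketch only shows the core trade-off mentioned above, where simple scraping buys a very small dependency footprint.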

parasti, over 1 year ago
The screenshot section single-handedly breaks mobile UX due to overflow.

dundarious, over 1 year ago
I researched various archiving alternatives for something I needed recently. I subscribe to a paid Substack for an educational course that will end mid-year, and I want to archive the course posts before it ends (the course provider has even recommended people end their Substack subscription after it ends).

For this purpose, I found the SingleFile browser extension to be the best fit. It's a browser extension, so paywall cookies are already present, and I just manually archive the previous week's content, *after* the discussion phase has concluded. It creates a single self-contained file with all images and comments, etc., but all non-page-local links still resolve externally (which is as desired for my use case). It can be configured to auto-generate a convenient filename, and to use self-extracting compression.

I preferred this to an automated process based on, e.g., RSS, because I can ensure the archive occurs *after* all the useful back-and-forth in the course comments has concluded, and it's trivial to set up and use.
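
To illustrate what "a single self-contained file with all images" can mean in practice, here is a minimal Python sketch of one common technique: embedding image bytes directly in the HTML as base64 `data:` URIs. This is not SingleFile's code, and `inline_images` and its `resources` mapping are hypothetical names; the regex only handles double-quoted `src` attributes and is meant as a sketch rather than a robust HTML rewriter.

```python
import base64
import re


def inline_images(html, resources):
    """Replace <img> src values found in `resources` with base64 data URIs.

    `resources` maps a src string to a (mime_type, raw_bytes) tuple.
    Sources missing from the mapping are left untouched, so they keep
    resolving externally.
    """

    def replace(match):
        src = match.group(2)
        if src not in resources:
            return match.group(0)
        mime_type, data = resources[src]
        encoded = base64.b64encode(data).decode("ascii")
        return f"{match.group(1)}data:{mime_type};base64,{encoded}{match.group(3)}"

    # Assumption: src attributes are double-quoted.
    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', replace, html)
```

Because unmatched sources are left alone, ordinary hyperlinks and any image not supplied in `resources` continue to point at the live site.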

dtkav, over 1 year ago
I also came across ArchiveBox a few days ago to see if I should migrate off my home-grown solution with Puppeteer, SingleFile & readability.js.

I've been working on getting it deployed to fly.io with LSVD so it can scale to zero while storing everything on an S3-backed volume as described here [0].

My biggest disappointment so far is that it seems like a fairly large lift to make uBlock Origin work because extensions don't work in headless Chrome (?). It seems like using Pi-hole is the current best method to block ads [1].

[0] https://community.fly.io/t/bottomless-s3-backed-volumes/15648
[1] https://github.com/ArchiveBox/ArchiveBox/issues/211

loceng, over 1 year ago
Are there any figures available anywhere as to how many people actively-passively maintain a personal-private archive?

keepamovin, over 1 year ago
For anyone who uses Chrome and wants to view their archived pages in the browser as if they were still online (URL and everything intact), and also full-text search through their archived browsing history (like AB plans to add in future, I think, right nikki?), you can check out DownloadNet: https://github.com/dosyago/DownloadNet

You can have multiple archives, and even use a mode where you only archive pages you bookmark rather than everything.

rgomez, over 1 year ago
Over the last year I've been working on an open source Golang tool with a more modest approach for now (just command line) but a similar goal (keeping personal info). In my tool, formats are described using simple YAML templates and stored in a SQLite db file (https://github.com/khromalabs/keeper). Glad to know about more open source tools exploring similar ideas.

dugite-code, over 1 year ago
ArchiveBox is a great bit of kit and I've been using it for a while. I'm currently ingesting my browser bookmarks from Nextcloud Bookmarks (using floccus sync from my browser) via RSS. That said, even though its archiving features are poorer, I've been looking into using Linkwarden for the partner approval factor and better integration with my SSO setup.
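
As a rough picture of the "bookmarks via RSS" step, here is a minimal Python sketch that pulls item links out of an RSS 2.0 document with the standard library. It assumes the feed follows the usual `<item><link>` layout; the exact feed produced by Nextcloud Bookmarks is not shown in this thread, and `links_from_rss` is a hypothetical helper name, not part of ArchiveBox.

```python
import xml.etree.ElementTree as ET


def links_from_rss(rss_xml):
    """Return each <item>'s <link> URL, in feed order, without duplicates."""
    root = ET.fromstring(rss_xml)
    seen = set()
    urls = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        url = link.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The resulting list could then be handed to an archiver, for example by passing each URL to `archivebox add`. ArchiveBox also has its own feed parsing, so a helper like this is only needed if you want to filter or deduplicate links before archiving them.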

A4ET8a8uTh0, over 1 year ago
For those who want to test in Unraid and run into the root issue after initial setup:

https://3xn.nl/projects/category/unraid/

First-time user, but it's one of those things I did not know I wanted.

theK, over 1 year ago
This is awesome. I couldn't identify from the readme how you tell it what to save and was wondering whether this could be driven by a browser add-on/extension?

CrypticShift, over 1 year ago
This is one of those great projects that would benefit from local LLM integration.

valsk, over 1 year ago
This was created 5 years ago..