I feel like a simple automatic capture of timestamp + URL + screenshot would already be very useful. It would give you a visual memory of the things you've seen on the web. I've wanted to build this as a browser plugin for a while.<p>Just being able to skim the past month or two and click around the thumbnails would be amazing. I've wanted to do that many times, to check whether my memory was correct, whether a page had changed since I last saw it, or when I last saw something online.<p>You don't need a special viewer for it, since your operating system's file explorer can already display the screenshots, and you don't need to set up a crawl. Screenshots also compress well as WebP, or as PNG after crunching them.
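A minimal sketch of the storage side of this idea in Python. It assumes the browser plugin hands over the page URL and capture time and writes the screenshot itself (the screenshot API is browser-specific and not shown). `screenshot_filename` and `index_line` are hypothetical names. Putting a fixed-width UTC timestamp first in each filename makes a plain file explorer sort captures chronologically, which is what makes the "no special viewer" point work.

```python
import json
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit


def screenshot_filename(url: str, when: datetime, ext: str = "webp") -> str:
    """Build a filename that sorts chronologically in any file explorer.

    Hypothetical scheme: "<UTC timestamp>_<host-and-path slug>.<ext>".
    """
    # Fixed-width UTC timestamp, so lexical order == time order.
    # Dashes instead of colons because ":" is not allowed in Windows filenames.
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    parts = urlsplit(url)
    # Keep host + path, replace filesystem-unfriendly runs with "-",
    # and cap the length (the query string is deliberately dropped).
    slug = re.sub(r"[^A-Za-z0-9.-]+", "-", parts.netloc + parts.path).strip("-")[:80]
    return f"{ts}_{slug}.{ext}"


def index_line(url: str, when: datetime, filename: str) -> str:
    """One JSON-lines record tying a screenshot file back to its full URL."""
    return json.dumps(
        {"ts": when.astimezone(timezone.utc).isoformat(), "url": url, "file": filename}
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    name = screenshot_filename("https://example.com/blog/post?id=1", now)
    print(name)
    print(index_line("https://example.com/blog/post?id=1", now, name))
```

Appending each `index_line` to a `.jsonl` file next to the images keeps the full URL (including the query string the filename drops) greppable without any special tooling.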
This article is blogspam.<p>The repository has enough information on its own: <a href="https://github.com/ArchiveBox/ArchiveBox" rel="nofollow">https://github.com/ArchiveBox/ArchiveBox</a>
Interesting side note:<p>It seems like a lot of people in this thread have an interest in retaining a "replayable timeline" of their own browsing/reading history.<p>There's probably enough support here to gather a few contributors for an open source project.
Hey all, @pirate (ArchiveBox maintainer) here. Thanks for posting this, @adamhearn.<p>If you like ArchiveBox, check out the project's new Twitter account, <a href="https://twitter.com/ArchiveBoxApp" rel="nofollow">https://twitter.com/ArchiveBoxApp</a>. We just opened it, and we'll be posting announcements and pre-release sneak peeks there.
Quote: "..even if you instruct it to begin archiving a site then it can easily fail if that site’s robots.txt prevents crawling"<p>Huh? Do big corporations actually care about robots.txt anymore? Nowadays it's more of a "netiquette" thing than anything else. Google definitely ignores it. I don't know what DuckDuckGo does.
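For reference, this is roughly what honoring robots.txt looks like on the crawler side: a sketch using Python's standard `urllib.robotparser`. The rules and the `ArchiveBot` user agent are made up for illustration, and this is not ArchiveBox's actual code. Whether a crawler runs a check like this at all is up to the crawler; nothing in the protocol enforces it.

```python
from urllib.robotparser import RobotFileParser

# Example rules, made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ArchiveBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a live crawler would instead call
    # set_url(".../robots.txt") and read() to download it first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("MyArchiver/1.0", "https://example.com/public/page"),
        ("MyArchiver/1.0", "https://example.com/private/notes"),
        ("ArchiveBot/2.0", "https://example.com/public/page"),
    ]:
        print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

A crawler matching a specific `User-agent` group (here `ArchiveBot`) gets that group's rules instead of the `*` defaults, which is how a site can block one archiver entirely while leaving others alone.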
How long until this is a feature baked into a mainstream web browser? Archive, prefetch, cache, all variants on a theme. History, bookmarks, local search engine, all the same.
Is there a list of web page archive formats I could look at? There are a few things I'd love to do where it would be very handy to have one file per page.
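Two single-file formats worth looking at are MHTML (RFC 2557) and WARC (ISO 28500, used by web archives). To give a rough idea of how MHTML works, here is a sketch using only Python's `email` package: the page and each resource become parts of one `multipart/related` message, tied together by `Content-Location` headers. `build_mhtml` is a hypothetical helper; a real page saver would also need to fetch the resources and handle things like CSS `url()` references.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, resources: dict) -> bytes:
    """Pack one HTML page plus its resources into a single MHTML file.

    resources maps each resource URL to a (mime_type, raw_bytes) tuple.
    """
    # Outer container: a multipart/related message whose root part is HTML.
    archive = MIMEMultipart("related", type="text/html")

    # First part: the page itself, labeled with its original URL.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # One part per resource. Per RFC 2557, URLs referenced by the page
    # are resolved against these Content-Location headers.
    for url, (mime_type, data) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()


if __name__ == "__main__":
    page = "<html><body><img src='https://example.com/logo.png'></body></html>"
    data = build_mhtml(
        "https://example.com/", page,
        {"https://example.com/logo.png": ("image/png", b"not-really-a-png")},
    )
    print(len(data), "bytes of MHTML")
```

Since MHTML is just MIME, any email parser can read it back, which makes it easy to inspect or post-process; WARC is the better fit when you want the raw HTTP requests and responses preserved as well.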
Tried this a while ago and was disappointed by the disk usage.<p>My solution: as a heavy TTS user, I have Balabolka set up to read copied text aloud, which naturally leaves a log for future reference. There are extensions that auto-copy highlighted text and append URLs, which makes the entire flow straightforward. Each day's log is around 1-5 MB of text, saved in one big folder. The biggest limitation is doing advanced searches of unstructured text files by complex keywords within date ranges. I'm sure I could set up each clip with delimiters so the logs can be imported into a searchable DB; I'm just too lazy.
How does archive.is trick news sites into showing content without the paywall? Is it pure user agent spoofing?<p>I'm wondering if this could be applied here.