Make Your Own Internet Archive with Archive Box

257 points by adamhearn over 4 years ago

17 comments

lazyjeff over 4 years ago
I feel like a simple automatic capture of timestamp + URL + screenshot would already be very useful. This gives you a visual memory of the things you've seen on the web. I've wanted to develop this for a while, as a browser plugin.

Being able to skim the past month or two and click around the thumbnails would already be amazing. I've wanted to do that many times before to check if my memory was correct, or if a page changed since I last saw it, or to figure out when I last saw something online.

You don't need a special viewer for it, as your operating system's file explorer can view the screenshots already, and you don't need to set up a crawl. Screenshots also compress well, as WebP or PNG after crunching them.
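The record-keeping half of this idea (timestamp + URL) can be sketched in a few lines. This is only an illustration under stated assumptions: the naming scheme and the `capture_filename` helper are hypothetical, and the screenshot itself would still have to come from a browser extension or a headless browser, which is not shown.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit


def capture_filename(url: str, when: datetime, ext: str = "webp") -> str:
    """Build a sortable, filesystem-safe screenshot name (hypothetical scheme).

    The UTC timestamp comes first so a plain file explorer sorts captures
    chronologically; the host and path follow so the name stays readable.
    """
    parts = urlsplit(url)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Collapse every run of characters other than letters, digits, dots,
    # and dashes into a single dash. The query string is dropped.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "-", parts.netloc + parts.path).strip("-")
    return f"{stamp}_{slug[:80]}.{ext}"
```

Because the timestamp leads the name, sorting the folder by filename gives the "skim the past month" view without any extra viewer.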
remirk over 4 years ago
This article is blogspam.

The repository has enough information on its own: https://github.com/ArchiveBox/ArchiveBox
matt_f over 4 years ago
Interesting side note:

It seems like a lot of people in this thread have an interest in retaining a "replayable timeline" of their own browsing/reading history.

There's probably enough support here to gather a few contributors for an open source project.
nikisweeting over 4 years ago
Hey all, @pirate (ArchiveBox maintainer) here, thanks for posting this @adamhearn.

If you like ArchiveBox, check out our new Twitter account for the project: https://twitter.com/ArchiveBoxApp. We just opened it and we'll be posting announcements and prerelease sneak peeks there in the future.
blastro over 4 years ago
i use this every single day and think very highly of it. thanks for reminding me - i'm going to sponsor this developer on github...
unnouinceput over 4 years ago
Quote: "..even if you instruct it to begin archiving a site then it can easily fail if that site’s robots.txt prevents crawling"

Huh? Do the big corporations actually care about robots.txt anymore? Nowadays it's more of a "netiquette" than anything else. Google definitely ignores it. Dunno what DuckDuckGo does.
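For context on the "netiquette" point: honoring robots.txt is a check the crawler chooses to run, and nothing on the server enforces it. A minimal sketch of that check using Python's standard `urllib.robotparser` follows; the `is_allowed` helper, the `MyArchiver` user agent, and the example rules are made up for illustration and say nothing about how any particular archiver behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A crawler that skips this call simply never learns that `/private/` was disallowed, which is why compliance comes down to convention.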
zeckalpha over 4 years ago
How long until this is a feature baked into a mainstream web browser? Archive, prefetch, cache, all variants on a theme. History, bookmarks, local search engine, all the same.
jedimastert over 4 years ago
Is there a list of web page archive formats I could look at? There are a few things I'd love to do where it would be very handy to have one file per page.
0x426577617265 over 4 years ago
I use this with an automated script that watches my Twitter activity. If I like a tweet, it checks whether the tweet contains a URL and, if so, archives it.
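The commenter's script is not shown, so the following is only a guess at its general shape. It assumes the liked tweet's text is already available as a string (fetching likes from Twitter is left out), that the `archivebox` CLI is installed, and that the script runs inside an initialized ArchiveBox data directory. `extract_urls` and `archive_liked_tweet` are hypothetical names.

```python
import re
import subprocess

# Deliberately simple pattern: http(s) URLs up to the next whitespace or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> list[str]:
    """Return URLs in order of appearance, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


def archive_liked_tweet(text: str, run=subprocess.run) -> list[str]:
    """Pass each URL found in the tweet text to `archivebox add`.

    `run` is injectable so the function can be exercised without
    ArchiveBox installed.
    """
    urls = extract_urls(text)
    for url in urls:
        run(["archivebox", "add", url], check=False)
    return urls
```

Tweets without links produce an empty list, so nothing is archived for them.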
mikece over 4 years ago
This would be a nice thing to be able to run on a Synology NAS or other kind of device that typically has terabytes of storage.
greypowerOz over 4 years ago
so.. you CAN have a box that is "the internet"....
mikiem over 4 years ago
How can I use this to archive sites/pages that require logging in to see?
dirtyid over 4 years ago
Tried this a while ago and was disappointed by the disk usage.

My solution: I'm a heavy TTS user with Balabolka set up to read copied text, which naturally leaves a log for future reference. There are extensions to auto-copy highlighted text and append URLs, which makes the entire flow straightforward. Each day's log is around 1-5 MB of text, saved in a big folder. The biggest limitation is running advanced searches over unstructured text files by complex keywords within date ranges. I'm sure I could set up each clip with delimiters so the logs can be imported into a searchable DB; I'm just too lazy.
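The delimiter idea in the last sentence could look roughly like this. Everything about the format is an assumption made for the sketch: each clip is taken to start with a header line of the form `=== <ISO timestamp> | <url> ===`, and the table layout and function names are hypothetical. Plain `LIKE` matching is used to keep the example short; a full-text index would suit larger logs better.

```python
import re
import sqlite3

# Assumed header format: "=== 2021-01-20T10:00:00 | https://example.com/a ==="
HEADER = re.compile(r"^=== (\S+) \| (\S+) ===$")


def parse_clips(log_text):
    """Split a delimited log into (timestamp, url, body) tuples."""
    clips, current = [], None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"ts": header.group(1), "url": header.group(2), "body": []}
            clips.append(current)
        elif current is not None:
            current["body"].append(line)
    return [(c["ts"], c["url"], "\n".join(c["body"]).strip()) for c in clips]


def load_clips(conn, log_text):
    """Create the table if needed and insert every clip from the log."""
    conn.execute("CREATE TABLE IF NOT EXISTS clips (ts TEXT, url TEXT, body TEXT)")
    conn.executemany("INSERT INTO clips VALUES (?, ?, ?)", parse_clips(log_text))
    conn.commit()


def search(conn, keyword, start, end):
    """Find clips containing keyword whose timestamp lies in [start, end].

    ISO 8601 timestamps sort lexicographically, so string comparison works.
    """
    rows = conn.execute(
        "SELECT ts, url FROM clips WHERE body LIKE ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (f"%{keyword}%", start, end),
    )
    return rows.fetchall()
```

With this in place, "keyword within dates" becomes a single query, for example `search(conn, "sqlite", "2021-01-01", "2021-01-31T23:59:59")`.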
evc over 4 years ago
You will need a lot of disk storage, right?
egberts1 over 4 years ago
A real OSINT archive box would also capture all non-inline JavaScript, CSS and blob: files.
ketamine__ over 4 years ago
How does archive.is trick news sites into showing content without the paywall? Is it pure user agent spoofing?

I'm wondering if this could be applied here.
throwawaysea over 4 years ago
Can you configure this tool to log in to websites (for paid news subscriptions) and get past those paywalls?