TechEcho

17 comments

lazyjeff over 4 years ago

I feel like a simple automatic capture of timestamp + url + screenshot would already be very useful. This gives you a visual memory of the things you've seen on the web. I've wanted to develop this for a while, as a browser plugin. Being able to skim the past month or two and click around the thumbnails would already be amazing. I've wanted to do that many times before to check if my memory was correct, or if a page changed since I last saw it, or figure out when I last saw something online. You don't need a special viewer for it, as your operating system's file explorer can view the screenshots already, and you don't need to set up a crawl. Screenshots also compress well, as webp or png after crunching them.
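
A rough sketch of the "timestamp + url + screenshot" idea: if each capture is saved under a name built from the time and the URL, a plain file explorer sorted by name already gives a timeline. The function name, the naming scheme, and the `.webp` extension are assumptions for illustration, not part of any existing plugin; taking the screenshot itself is left out.

```python
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlparse


def screenshot_filename(url: str, seen_at: datetime) -> str:
    """Build a sortable file name from the capture time and the page URL (hypothetical scheme)."""
    # UTC timestamp first, so sorting by name also sorts by time.
    stamp = seen_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Host name for readability; ':' (ports) is replaced because some file systems reject it.
    host = (urlparse(url).netloc or "unknown-host").replace(":", "_")
    # A short hash of the full URL keeps different pages on the same host apart.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{stamp}_{host}_{digest}.webp"


name = screenshot_filename(
    "https://github.com/ArchiveBox/ArchiveBox",
    datetime(2021, 1, 20, 12, 30, 0, tzinfo=timezone.utc),
)
print(name)
```

The full URL would still need to be stored somewhere (for example in a sidecar text file), since the hash cannot be reversed.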

remirk over 4 years ago

This article is blogspam. The repository has enough information on its own: https://github.com/ArchiveBox/ArchiveBox

matt_f over 4 years ago

Interesting side note: It seems like a lot of people in this thread have an interest in retaining a "replayable timeline" of their own browsing/reading history. There's probably enough support here to gather a few contributors for an open source project.

nikisweeting over 4 years ago

Hey all, @pirate (ArchiveBox maintainer) here, thanks for posting this @adamhearn. If you like ArchiveBox, check out our new Twitter account for the project: https://twitter.com/ArchiveBoxApp. We just opened it and we'll be posting announcements and prerelease sneak-peeks on there in the future.

blastro over 4 years ago

i use this every single day and think very highly of it. thanks for reminding me - i'm going to sponsor this developer on github...

unnouinceput over 4 years ago

Quote: "..even if you instruct it to begin archiving a site then it can easily fail if that site’s robots.txt prevents crawling" Huh? Do the big corporations actually care about robots.txt anymore? Nowadays it's more of a "netiquette" than anything else. Google definitely ignores it. Dunno what DuckDuckGo does.
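
For anyone wondering what "robots.txt prevents crawling" means in practice, here is a minimal sketch of the check a crawler that chooses to honor robots.txt might run, using Python's standard `urllib.robotparser`. The rules, the `example-archiver` user agent, and the example.com URLs are made up for illustration; this is not a claim about how any particular archiver or search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would download this from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "example-archiver", "https://example.com/private/page"))  # False
print(allowed(ROBOTS_TXT, "example-archiver", "https://example.com/public/page"))   # True
```

Nothing enforces the result: the file is advisory, and the crawler decides whether to act on it.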

zeckalpha over 4 years ago

How long until this is a feature baked into a mainstream web browser? Archive, prefetch, cache, all variants on a theme. History, bookmarks, local search engine, all the same.

jedimastert over 4 years ago

Is there a list of web page archive formats I could look at? There are a few things I'd love to do where it would be very handy to have one file per page.

0x426577617265 over 4 years ago

I use this with an automated script that watches my Twitter activity. If I like a tweet, it checks whether the tweet contains a URL and then archives it.
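
A minimal sketch of what the URL-detection half of such a script could look like. The commenter's actual script is not shown, so everything here is an assumption: the liked tweet arrives as a plain string (fetching likes from Twitter is left out), the regex is a simple approximation of URL matching, and the helper names are hypothetical. The commands are only built, not executed; running them would require ArchiveBox to be installed and initialized.

```python
import re

# Simple approximation of "looks like a URL"; not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text: str) -> list[str]:
    """Return the distinct URLs found in the tweet text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(tweet_text):
        # Drop punctuation that usually belongs to the sentence, not the link.
        cleaned = match.rstrip(".,;:!?)")
        if cleaned not in urls:
            urls.append(cleaned)
    return urls


def archive_commands(tweet_text: str) -> list[list[str]]:
    """Build one `archivebox add <url>` command per URL (built only, not run)."""
    return [["archivebox", "add", url] for url in extract_urls(tweet_text)]


liked = "Worth saving: https://github.com/ArchiveBox/ArchiveBox, and nothing else."
commands = archive_commands(liked)
print(commands)
```

Each command list could then be handed to `subprocess.run` from inside the ArchiveBox data directory.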

mikece over 4 years ago

This would be a nice thing to be able to run on a Synology NAS or other kind of device that typically has terabytes of storage.

greypowerOz over 4 years ago

so.. you CAN have a box that is "the internet"....

mikiem over 4 years ago

How can I use this to archive sites/pages that require logging in to see?

dirtyid over 4 years ago

Tried this a while ago, disappointed at HD usage. My solution, as a heavy TTS user, is to have Balabolka set up to read copied text, which naturally leaves a log for future reference. There are extensions to auto-copy highlighted text and append URLs, which makes the entire flow straightforward. The log for each day is around 1-5 MB of text saved in a big folder. The biggest limitation is trying to do advanced searches of unstructured text files by complex keywords within dates. I'm sure I can set up each clip with delimiters so logs can be imported into a searchable DB, just too lazy.
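
A minimal sketch of the "delimiters plus searchable DB" idea, assuming a hypothetical log format with one clip per line and tab-separated date, URL, and text (the real clipboard logs may look nothing like this). It loads the clips into an in-memory SQLite table and searches by keyword within a date range; the table and function names are illustrative.

```python
import sqlite3

# Hypothetical delimited log: date <TAB> url <TAB> clipped text, one clip per line.
SAMPLE_LOG = """\
2021-01-18\thttps://example.com/a\tNotes on self-hosted web archiving
2021-01-19\thttps://example.com/b\tA recipe for sourdough bread
2021-01-20\thttps://example.com/c\tArchiving pages behind a login
"""


def load_clips(conn: sqlite3.Connection, log_text: str) -> int:
    """Insert every well-formed line of the log into the clips table; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS clips (day TEXT, url TEXT, body TEXT)")
    rows = []
    for line in log_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3:  # skip lines that do not follow the assumed format
            rows.append(tuple(parts))
    conn.executemany("INSERT INTO clips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def search(conn: sqlite3.Connection, keyword: str, start: str, end: str) -> list:
    """Return (day, url) for clips containing keyword between start and end (inclusive)."""
    cur = conn.execute(
        "SELECT day, url FROM clips "
        "WHERE body LIKE ? AND day BETWEEN ? AND ? ORDER BY day",
        (f"%{keyword}%", start, end),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_clips(conn, SAMPLE_LOG)
hits = search(conn, "archiving", "2021-01-01", "2021-01-31")
print(hits)
```

ISO-style dates compare correctly as text, which is why a plain `BETWEEN` works here; for multi-word or ranked queries, SQLite's full-text search extension would be the next thing to look at.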

evc over 4 years ago

You will need a lot of disk storage, right?

egberts1 over 4 years ago

A real OSINT archive box would also capture all non-inline JavaScript, CSS and blob: files.

ketamine__ over 4 years ago

How does archive.is trick news sites into showing content without the paywall? Is it pure user agent spoofing? I'm wondering if this could be applied here.

throwawaysea over 4 years ago

Can you configure this tool to log in to websites (for paid news subscriptions) and get past those paywalls?

Make Your Own Internet Archive with Archive Box
