TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

I tracked everything I read on the internet for a year

250 points by akpa1, over 2 years ago

32 comments

leokennis, over 2 years ago

My reading list strategy:

- Send to Feedbin (https://feedbin.com/blog/2019/08/20/save-webpages-to-read-later/)

- Never look at it again
akshaykumar90, over 2 years ago

Great writeup. I too have a long reading list - currently at 133.

I use my own little side project (Savory) to track the list. When I come across a page that I don't have the time or energy to finish right now, I save it and add the "reading" tag to it. When I have free time, I can open the reading tag and pick up something new to read.

The best part is that I usually add a couple more tags (e.g. "security" or "economics") when I save a link. This way, the reading list lets me filter by topic. It has been an unexpected hack for attacking the growing list, since I can usually finish multiple articles on the same topic in a single run - there is usually a link between them even when I saved them days or weeks apart.

Anyway, I like how OP actually has a reading history. I really need to add something similar in Savory. Right now, when I finish reading something, I just remove the "reading" tag, so I don't get a neat history.
Xeoncross, over 2 years ago

I'm still waiting for a web extension that sends a copy of the webpage I'm looking at (for more than 1 minute) to an endpoint I specify, along with some metadata like the URL and user-agent. Obviously, it should block certain domains like banks or email.

I'd like something that builds a searchable index of everything I've read recently, so I can easily find content again. Yet this is NOT something I want a third party to do. I want to self-host something like a tiny Go or Rust server that only uses 10 MB of RAM to index all the pages into an embedded rocks/level/badger/etc. database.
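For readers wanting to experiment, here is a minimal sketch of the inverted index such a self-hosted server might wrap. This is plain JavaScript with illustrative names, not an existing project; a real version would sit behind the HTTP endpoint the commenter describes and persist to an embedded key-value store.

```javascript
// Minimal in-memory inverted index: token -> set of page URLs.
// A real extension would POST { url, text, userAgent } to a small
// server wrapping something like this (all names are illustrative).
class PageIndex {
  constructor() {
    this.postings = new Map(); // token -> Set of URLs
  }

  // Crude tokenizer: lowercase words of 3+ alphanumeric characters.
  tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 2);
  }

  add(url, text) {
    for (const token of new Set(this.tokenize(text))) {
      if (!this.postings.has(token)) this.postings.set(token, new Set());
      this.postings.get(token).add(url);
    }
  }

  // Return URLs containing every query token (set intersection).
  search(query) {
    const tokens = this.tokenize(query);
    if (tokens.length === 0) return [];
    let result = null;
    for (const token of tokens) {
      const urls = this.postings.get(token) || new Set();
      result = result === null
        ? new Set(urls)
        : new Set([...result].filter(u => urls.has(u)));
    }
    return [...result];
  }
}

const index = new PageIndex();
index.add('https://example.com/rust', 'parsing binary formats in rust');
index.add('https://example.com/go', 'writing small servers in go');
console.log(index.search('binary rust')); // -> [ 'https://example.com/rust' ]
```

A production version would swap the `Map` for RocksDB/LevelDB/Badger as the comment suggests, but the postings-list shape stays the same.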
caprock, over 2 years ago

This is a neat writeup. It's fun to think about how to potentially automate this kind of tracking.

> I wish there was an easy way to filter by "independent websites"

This side comment from the post is intriguing. Other than manual curation, I wonder if there is a way to identify commercial vs. independent domains? This would make a really good entry point for a specialty search engine for indie sites only.
aa-jv, over 2 years ago

I Print-to-PDF everything I've ever found interesting enough to linger on for longer than 2 minutes, and as a result I've got 20+ years of Internet articles to go through and read offline, any time.

It's very interesting to see the change in quality of technical writing over the last two decades. There's a definite, observable increase in click-bait style writing.
lolive, over 2 years ago

I have started using Obsidian (at work). I copy/paste into it any web content, text, or image that I find useful from the intranet, emails, or meetings. I try my best to organise things, but for the most part I use the search engine and the [autogenerated] links.

The only requirement when adding content is to figure out whether it should be added to an existing note or whether a new dedicated note should be created. [btw, note nesting does exist in Obsidian]

With this simple workflow, you completely eliminate the notion of provenance of the knowledge. The knowledge is there, subject to your organisational habits.

After some time doing this, you end up with VERY dense notes (in terms of knowledge/line ratio) and very little useless (distracting) content.

For the moment I like it A LOT!
Udo, over 2 years ago

Tampermonkey is great for this, because it can log *everything* and it brings its own XMLHttpRequest:

    GM_xmlhttpRequest({
      method: 'GET',
      url: 'https://myloggingurl/?client=' + client +
        '&url=' + encodeURIComponent(window.location.href) +
        '&title=' + encodeURIComponent(document.title),
      responseType: 'json',
      onerror: function(e) { console.error('URL-Logger', e); },
    });

I've been logging all my web activities since 2018; it's been a great tool. On the server side, I filter out ad spam and other extraneous URLs, and then run a cronjob that converts all *new* HTML documents it sees to PDFs with wkhtmltopdf. It's been great for finding stuff in those moments where I go "hm, I remember seeing something about this months ago...".
cgb_, over 2 years ago

The author doesn't appear to have documented the bookmarklet itself. If they (or anyone else) are here, can you suggest what it might look like for a bookmarklet to collect the URL, page title, meta description, and image, and then set window.location.href?
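This is not the author's bookmarklet, but a hedged sketch of what one could look like. The `https://save.example/add` endpoint and all function names are made up; the testable part is a pure function that builds the redirect URL, with the browser-only bookmarklet shown in a comment.

```javascript
// Hypothetical sketch, not the post author's code.
// Collect metadata from a document-like object (stubbed here so the
// logic can run outside a browser).
function collectPageMetadata(doc, loc) {
  const meta = (sel) => {
    const el = doc.querySelector(sel);
    return el ? el.getAttribute('content') : '';
  };
  return {
    url: loc.href,
    title: doc.title,
    description: meta('meta[name="description"]'),
    image: meta('meta[property="og:image"]'),
  };
}

// Build the URL a bookmarklet would jump to via window.location.href.
// 'https://save.example/add' below is an illustrative endpoint.
function buildSaveUrl(endpoint, page) {
  const params = new URLSearchParams({
    url: page.url,
    title: page.title,
    description: page.description || '',
    image: page.image || '',
  });
  return endpoint + '?' + params.toString();
}

// As an actual bookmarklet (single line, same made-up endpoint):
//
//   javascript:(function(){var m=function(s){var e=document.querySelector(s);
//   return e?e.content:''};window.location.href='https://save.example/add'+
//   '?url='+encodeURIComponent(location.href)+
//   '&title='+encodeURIComponent(document.title)+
//   '&description='+encodeURIComponent(m('meta[name="description"]'))+
//   '&image='+encodeURIComponent(m('meta[property="og:image"]'));})();

// Example with a stubbed document (no browser needed):
const demoDoc = {
  title: 'Example',
  querySelector: (sel) => sel.includes('og:image')
    ? { getAttribute: () => 'https://example.com/img.png' }
    : null,
};
console.log(buildSaveUrl('https://save.example/add',
  collectPageMetadata(demoDoc, { href: 'https://example.com/' })));
```

Note that redirecting via `window.location.href` navigates away from the page, which is presumably the HTTP issue mentioned further down the thread; an extension or GM_xmlhttpRequest avoids that.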
nathan_f77, over 2 years ago

I used to use a browser extension that would track every page I visited and index everything. This was a few years ago, and I can't remember what it was called or why I stopped using it. I think I was paying something like $5/mo. I'd like to find something like that again; it was really useful. I think it would be even more powerful with an AI agent that could organize all the information into categories and answer questions like "What was the article I was reading last week about <x>?"

Is anyone building something like this? (It would be great if I could run something on my own server.)
rcarr, over 2 years ago

I came across this video by Curtis McHale that completely changed the way I keep track of everything:

https://youtu.be/xlDfpcipCm4

I used to try bookmarking things using the built-in browser bookmark manager, then later using Raindrop, and even copying links into Obsidian, but this wasn't really all that effective. After watching the video I trialled DevonThink and was massively impressed. Now, every article I read that I find interesting I save as either a PDF or web archive so I can search for and find it later. I also do the same for useful Stack Overflow posts so I know I'll be able to find them if necessary. On top of this, I bookmark all kinds of useful sites and categorise them in folders in their respective databases.

This lets me keep Obsidian for just pure notes/writing. If I want to link between the two, I can also use Hook to embed links between the two applications.

If I want proper reference formatting for something, I can open it from DevonThink in the browser and then save it to Zotero. Alternatively, some people save everything to Zotero instead of DevonThink and then index the folder with DevonThink so it is included in their search. Either approach works.

I highly recommend anyone with a Mac try the free trial of DevonThink - I think it's something like 100 hours of usage. I would hate going back to living without it.
ch33zer, over 2 years ago

I think an interesting angle would be a categorization by the author of what they found useful/fluff/low quality. It would be a good way to figure out where you're wasting time vs. getting value (of course, sometimes the point is to waste time...).
Animats, over 2 years ago

But in a tag sense, not a content sense.

Automatic resolution follow-up would be interesting. If you read an article about a new research result, you should get a follow-up on how it came out, months or years later. If you read about an arrest, you should eventually get the case disposition. If you read about a new product, you should get a follow-up once there are substantial reviews from customers with real experience.
smoovb, over 2 years ago

Could we feed the author's reading list into an AI and guess his OS, his Amazon history, or his likely topics of conversation at a dinner party? I'm really curious whether you could mirror his decision making and tastes from what was in his reading list over some period of time.
7373737373, over 2 years ago

I've always found it surprising that something like this isn't more of a priority for browsers.

Why only let the NSA and advertisers retain and analyze your full browsing history?
alliao, over 2 years ago

I've always wondered if you can just export Firefox's history db... I'd then take a periodic dump of it and add it into another dedicated db for searching. God knows I spend way too much time looking up things I had once read...
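You can: Firefox keeps its history in a SQLite file (places.sqlite, table moz_places, with last_visit_date in microseconds since the epoch), so the periodic-dump idea is workable. Below is a hedged sketch of the export-then-load step; the file names and the keep-newest dedup policy are illustrative, not prescribed.

```javascript
// Firefox stores history in places.sqlite (table moz_places). One way
// to dump it (run on a copy of the file, or while Firefox is closed):
//
//   sqlite3 -separator $'\t' places.sqlite \
//     "SELECT url, title, last_visit_date FROM moz_places \
//      WHERE last_visit_date IS NOT NULL" > history.tsv

// Parse the TSV dump into records. moz_places reports last_visit_date
// in microseconds since the epoch, hence the / 1000.
function parseHistoryDump(tsv) {
  return tsv.split('\n').filter(Boolean).map(line => {
    const [url, title, lastVisit] = line.split('\t');
    return { url, title, lastVisit: new Date(Number(lastVisit) / 1000) };
  });
}

// Keep only the newest entry per URL before loading into a search db.
function dedupeByUrl(rows) {
  const byUrl = new Map();
  for (const row of rows) {
    const prev = byUrl.get(row.url);
    if (!prev || row.lastVisit > prev.lastVisit) byUrl.set(row.url, row);
  }
  return [...byUrl.values()];
}

// Demo with synthetic rows (timestamps are in microseconds):
const sample = [
  'https://example.com/a\tFirst visit\t1600000000000000',
  'https://example.com/a\tLater visit\t1700000000000000',
  'https://example.com/b\tOther page\t1500000000000000',
].join('\n');
console.log(dedupeByUrl(parseHistoryDump(sample)).map(r => r.title));
```

From there the deduped rows can be inserted into whatever dedicated search database you prefer.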
1vuio0pswjnm7, over 2 years ago

I automate a log of all the HTTP requests the computer makes, which naturally includes all the websites I visit.^1 I am not always using a browser to make HTTP requests, and for recreational web use I use a text-only one exclusively, so a "browser history" is not adequate.

In the loopback-bound forward proxy that handles all HTTP traffic from all applications, I add a line to include the request URL in an HTTP response header, called "url:" in this example. As such, it will appear in the log. For example, something like

    http-response add-header url "https://%[capture.req.hdr(1)]%[capture.req.uri]"

This allows me to write simple scripts to copy URLs into a simple HTML page. I then read the simple HTML with a text-only browser (links).

For example, something like

    cat > 1.sh
    #!/bin/sh
    WIDTH=120; echo "<!-- $(date '+%Y-%m-%d %T') --><ol><pre>"
    x=$(echo x|tr x '\034'); tr -d '\034' \
    |sed -e "s/.*/& &/;s/ .\{${WIDTH}\}/&$x/;s/$x.*//" \
    |sed -e "/./{s/.* /<li><a href=&>/;s|$|</a></li>|;}" \
     -e '#certain urls to exclude' \
     -e '/cdx?url=/d' \
     -e '/dns-query?dns=/d' \
     -e '/ds5q9oxwqwsfj.cloudfront.net/d' \
     -e '/index.commoncrawl.org/d'
    ^D
    grep url: 1.log|cut -d' ' -f3-|1.sh > 1.htm
    links 1.htm

What about POST data? That is captured into another HTTP response header, called "post-data:" in this example:

    http-request add-header post-data %[req.body] if { method POST }

To look at the POST data I might do something like

    grep post-data: 1.log|cut -d' ' -f3-|less

1. I also use a system for searching the www, or specific www sites, from the command line. The search result URLs for each query are stored in simple HTML format similar to the above, one query per file. What's non-obvious is that each file can contain search results from different sources, sort of like the "meta-search engine" idea but more flexible. The simple HTML format contains the information necessary to continue searches at any time, thus allowing a more diverse and greater number of search results to be retrieved. (Sadly, www search engines have been effectively limiting the number of search result URLs we can retrieve with JavaScript and cookies disabled.) The command line program reads the information needed to continue a search from the simple HTML comments.
noyesno, over 2 years ago

I use a service that pulls selected articles from my Pocket account, then formats and prints them into a nice booklet that is sent to me once a month. I find this makes me more conscious when deciding whether to add an article to Pocket, as I now ask myself if I *really* want to read it later in the printed booklet (vs. just adding it to "the list" to remove it from the tab bar).
raldi, over 2 years ago
How much of a piece do you need to read before you add it to the list? Does it count if you give up halfway through? Or ten seconds in?
realkiddredd, over 2 years ago

I just save everything in Pocket with tags.
nivethan, over 2 years ago

I'm coming up on 3k articles read, probably most of them from HN. Jesus, I read too much! I use my own app, leftwrite.io, to keep track of everything I read and the notes I make. A retrospective might be fun, though it'll make it very clear how much of nothing I do.
srinathkrishna, over 2 years ago

Thanks for sharing the `window.location.href` trick. I had attempted something like this in the past and gave up when I hit the HTTP issue you're referring to, specifically on the websites (mostly news sites) where I predominantly spent time.
frumiousirc, over 2 years ago

Maybe add an RSS version of https://pages.tdpain.net/readingList/

Who knows, perhaps there is a nascent meta-RSS movement developing.
oxff, over 2 years ago

Zotero simplified a lot of my note taking / "information retaining" from stuff I bookmark / read. Much better than Obsidian etc.

- Add a thing to Zotero; 99% of the time the metadata comes with it and is searchable already.

- To mark it as read, make a note for it (usually consisting of the main idea, good/bad ideas in the article, a few sentences).

- Zotero entries without a note are assumed unread.

For my diary / misc notes I use VS Code with some Markdown plugins; Foam has daily-note functionality, which is nice: add a new diary entry and add some tags, ezpz.
eimrine, over 2 years ago

I don't have any tracking, and my history/cookies are cleaned regularly. But my reading history might be the most boring thing to analyze: everything about Lisp, and some darknet activities. The former I keep learning and collecting anyway, and the latter I really prefer to be forgotten.
janandonly, over 2 years ago

I made a Shortcut a while back that basically saves a webpage to PDF and adds tags. When I've read an article, it gets a new tag, "already read", and that's it.

Now, the tags are an iOS/iPadOS/macOS thing, sure, but the PDFs I can take with me to any platform.
suref, over 2 years ago

I have different reading lists on Hacker News, Twitter, Reddit, and Medium, and because of this I never read anything that I don't come across directly... If you want to share between them, you need some convenient app for your phone and computer.
ketzo, over 2 years ago

Does anyone have a similar bookmarklet for adding things to Notion on mobile (iOS, Chrome app)?

You can "share to" the Notion app, but you have to type a bunch of info in manually. I would love to make it one tap.
r1jsheth, over 2 years ago

Amazing, I am going to fork this project. Thanks for writing!
tankado, over 2 years ago

I tried something similar, but I failed after a couple of months.
ge96, over 2 years ago

Tangent: I was briefly trying to summarize pages. There are APIs out there, e.g. Summarizer API. But your summary/takeaway depends on whether you actually read it.

I save all my tabs before I purge all the windows.
odysseus, over 2 years ago

What I like are the buttons at the bottom of the page. Takes me back...
keepquestioning, over 2 years ago

Really? Absolutely nothing risqué, dumb, or even slightly offensive. This list is obviously censored.