Deep dive into finding RSS feeds

144 points by domysee 6 months ago

22 comments

Back when I was young, websites had this icon you could click that would take you straight to their RSS feed. You young whippersnappers have gone and fucked that up. Actually, I think it was Google's fault. When they killed their RSS reader, people pronounced RSS dead, so sites just stopped publishing RSS feeds or stopped linking to them.*

*Yes, I know the article talks about the RSS icon; I'm just soapboxing.

aucisson_masque 6 months ago

I thank WordPress for most of my RSS feeds. I mostly follow RSS feeds from non-technology websites, road cycling for instance. The people behind them wouldn't care or know about RSS because they are not very techy, yet because they are normies who use WordPress for all their websites, WordPress puts up an RSS feed automatically. You have to find it with the browser's developer tools by searching for "RSS", but 99% of the time, if it's WordPress, it has RSS. Thank you, WordPress, you bloated piece of shit :)
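As a rough illustration of the WordPress case, a script can look for WordPress's generator meta tag and, if it is present, try the usual WordPress feed endpoints (`/feed/` and the query-string form `/?feed=rss2`). This is a minimal sketch, assuming the site still emits something like `<meta name="generator" content="WordPress 6.x">`; many themes and security plugins strip that tag, so a negative result proves nothing. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class GeneratorSniffer(HTMLParser):
    """Records the content of <meta name="generator" ...> if the page has one."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = {name: (value or "") for name, value in attrs}
            if attributes.get("name", "").lower() == "generator":
                self.generator = attributes.get("content", "")


def wordpress_feed_candidates(page_url, html):
    """Return likely WordPress feed URLs, or [] if the page doesn't declare WordPress."""
    sniffer = GeneratorSniffer()
    sniffer.feed(html)
    if not sniffer.generator or not sniffer.generator.lower().startswith("wordpress"):
        return []
    # Pretty-permalink form first, then the query-string form that works without rewrites.
    return [urljoin(page_url, "/feed/"), urljoin(page_url, "/?feed=rss2")]
```

The candidates still need to be fetched and checked; the sniffing only decides which guesses are worth trying first.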

superkuh 6 months ago

I generally try /rss, /feed, /index.xml, /rss.xml, /feed.xml, etc., both at the root and at various /directory/* locations. https://blog.jim-nielsen.com/2021/feed-urls/ is a good article with statistics on naming.

I've been adding to my feeds.opml since Reddit started dying around 2015, and I'm now up to roughly 1,700 feeds and mostly independent of aggregators, though I still collect new feeds from HN, IRC, etc. Mostly I just make a point of looking for feeds whenever I read something cool on the web.
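The "try common paths at the root and in each directory" approach can be sketched as a pure URL generator. This is a minimal sketch, assuming the five names from the comment as the candidate list (extend it as needed) and assuming the last path segment of a URL without a trailing slash is a page rather than a directory; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Names taken from the comment; real-world lists are longer.
COMMON_FEED_PATHS = ["rss", "feed", "index.xml", "rss.xml", "feed.xml"]


def candidate_feed_urls(page_url, names=COMMON_FEED_PATHS):
    """Return guessed feed URLs at the site root and at each parent directory of page_url."""
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]  # treat the last segment as a page, not a directory
    directories = ["/"] + ["/" + "/".join(segments[: i + 1]) + "/" for i in range(len(segments))]
    urls = []
    for directory in directories:
        for name in names:
            # Drop any query string or fragment from the original page URL.
            urls.append(urlunsplit((parts.scheme, parts.netloc, directory + name, "", "")))
    return urls
```

For `https://example.com/blog/post` this yields ten guesses, five under `/` and five under `/blog/`; each one still has to be fetched and checked before it counts as a feed.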

LorenDB 6 months ago

My modus operandi for finding a non-obvious RSS feed is to check the Wayback Machine's list of saved URLs and search for "RSS", "feed", or "XML". That will normally find the feed as long as it exists.
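One way to script this is to ask the Wayback Machine's CDX server for every captured URL on a domain and filter for the same keywords. This is a sketch, assuming the CDX endpoint and its `url`, `matchType`, `output=json`, `fl` and `collapse` parameters behave as described in the Internet Archive's CDX server documentation (JSON output as a list of rows whose first row is the field names); the function names are hypothetical. Large domains can return very big responses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
KEYWORDS = ("rss", "feed", "xml")


def cdx_query_url(domain):
    """Build a query for all archived URLs on a domain, one row per distinct URL."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains
        "output": "json",
        "fl": "original",  # only return the original URL field
        "collapse": "urlkey",  # de-duplicate repeated captures of the same URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def feed_like(urls):
    """Keep URLs mentioning rss/feed/xml, preserving order and dropping duplicates."""
    seen = set()
    matches = []
    for url in urls:
        if any(keyword in url.lower() for keyword in KEYWORDS) and url not in seen:
            seen.add(url)
            matches.append(url)
    return matches


def wayback_feed_candidates(domain, timeout=60):
    with urlopen(cdx_query_url(domain), timeout=timeout) as response:
        rows = json.load(response)
    return feed_like(row[0] for row in rows[1:])  # rows[0] is the header row
```

The "xml" keyword also matches sitemaps and other XML files, so the candidates still need to be checked by fetching them.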

sodality2 6 months ago

I tried out the feed finder on my blog again, and I have another bug to report: it seems the URLs on the page can crash the web app! My blog (at matthew.science) uses the Zola SSG, and its URLs are formatted with a leading //:

'<a href="//matthew.science/posts/riscv/">Basics of the RISC-V ISA</a>'

This causes the following error: TypeError: URL constructor: //matthew.science/posts/riscv/ is not a valid URL.
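The error comes from parsing a protocol-relative link (`//host/path`) without a base URL. In JavaScript, passing the page URL as the second (base) argument, as in `new URL(href, pageUrl)`, resolves such links. The Python sketch below does the same with `urllib.parse.urljoin` and also discards links that don't resolve to http(s); the function name is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_link(page_url, href):
    """Resolve href (relative, root-relative, or protocol-relative) against the page it came from.

    Returns None for links that don't end up as http(s) URLs, e.g. mailto: or javascript:.
    """
    absolute = urljoin(page_url, href.strip())
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

With this, `//matthew.science/posts/riscv/` found on `https://matthew.science/` becomes `https://matthew.science/posts/riscv/`, inheriting the page's scheme, instead of raising.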

fallinditch 6 months ago

This looks very useful. It would work well with Hoarder (it would be cool if they were integrated ;)

Note: Hoarder can automatically hoard RSS feeds as part of its 'bookmark everything' functionality. Hoarder uses AI to tag all the content (URLs, feeds, images, notes), so you can then do full-text searches on your personal archive of bookmarks, etc.

https://hoarder.app/

openrisk 6 months ago

It's interesting to contemplate an RSS-first browser that would have this functionality built in. Think, for example, of promoting a desktop RSS reader like Akregator [1] (which already embeds a webview) to full browser status.

The browser as we now know it is mostly a static application that has long lost its user-centric mission. Websites might push some stuff, but the user must do things manually. Its primary function is to provide a window to external search. People have even stopped using bookmarks and just search for everything.

This hypothetical RSS browser could become the main organizational tool for the user's web experience, integrating the use of bookmarks. In fact, even more "feeds" could be integrated, like email and ActivityPub or atproto posts. It boils down to the fact that each person has a number of profiles/roles, within each of which they have a taxonomy of interests, and we need a tool that integrates static and dynamic sources of information.

[1] https://apps.kde.org/akregator/

camel-cdr 6 months ago

This is useful; I set up RSS on my website yesterday. It turns out the feed finder couldn't find the feeds even though I've linked to them using clickable RSS icons. I didn't know about the autodiscovery feature, so I'll add that now.
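Autodiscovery means putting a `<link rel="alternate" type="application/rss+xml" href="...">` (or `application/atom+xml`) element in the page's `<head>`; a clickable icon in the body is invisible to tools that only look for those elements. A minimal sketch of how a feed finder might read them with Python's standard `html.parser`, assuming only RSS and Atom types matter; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects (title, absolute_url) for every autodiscovery <link> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()  # rel can hold several tokens
        if (
            "alternate" in rels
            and attributes.get("type", "").lower() in FEED_TYPES
            and attributes.get("href")
        ):
            absolute = urljoin(self.base_url, attributes["href"])
            self.feeds.append((attributes.get("title", ""), absolute))


def discover_feeds(page_url, html):
    parser = FeedLinkParser(page_url)
    parser.feed(html)
    return parser.feeds
```

Because hrefs are resolved with `urljoin`, relative and protocol-relative feed links come back as absolute URLs.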

begriffs 6 months ago

I created a lightweight shell script to check many URL combinations on a site for feeds: https://github.com/begriffs/findrss

The combinations came from what I observed in the big list of blogs I follow. The script works pretty well for most sites.
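Probing guessed URLs only helps if a hit is verified: many sites answer unknown paths with an HTML page and a 200 status. This Python sketch is not begriffs' shell script; it assumes a response counts as a feed when it parses as XML and its root element is `rss` (RSS 0.9x/2.0), `RDF` (RSS 1.0) or `feed` (Atom). The function names and User-Agent string are hypothetical.

```python
import http.client
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def looks_like_feed(body):
    """True if body is well-formed XML whose root element is rss, RDF or Atom feed."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    local_name = root.tag.split("}")[-1].lower()  # drop any "{namespace}" prefix
    return local_name in ("rss", "rdf", "feed")


def probe(url, timeout=10):
    """Fetch url and report whether it serves a feed; network errors count as 'no'."""
    request = Request(url, headers={"User-Agent": "feed-probe-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            body = response.read()
    except (OSError, ValueError, http.client.HTTPException):
        return False  # HTTP errors, timeouts, bad URLs
    return looks_like_feed(body)
```

Checking the body rather than the Content-Type header is deliberate, since plenty of feeds are served as `text/xml` or even `text/html`.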

csswizardry 6 months ago

I went canvassing for RSS feeds only yesterday! Some good stuff in here: https://bsky.app/profile/csswizardry.com/post/3lckq4qo6zs22

1123581321 6 months ago

It’d be neat for readers to seamlessly integrate with a scraper, either self-hosted or commercial, if no feed is found. I believe Inoreader allows scraping a few sites depending on the plan level; most reader services don’t.

artembugara 6 months ago

I open-sourced pyGoogleNews and wrote a quick blog post about how you can reverse-engineer Google News RSS to turn it into an RSS feed of any website that Google News covers:

https://news.ycombinator.com/item?id=42343182
https://github.com/kotartemiy/pygooglenews
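The general idea can be sketched as building a Google News RSS search URL restricted to one site with a `site:` query. This is not pyGoogleNews's API; it is a sketch assuming the unofficial `https://news.google.com/rss/search` endpoint and its `q`, `hl`, `gl` and `ceid` parameters keep working as commonly observed. The endpoint is undocumented and may change, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def google_news_site_feed(domain, language="en-US", country="US"):
    """Build a (hypothetical-helper) Google News RSS search URL for articles from one domain."""
    params = {
        "q": f"site:{domain}",
        "hl": language,  # interface language, e.g. en-US
        "gl": country,  # edition country
        "ceid": f"{country}:{language.split('-')[0]}",  # edition id, e.g. US:en
    }
    return "https://news.google.com/rss/search?" + urlencode(params)
```

The result only covers articles Google News has indexed for that domain, so it is a fallback rather than a substitute for the site's own feed.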

panozzaj 6 months ago

I use a Chrome extension (https://chromewebstore.google.com/detail/get-rss-feed-url/kfghpdldaipanmkhfpdcjglncmilendn?hl=en&pli=1), and it seems to pick out the RSS URLs fairly consistently.

renegat0x0 6 months ago

I fought this problem, since I wrote my own RSS reader in Python. It might not be perfect.

The problem with the approach presented here is speed. Most web pages, especially smaller ones, are really slow. Crawling most web pages is a pain, especially if you use Selenium on a small SBC. Therefore, either the page presents a clean, nice RSS link, or get lost.

Most good, modern pages give you a nice RSS feed. Even GitHub gives you RSS for commits. For other pages, I try openRSS. For YouTube, I use yt-dlp to obtain the channel ID and build the RSS URL from it. The algorithm is crude, but it gets the job done.

https://github.com/rumca-js/Django-link-archive/blob/main/rsshistory/webtools/url.py
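For well-known hosts, the feed URL can be derived from the page URL without crawling at all. This sketch assumes GitHub's per-repository commit feed at `https://github.com/{owner}/{repo}/commits.atom` and YouTube's channel feed at `https://www.youtube.com/feeds/videos.xml?channel_id={id}`. It only handles `/channel/<id>` URLs; `@handle` URLs need the channel ID looked up first (the comment uses yt-dlp for that). The function name is hypothetical.

```python
from urllib.parse import urlsplit


def known_feed_url(url):
    """Map a GitHub repo or YouTube /channel/ URL to its feed URL; None for anything else."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    segments = [s for s in parts.path.split("/") if s]

    if host == "github.com" and len(segments) >= 2:
        owner, repo = segments[0], segments[1]
        return f"https://github.com/{owner}/{repo}/commits.atom"  # default-branch commits

    if host in ("youtube.com", "m.youtube.com") and len(segments) >= 2 and segments[0] == "channel":
        return f"https://www.youtube.com/feeds/videos.xml?channel_id={segments[1]}"

    return None  # unknown host: fall back to discovery or guessing
```

Rules like these are cheap and fast, which fits the "clean link or nothing" stance: only pages with no known rule need a slow fetch.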

ks2048 6 months ago

It would be nice if someone ran this on Common Crawl and published a list of all the RSS feeds (probably someone has?). Or I suppose you could just find all "Content-Type: application/rss+xml" responses in CC. I know that in the past, when I was looking for large lists of RSS feeds, I didn't really find what I was looking for.
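The "find all feed content types in CC" idea can be done against Common Crawl's URL index rather than the raw WARC files. This sketch assumes index records arrive one JSON object per line (as the index server returns with `output=json`) with `url`, `mime` (from the Content-Type header), `mime-detected` (from content sniffing) and `status` fields; it filters already-downloaded lines and makes no network calls. The function name is hypothetical.

```python
import json

FEED_MIMES = {"application/rss+xml", "application/atom+xml"}


def feed_urls_from_index(lines):
    """Return unique URLs of successful captures whose declared or detected MIME type is a feed type."""
    seen = set()
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        mimes = {
            str(record.get("mime", "")).lower(),  # what the server declared
            str(record.get("mime-detected", "")).lower(),  # what the crawler sniffed
        }
        url = record.get("url")
        if url and record.get("status") == "200" and mimes & FEED_MIMES and url not in seen:
            seen.add(url)
            found.append(url)
    return found
```

Checking `mime-detected` as well matters because many feeds are served with a generic `text/xml` header and would be missed by the declared type alone.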

PeterStuer 6 months ago

You can add '/display-feed.rss' to the list of common suffixes for many .eu sites.

ewired 6 months ago

Pasting a URL into NewsBlur also uses several of these techniques to find the feed(s). It is open source, so the feed-finding code could be ripped out of NewsBlur as an alternative to this.

kelvinjps10 6 months ago

I would like to be able to enter multiple websites at once. I had to build a script based on the "Guessing the feed URL" approach to get the RSS feeds of a bunch of websites I had bookmarked.
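Batching is mostly a matter of running a single-site finder concurrently, since each lookup spends its time waiting on the network. A minimal sketch using a thread pool, assuming you already have some `find_feed(site)` function (hypothetical here, e.g. one that guesses common paths and verifies them) that returns a feed URL or None.

```python
from concurrent.futures import ThreadPoolExecutor


def find_feeds_for_sites(sites, find_feed, max_workers=8):
    """Apply find_feed(site) to every site concurrently and return {site: result}.

    A site whose lookup raises is recorded as None instead of aborting the whole batch.
    """
    sites = list(sites)

    def safe_lookup(site):
        try:
            return find_feed(site)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_lookup, sites))  # map preserves input order
    return dict(zip(sites, results))
```

Keeping `max_workers` modest avoids hammering small sites; the results map straight onto an OPML export or a reader's import list.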

ulrischa 6 months ago

It would be nice if this were implemented in FreshRSS.

benrapscallion 6 months ago

Does it correctly ignore the “Comments on:” feeds that are sometimes mistakenly chosen over the main feed?
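A simple heuristic for this, sketched under the assumption that candidates arrive as `(title, url)` pairs in page order (as autodiscovery links do): skip entries whose title starts with "Comments on" or mentions "Comments Feed" (the WordPress conventions), or whose URL contains `/comments/feed`, and fall back to the first candidate if nothing else is left. The function name is hypothetical and the patterns are only English-language WordPress defaults.

```python
def pick_main_feed(feeds):
    """Choose the main feed from (title, url) pairs, skipping comment feeds where possible."""

    def is_comments_feed(title, url):
        title = title.strip().lower()
        return (
            title.startswith("comments on")
            or "comments feed" in title
            or "/comments/feed" in url.lower()
        )

    main_feeds = [feed for feed in feeds if not is_comments_feed(*feed)]
    if main_feeds:
        return main_feeds[0]
    return feeds[0] if feeds else None  # only comment feeds found: better than nothing
```

Localized WordPress installs translate these titles, so a URL-based check is the more robust half of the heuristic.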

zenlot 6 months ago

Came here through the RSS link from Miniflux, running on an NVIDIA Jetson.

saaaaaam 6 months ago

Ghost also publishes at /feed.