Generate RSS feed for any website using CSS selectors

203 points by thirdplace_ almost 2 years ago

22 comments

toastal almost 2 years ago

CSS selectors were more useful before the Tailwind fad of dropping meaningful class names in favor of recreating inline styles, but with abbreviations to memorize. I use μBlock Origin + userStyles a lot, both of which also use CSS selectors, and over the last couple of years everything has become a lot harder for the end user to tweak/fix. If you're lucky now, you'll have some ARIA attributes to select on.

snthd almost 2 years ago

RSSHub[0] is in the same ballpark, but consists of a large library of site-specific code[1][2].

[0] https://github.com/DIYgod/RSSHub/
[1] https://github.com/DIYgod/RSSHub/tree/master/lib/routes
[2] https://github.com/DIYgod/RSSHub/tree/master/lib/v2

solardev almost 2 years ago

It ded.

Archive: https://web.archive.org/web/20230714202418/https://rss-bridge.org/bridge01/

Sample feed: https://web.archive.org/web/20230308160413/https://rss-bridge.org/bridge01/?action=display&bridge=ABCNewsBridge&topic=act&format=Html

awesomegoat_com almost 2 years ago

I was always afraid to use one of these. I thought that the CSS selectors would be too brittle and ultimately break.

I have built my own solution that is automagical at https://awesomegoat.com/ but I am running into the next set of issues, which are various scraping protections. It seems that a reasonable RSS gateway today needs to include a botnet of residential proxies just to read content on the internet.

xnx almost 2 years ago

This is a great tool! Before I learned about nitter, this was my primary way to follow people on Twitter. I love the idea of trying to wrestle unsupported feeds (Twitter, Instagram, etc.) into a standard/open format.

jasonlotito almost 2 years ago

The lack of feed generation is why so many of the latest blog platforms are non-starters in my book. It boggles my mind. Honestly, if you don't generate a feed of some sort, I really can't take you seriously.

nfriedly almost 2 years ago

I run my own instance of RSS Bridge to keep track of authors that I like on Goodreads.

It works pretty well, although every once in a while Goodreads hiccups, and then RSS Bridge gives me a bunch of "new posts" that are actually error messages.

okuntilnow almost 2 years ago

Huginn is another useful tool that allows you to wrangle CSS selectors and XPath nodes to create RSS feeds.

I use it quite successfully to get data out of undocumented APIs and out into RSS.

https://github.com/huginn/huginn

bubblematrix almost 2 years ago

This honestly is standard web scraping, but these projects always catch my attention.

You're at the mercy of rate-limiting firewalls (so you'll have to rotate proxies if you intend to use this heavily), on top of the standard CloudFront bot detection, reCAPTCHA, and div obfuscation (a good example of this is Facebook).

dagurp almost 2 years ago

These days I just let ChatGPT generate a script that scrapes a site and spits out an RSS file. Then I run it with cron.
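
A minimal sketch of what such a script might look like, using only the Python standard library. The sample HTML, the `post-link` class name, and the feed title are all hypothetical; a real script would first download the page and would need a selector that fits the target site.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> carrying a given class name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.items = []      # finished (href, title) pairs
        self._href = None    # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.wanted_class in classes and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_rss(feed_title, feed_link, items):
    """Return an RSS 2.0 document as a string for (href, title) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped from " + feed_link
    for href, title in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page snippet standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a class="post-link" href="https://example.com/a">First post</a></li>
  <li><a class="nav" href="https://example.com/about">About</a></li>
  <li><a class="post-link featured" href="https://example.com/b">Second post</a></li>
</ul>
"""

collector = LinkCollector("post-link")
collector.feed(SAMPLE_HTML)
feed_xml = build_rss("Example feed", "https://example.com/", collector.items)
```

Writing `feed_xml` to a file and scheduling the script with cron would complete the setup described above.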

ChrisArchitect almost 2 years ago

Other services like this: https://www.fivefilters.org/feed-creator/

eviks almost 2 years ago

What's the easiest way to also run a few basic filters on the site/RSS feed's content to make it truly shine vs simplistic scraping? Like:
- splitting the full feed by theme of the article into separate feeds, and at the same time
- removing a few keywords, and also
- getting the article length and splitting into a long / short feed
- or maybe getting what you used to have on some news sites: subscribing only to a specific author instead of getting bombarded with hundreds of items in a feed
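
One way to approach the keyword, length, and author filters is a small post-processing step over already-parsed feed items. The sketch below assumes each item is a plain dict with hypothetical `title`, `author`, and `body` keys, and that "long" means at least 300 words; splitting by theme is left out because it would need some form of classification.

```python
from collections import defaultdict


def split_feed(items, blocked_keywords=(), long_threshold=300):
    """Drop items whose title contains a blocked keyword, then bucket the rest.

    Returns a dict mapping feed names ("long", "short", "author:<name>")
    to lists of items.
    """
    feeds = defaultdict(list)
    for item in items:
        title = item["title"].lower()
        if any(word.lower() in title for word in blocked_keywords):
            continue  # keyword filter: skip this item entirely
        n_words = len(item["body"].split())
        feeds["long" if n_words >= long_threshold else "short"].append(item)
        feeds["author:" + item["author"]].append(item)
    return dict(feeds)


# Hypothetical items, as they might look after parsing a feed.
items = [
    {"title": "Deep dive into RSS", "author": "alice", "body": "word " * 500},
    {"title": "Sponsored: buy now", "author": "bob", "body": "short"},
    {"title": "Quick note", "author": "alice", "body": "just a few words"},
]

feeds = split_feed(items, blocked_keywords=["sponsored"])
```

Each bucket in `feeds` could then be written out as its own feed file, so a reader subscribes only to, say, the `author:alice` feed.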

PaulHoule almost 2 years ago

I've wondered why people have tried all sorts of cumbersome ways to splice metadata onto HTML, like RDFa, but never tried the obvious approach of basing extraction rules on CSS selectors... Often these work without the cooperation of the target site, so long as they use CSS the way it was supposed to be used (e.g. not Tailwind, Bootstrap, etc.)

CoBE10 almost 2 years ago

For me PolitePol is best because it doesn't limit the number of feeds and the free plan is pretty good: https://politepol.com

treyd almost 2 years ago

I wonder if this would work better / be more expressive with XPath-style selectors?
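
As a rough illustration of the extra expressiveness, here is a sketch using the limited XPath subset in Python's `xml.etree.ElementTree`. The markup is a hypothetical, well-formed snippet; real-world HTML usually needs a more tolerant parser. The second query matches on a child element's text, something standard CSS selectors have no way to express.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet.
PAGE = """
<html><body>
  <article class="post"><h2>Release notes</h2><a href="/r1">read</a></article>
  <article class="post"><h2>Opinion</h2><a href="/o1">read</a></article>
  <article class="ad"><h2>Release notes</h2><a href="/ad">read</a></article>
</body></html>
"""

root = ET.fromstring(PAGE)

# Roughly what the CSS selector article.post would select
# (exact match on the class attribute, not a token match).
posts = root.findall(".//article[@class='post']")

# Attribute test plus a test on the text of a child <h2>.
release_posts = root.findall(".//article[@class='post'][h2='Release notes']")
links = [a.get("href") for art in release_posts for a in art.findall("a")]
```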

account-5 almost 2 years ago

Is there a standalone application that can do something similar, one that doesn't require a web server to run? Like an RSS reader you'd run on your desktop or phone? I'd definitely be interested in that.

Hamuko almost 2 years ago

FreshRSS has XPath scraping.

https://danq.me/2022/09/27/freshrss-xpath/

midasz almost 2 years ago

Does it work for websites that fetch content async? I've had success with https://morss.it instead (which can also be self-hosted)

simonjgreen almost 2 years ago

This is very similar to how you can scrape data from the web with Power Query

skribanto almost 2 years ago

Getting 502 Bad Gateway

kayson almost 2 years ago

FreshRSS has this feature built in. But you can use rss-bridge for far more complicated scenarios too

1vuio0pswjnm7 almost 2 years ago

"Generate RSS feed for any website using CSS selectors"

For me, "CSS selectors" always seems like a deceptive term, if it means selecting HTML tag elements. What if the website does not use styling?

I read 1000s of websites, including all HN submissions, without using CSS. When I want to extract information from a website, I focus on patterns in the page. They might be HTML, they might be style elements, but they could be anything. I never assume that all websites will wrap the information I want in certain elements. There is a ridiculous amount of random variation amongst websites.
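
A small sketch of that pattern-based style of extraction, which does not rely on any markup at all. The page text and the "date, title, path" line layout are hypothetical; the regular expression would have to be adapted to whatever regularity a given page actually shows.

```python
import re

# Hypothetical plain-text page content with no usable markup.
PAGE_TEXT = """
2023-07-14  Generate RSS feed for any website  /posts/rss-css
some unrelated banner text
2023-07-15  Why feeds still matter  /posts/feeds
"""

# A line is an entry if it looks like: date, 2+ spaces, title, 2+ spaces, path.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ \t]{2,}(.+?)[ \t]{2,}(/\S+)$",
    re.MULTILINE,
)

entries = [match.groups() for match in ENTRY.finditer(PAGE_TEXT)]
```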
