Ask HN: Why do people use Puppeteer for webscraping instead of SaaS services?

4 points by proszkinasenne2, over 4 years ago

Hi HN,

As the title says, I am wondering why anyone would use Puppeteer/Selenium/another browser emulator for web scraping when there are already tens, if not hundreds, of SaaS services offering "scraping-as-a-service". JS execution aside.

A handful of examples: Scrapehero, Webrobots, Apify, Scrapingbee, Scrapinghub, Promptcloud.

Leaving aside the ones that require a setup fee or have ridiculous pricing models, why would anyone want to set up Puppeteer/Selenium/other scraping bots instead of using one of the "scraping-as-a-service" platforms?

2 comments

phendrenad2, over 4 years ago
Probably because people who are doing web scraping aren't professional scrapers; they're just programmers who need some data quickly. And since they're already familiar with Selenium, they think that's the state of the art. I've never seen an ad for a scraping service, so I also didn't know that they existed.
billconan, over 4 years ago
My main concern is pricing. Many websites use anti-scraping technologies, so scraping the HTML doesn't work anymore; you need to load everything and execute JS. For example, I have seen some that can detect headless / Puppeteer mode too. I ended up creating my own scraping infra using vanilla Chrome...

Current SaaS platforms charge by request count. If I need to load everything, the cost will be too high.
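
To make the per-request pricing concern concrete, here is a rough sketch (not anything billconan posted) that counts how many HTTP requests a full page load would trigger and multiplies that by a flat per-request price. The sample HTML, the helper names `requests_per_page` and `estimated_cost`, and the price used in the usage note are all hypothetical; real pages also fetch resources from JS at runtime, so the static count is only a lower bound.

```python
from html.parser import HTMLParser

# Tags whose given attribute makes a browser issue an extra HTTP request.
# Rough approximation: every <link href> is counted, even non-stylesheet ones.
SUBRESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class SubresourceCounter(HTMLParser):
    """Collects URLs a browser would fetch in addition to the HTML document."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = SUBRESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def requests_per_page(html: str) -> int:
    """One request for the document plus one per statically referenced subresource."""
    counter = SubresourceCounter()
    counter.feed(html)
    return 1 + len(counter.urls)


def estimated_cost(pages: int, requests_each: int, price_per_request: float) -> float:
    """Total cost under a flat per-request price (hypothetical pricing model)."""
    return pages * requests_each * price_per_request


# Hypothetical page: one stylesheet, one script, one image, one plain link.
sample = """
<html><head>
<link rel="stylesheet" href="/site.css">
<script src="/app.js"></script>
</head><body>
<img src="/logo.png">
<a href="/next">next</a>
</body></html>
"""

print(requests_per_page(sample))  # 4: the document plus three subresources
```

With these assumptions, `estimated_cost(1000, requests_per_page(sample), 0.001)` prices 1,000 such pages at four times what fetching the bare HTML would cost, which is the multiplier the comment is worried about.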