
Ask HN: Web scraping based on Computer Vision?

2 points by obl1que, almost 3 years ago

Who's developing an approach to web scraping based on computer vision (CV)? I've looked for this but so far haven't found much beyond [0], although the motivations for it are also touched upon by [1].

Scraping is an arms race, of course. A simple but often successful way to fight scraping is, for instance, routinely changing the names of CSS classes. This affects scrapers more than users, because a user never sees those class names, while a scraper relies on them.

Is anyone scraping using only (or mostly) computer vision on the rendered browser screen, and simulating mouse clicks and key presses?

It seems that anti-scraping measures able to defeat a CV-based approach would be more intrusive to the user, and thus would be used less often.

[0] https://github.com/jimbobewenhall/OpenCV-website-scraper
[1] https://incolumitas.com/2021/05/20/avoid-puppeteer-and-playwright-for-scraping/
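
To make the idea concrete, here is a minimal sketch of locating an element by its pixels rather than by its class name, and deriving a point to click. It is only an illustration, not the method used by the projects linked above. The assumptions: the "screen" is a tiny grayscale grid held in a Python list, `SCREEN`, `BUTTON`, `find_template` and `click_point` are hypothetical names, and matching is a naive sum of absolute differences. A real setup would work on a browser screenshot and would probably use a library routine such as OpenCV's `matchTemplate` instead of the nested loops.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, row-major (assumed toy format)


def find_template(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the top-left corner of the best match.

    The score is the sum of absolute pixel differences; lower is better.
    Returns None when the template does not fit inside the screen.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            score = sum(
                abs(screen[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos


def click_point(top_left: Tuple[int, int], template: Image) -> Tuple[int, int]:
    """Return the (row, col) at the centre of the matched region."""
    r, c = top_left
    return (r + len(template) // 2, c + len(template[0]) // 2)


# Hypothetical 2x3 "button" pattern, drawn on an otherwise blank 5x6 screen.
BUTTON: Image = [[9, 5, 9],
                 [5, 9, 5]]
SCREEN: Image = [[0] * 6 for _ in range(5)]
for dr, row in enumerate(BUTTON):
    for dc, value in enumerate(row):
        SCREEN[2 + dr][1 + dc] = value

if __name__ == "__main__":
    pos = find_template(SCREEN, BUTTON)
    print(pos, click_point(pos, BUTTON))
```

Nothing in this lookup depends on markup, so renaming classes leaves it unaffected; changing how the button looks would break it, which is the more user-visible countermeasure the post refers to.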

2 comments

obl1que, almost 3 years ago
I think this may be what I was looking for:

https://www.askui.com/askui-vs-selenium/

Looks like it has been posted on HN several times, each time with very little discussion.

pacarvalho, almost 3 years ago
Maybe not even computer vision on the pixel level but instead ML on the DOM to notice when it has loaded enough to parse the content from it?
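
A rough sketch of the "has the DOM loaded enough" check might look like the following. Note that it swaps the suggested ML model for a plain hand-written heuristic: it treats the page as ready once the amount of visible text has stopped changing across consecutive DOM snapshots. The names `TextCounter`, `visible_chars`, `first_stable_snapshot` and the `SNAPSHOTS` data are hypothetical, and how the snapshots are captured from a browser is left out.

```python
from html.parser import HTMLParser
from typing import Iterable, Optional


class TextCounter(HTMLParser):
    """Counts non-whitespace characters in text nodes, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.chars = 0
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chars += len("".join(data.split()))


def visible_chars(html: str) -> int:
    """Return the number of non-whitespace visible text characters."""
    parser = TextCounter()
    parser.feed(html)
    parser.close()
    return parser.chars


def first_stable_snapshot(snapshots: Iterable[str], stable_runs: int = 2) -> Optional[int]:
    """Return the index of the first snapshot whose non-empty text size has
    matched the previous snapshot `stable_runs` times in a row, else None."""
    prev, run = None, 0
    for i, html in enumerate(snapshots):
        n = visible_chars(html)
        run = run + 1 if (n == prev and n > 0) else 0
        prev = n
        if run >= stable_runs:
            return i
    return None


# Hypothetical snapshots of a page that fills in its content over time.
SNAPSHOTS = [
    "<div id='app'></div>",
    "<div id='app'><p>Loading</p></div>",
    "<div id='app'><p>Price: 10</p></div>",
    "<div id='app'><p>Price: 10</p></div>",
    "<div id='app'><p>Price: 10</p></div>",
]

if __name__ == "__main__":
    print(first_stable_snapshot(SNAPSHOTS))
```

A learned classifier over DOM features could replace the fixed rule inside `first_stable_snapshot`; the fixed rule is used here only because it needs no training data.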