Who's developing an approach to web scraping based on computer vision (CV)? I've looked for this but so far have not found much beyond [0], although the motivations for it are also touched upon by [1].

Scraping is an arms race, of course. A simple but often successful way to fight scraping is, for instance, to routinely change the names of CSS classes. This affects scrapers more than users: a user never sees those class names, while a scraper relies on them.

Is anyone scraping using only (or mostly) computer vision on the rendered browser screen, and simulating mouse clicks and key presses?

It seems like anti-scraping measures that could defeat a CV-based approach would be more intrusive to the user, and thus would be used less often.

[0] https://github.com/jimbobewenhall/OpenCV-website-scraper
[1] https://incolumitas.com/2021/05/20/avoid-puppeteer-and-playwright-for-scraping/
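One way to picture the pixel-only approach the question describes: take a screenshot of the rendered page, find a known image of the target element (say, a button) inside it, and click the center of the match. Below is a minimal pure-Python template matcher that scores every position by the sum of squared differences, the same criterion as OpenCV's `cv2.matchTemplate` with `TM_SQDIFF`. It assumes the screenshot is already a grayscale 2D list of ints. The function names and the `max_sqdiff` threshold are hypothetical, and this is not how [0] or askui actually work. Sending the click itself is left to whatever input-automation tool one uses.

```python
from typing import List, Optional, Tuple

# Grayscale image as rows of 0-255 ints (an assumption for this sketch).
Image = List[List[int]]


def match_template(screen: Image, template: Image) -> Tuple[int, int, int]:
    """Slide `template` over `screen`; return (x, y, score) of the best match.

    Score is the sum of squared pixel differences, so lower is better and
    0 means an exact match. Ties keep the first position found (top-left first).
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        raise ValueError("template is larger than the screen")
    best: Optional[Tuple[int, int, int]] = None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = 0
            for ty in range(th):
                row, trow = screen[y + ty], template[ty]
                for tx in range(tw):
                    d = row[x + tx] - trow[tx]
                    score += d * d
            if best is None or score < best[2]:
                best = (x, y, score)
    assert best is not None
    return best


def click_point(screen: Image, template: Image,
                max_sqdiff: int = 0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) center of the best match, or None if it is too poor.

    `max_sqdiff` is a hypothetical tolerance; 0 demands a pixel-exact match.
    """
    x, y, score = match_template(screen, template)
    if score > max_sqdiff:
        return None
    return (x + len(template[0]) // 2, y + len(template) // 2)
```

Because this keys on what the element looks like rather than on its class name, renaming classes does not affect it; changing the element's appearance would. A real screenshot would also need some tolerance (a nonzero `max_sqdiff` or a normalized score) to cope with anti-aliasing and scaling.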

2 comments

obl1que · almost 3 years ago

I think this may be what I was looking for: https://www.askui.com/askui-vs-selenium/

Looks like it has been posted on HN several times, each time with very little discussion.

pacarvalho · almost 3 years ago

Maybe not even computer vision at the pixel level, but instead ML on the DOM to notice when the page has loaded enough to parse the content from it?
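The "has it loaded enough?" question can at least be approximated without any ML. A simple rule-based stand-in (explicitly not the ML approach suggested above) is to poll snapshots of the DOM and treat the page as ready once the serialized content has stopped changing for a few polls in a row. This is a minimal sketch assuming a hypothetical `snapshot()` callable that returns the current serialized DOM from whatever browser driver one uses; the parameter names and defaults are illustrative.

```python
import hashlib
import time
from typing import Callable, Optional


def wait_until_stable(
    snapshot: Callable[[], str],
    stable_polls: int = 3,
    max_polls: int = 20,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `snapshot` until it returns identical content `stable_polls` times in a row.

    Returns the stable content, or None if that never happens within `max_polls`
    polls. `sleep` is injectable so the function can be tested without waiting.
    """
    last_digest = None
    streak = 0  # how many consecutive polls returned the same content
    for _ in range(max_polls):
        html = snapshot()
        # Compare hashes rather than keeping every full snapshot around.
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest == last_digest:
            streak += 1
        else:
            last_digest, streak = digest, 1
        if streak >= stable_polls:
            return html
        sleep(interval)
    return None
```

The obvious weakness, and presumably part of why one might reach for ML, is that pages with tickers, ads, or timers never become byte-for-byte stable; hashing only the region one intends to parse would help.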

Ask HN: Web scraping based on Computer Vision?