Ask HN: What do you scrape the web for?

6 points by cirowrc over 7 years ago
Hey folks,

It's not rare for me to find myself scraping the web for both side projects and personal needs: checking whether grocery prices are good based on the local supermarket's website, gathering some emails to cold-email about ideas ...

It seems to me that just being able to quickly write some code to gather information from a government web page, or from anything else that shows up on the web, is a *huge* power.

What about you? Any great web scraping stories? Have you ever gotten in trouble for doing so? Built an entire company on it?

3 comments

byoung2 over 7 years ago
I just wrote a crawler for a client who needs to check vendor licenses (plumber, electrician, etc). Given a license number, state, and trade, it looks up the appropriate licensing agency in that state and pulls the license info (issue/expiration dates, biographical information, infractions, etc). Luckily most of these are old school sites so they're easy to crawl, but a few have CAPTCHAs (easily solved with OCR) or are single page apps.
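
For anyone curious what the "look up the appropriate licensing agency" step might look like, here is a minimal sketch, not the actual crawler described above. It assumes a hand-maintained table keyed by (state, trade); the `AGENCIES` table, the `LicenseQuery` type, the `agency_url` helper, and the `example.invalid` URLs are all hypothetical placeholders. Fetching and parsing each agency's page is left out, since that differs per site.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class LicenseQuery:
    number: str
    state: str
    trade: str


# Hypothetical registry: (state, trade) -> search URL template.
# Real agencies each have their own URLs and form parameters.
AGENCIES: Dict[Tuple[str, str], str] = {
    ("CA", "plumber"): "https://example.invalid/ca/contractors?license={number}",
    ("TX", "electrician"): "https://example.invalid/tx/electricians?license={number}",
}


def agency_url(query: LicenseQuery) -> str:
    """Return the lookup URL for the agency that covers this state and trade."""
    key = (query.state.upper(), query.trade.lower())
    template = AGENCIES.get(key)
    if template is None:
        raise LookupError(f"no agency registered for {key}")
    # Percent-encode the license number so it is safe inside a query string.
    return template.format(number=quote(query.number, safe=""))
```

Keeping the routing in a plain table means supporting a new state is a one-line change, and the site-specific scraping code can live behind each entry.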
beld_pro over 7 years ago
I think some of the most real-world use cases are in the real estate/hotels/coworking-spaces sector. The second space is email gathering (guess what, that's why people keep having to write <blabla> at <something> dot com to try to keep their emails from being scraped).
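
To illustrate why the "at / dot" spelling is only a weak defence, here is a minimal sketch of how little it takes to undo it. The `deobfuscate` helper and the handful of spellings it recognises are assumptions made for illustration; it is deliberately naive and will also mangle ordinary sentences that contain the words "at" or "dot".

```python
import re

# Assumed spellings: "at", "(at)", "[at]" and "dot", "(dot)", "[dot]",
# each surrounded by whitespace.
_AT = re.compile(r"\s+(?:at|\(at\)|\[at\])\s+", re.IGNORECASE)
_DOT = re.compile(r"\s+(?:dot|\(dot\)|\[dot\])\s+", re.IGNORECASE)


def deobfuscate(text: str) -> str:
    """Rewrite 'name at domain dot com' style text as 'name@domain.com'."""
    return _DOT.sub(".", _AT.sub("@", text))
```

For example, `deobfuscate("blabla at something dot com")` returns `"blabla@something.com"`.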
PaulHoule over 7 years ago
I made the State of Delaware change its search form so that you can only look up a company if you know both the name and the number.