Ask HN: What info would you web scrape in 2020?

2 points by tiburon over 5 years ago

I've been web scraping for a while and I am running out of ideas. I use an anonymous crawling service that provides HTML content at scale. It also has a set of predefined scrapers, which I use instead of maintaining my own, and that speeds up any scraping I do.

I need new ideas to build that are unique and can be useful to people. I build services based on scraping that can be used in different fields, like marketing, SEO, drop shipping, etc. In many cases I need JS crawling capabilities, and the service I use gives me the widgets and rendered pages, so I can focus on the data and the idea.

Assuming you had the resources I have, what would you be looking to scrape today? I like services that help people, so any feedback would be great. I am trying to think outside the box and find new ideas, and would appreciate some inspiration here.

I'm also open to hearing what data you would scrape from the web in real time, if you had the right tools to scale your scraping.

2 comments

PaulHoule over 5 years ago

For 2020 I'd like to get the web browser out of my life as much as possible. That is motivated by this work I've done:

https://ontology2.com/essays/HackerNewsForHackers/
https://ontology2.com/essays/ClassifyingHackerNewsArticles/

I'd like to crawl a large number of sites that have quality articles, for instance

https://voxeu.org/
https://www.anandtech.com/

and put them through a workflow where I never see an article more than once, things get classified, etc.

One major issue I have is ads. In 2020 it is not just a matter of ads getting in the way of content, but rather ads getting in the way of ads. That voxeu site doesn't have ads, but it does abuse JavaScript in such a way that the back button doesn't work properly.

The web is breaking down to the extent that I'd really like to filter the junk out and have an order-of-magnitude better interface.
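The "never see an article more than once" step of that workflow could be sketched roughly as below. This is only an illustration, not PaulHoule's implementation: `normalize_url` and `SeenFilter` are hypothetical names, the example URLs are placeholders, and dropping query strings during normalization is an assumption that will not suit sites that put article IDs in the query.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop query and fragment, trim a trailing slash.

    Assumption: query strings carry tracking parameters rather than article IDs.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


class SeenFilter:
    """Remembers articles by normalized URL and by a hash of their text."""

    def __init__(self):
        self._urls = set()
        self._digests = set()

    def is_new(self, url: str, text: str) -> bool:
        key = normalize_url(url)
        # Collapse whitespace so trivial formatting changes hash the same.
        collapsed = " ".join(text.split())
        digest = hashlib.sha256(collapsed.encode("utf-8")).hexdigest()
        if key in self._urls or digest in self._digests:
            return False
        self._urls.add(key)
        self._digests.add(digest)
        return True


if __name__ == "__main__":
    seen = SeenFilter()
    # Placeholder URLs and text, purely for illustration.
    print(seen.is_new("https://example.com/a/", "Hello  world"))
    print(seen.is_new("https://example.com/a", "Some other text"))
    print(seen.is_new("https://example.com/b", "Hello world"))
```

Checking both the URL and the text hash catches the same article republished at a different address. An in-memory set is enough for a sketch; a real crawler would need to persist the seen keys between runs.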
kristianp over 5 years ago

I'm toying with scraping a certain type of product listing and running classifiers to help users find the product they want.

It sounds like you're building a scraping service and looking for clients.