Ask HN: What info would you web scrape in 2020?

2 points by tiburon over 5 years ago
I've been web scraping for a while and I am running out of ideas. I use an anonymous crawling service that provides HTML content at scale. It also has a set of predefined scrapers, which I use instead of maintaining my own, and that speeds up any scraping I do.

I need new ideas for things to build that are unique and useful to people. I build scraping-based services that can be used in different fields, such as marketing, SEO, and drop shipping. In many cases I need JS crawling capabilities, and the service I use gives me those widgets and rendered pages, so I can focus on the data and the idea.

Given the resources I have, what would you be looking to scrape today? I like services that help people, so any feedback would be great. I am trying to think outside the box and would appreciate some inspiration.

I'm also open to hearing what data you would scrape from the web in real time if you had the right tools to scale your scraping.

2 comments

PaulHoule over 5 years ago
For 2020 I'd like to get the web browser out of my life as much as possible. That is motivated by this work I've done:

https://ontology2.com/essays/HackerNewsForHackers/
https://ontology2.com/essays/ClassifyingHackerNewsArticles/

I'd like to crawl a large number of sites that have quality articles, for instance

https://voxeu.org/
https://www.anandtech.com/

and put them through a workflow where I never see an article more than once, things get classified, etc.

One major issue I have is ads. In 2020 it is not just a matter of ads getting in the way of content, but rather ads getting in the way of ads. That voxeu site doesn't have ads, but it does abuse JavaScript in such a way that the back button really works wrong.

The web is breaking down to the extent that I'd really like to filter the junk out and have an order-of-magnitude better interface.
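
For anyone wanting to try the "never see an article more than once" part of that workflow, here is a minimal sketch, not the commenter's actual implementation. It assumes articles arrive as a URL plus extracted text, and it treats an article as already seen if either its normalized URL or a hash of its whitespace-collapsed text has appeared before. The names `normalize_url` and `SeenFilter` are hypothetical, and the normalization rule (lowercase scheme and host, drop the fragment and trailing slash) is an assumption that will not suit every site.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash.

    This is an assumed, deliberately crude rule; the query string is kept
    because some sites use it to identify the article.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class SeenFilter:
    """In-memory filter that remembers article URLs and content hashes."""

    def __init__(self) -> None:
        self._urls: set[str] = set()
        self._hashes: set[str] = set()

    def is_new(self, url: str, text: str) -> bool:
        """Return True the first time an article is offered, False afterwards."""
        key = normalize_url(url)
        # Collapse whitespace and case so trivial reformatting does not
        # make a repost look like a new article.
        canonical = " ".join(text.split()).lower()
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if key in self._urls or digest in self._hashes:
            return False
        self._urls.add(key)
        self._hashes.add(digest)
        return True
```

A crawler loop would call `is_new(url, text)` on each fetched article and pass only the `True` results on to a classifier. Exact hashing only catches identical text after normalization; catching near-duplicates would need a different technique, and a real deployment would persist the two sets rather than keep them in memory.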
kristianp over 5 years ago
I'm toying with scraping a certain type of product listing and running classifiers to help users find the product they want.

It sounds like you're building a scraping service and looking for clients.