TechEcho
Ask HN: What are the most challenging pages to scrape?

2 points by gbajson about 7 years ago

2 comments

randomerr about 7 years ago
Try PWA-based ones. Since they load in segments and cache a lot, you'll have fun:

https://pwa.rocks/ - Look at the business and news webpages

Also look for websites that use AMP. They're even more fragmented than PWA pages. Below is an article about Airbnb using AMP and iframes:

https://medium.com/swlh/how-airbnb-is-putting-amp-at-the-core-of-its-digital-strategy-d6b9cf1fc0ad

https://www.ampproject.org/
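
For anyone who wants to check which of these cases a page falls into before writing a scraper, here is a rough sketch using only the Python standard library. It is not from the comment above. The names `PageSignals` and `classify_page` are made up for illustration, and the heuristics are assumptions: an `amp` or `⚡` attribute on `<html>` or a `<link rel="amphtml">` for AMP, a `<link rel="manifest">` or an inline `serviceWorker.register` call for PWA-style pages, and a count of `<iframe>`/`<amp-iframe>` tags. It only looks at the initial HTML, so anything loaded later by JavaScript or registered from an external script file will be missed.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects hints that a page is AMP, PWA-style, or iframe-heavy."""

    def __init__(self):
        super().__init__()
        self.is_amp = False        # <html amp> or <html ⚡>
        self.amp_url = None        # href of <link rel="amphtml">, if any
        self.has_manifest = False  # <link rel="manifest"> present
        self.iframes = 0           # number of <iframe> / <amp-iframe> tags
        self._in_script = False
        self._script_text = []     # inline script bodies, joined later

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html" and ("amp" in attr or "⚡" in attr):
            self.is_amp = True
        elif tag == "link":
            rels = (attr.get("rel") or "").lower().split()
            if "amphtml" in rels:
                self.amp_url = attr.get("href")
            if "manifest" in rels:
                self.has_manifest = True
        elif tag in ("iframe", "amp-iframe"):
            self.iframes += 1
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so collect them all.
        if self._in_script:
            self._script_text.append(data)

    @property
    def registers_service_worker(self):
        # Only inline scripts are visible here, not external .js files.
        return "serviceWorker.register" in "".join(self._script_text)


def classify_page(html):
    """Return a dict of scraping-relevant hints for one HTML document."""
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    return {
        "is_amp": parser.is_amp,
        "amp_url": parser.amp_url,
        "has_manifest": parser.has_manifest,
        "registers_service_worker": parser.registers_service_worker,
        "iframes": parser.iframes,
    }
```

Fetch the page with whatever HTTP client you like and pass the response body to `classify_page`. If `amp_url` comes back set, the AMP version may be simpler to parse than the original page; a high iframe count suggests the content you want lives in separate documents that have to be fetched one by one.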
gbajson about 7 years ago
I am looking for pages which are difficult to scrape, to see the applied techniques, and to learn how to bypass them.