I came from a third-world country where internet access was pretty expensive. For some reason, my provider made Facebook completely free. So in my free college days I used the Facebook developer echo API to make an HTTP proxy so I could browse the internet for free. It was terrible: it was HTTP 1 only, so no WebSockets, videos stopped randomly, etc. But hey, I could read Reddit.
You can't create your own link previewer; Cloudflare will put a captcha in front of every website. All I want is a freaking <title> tag. They don't seem eager to fix it either: their proposed solution is to contact every website owner (seriously) and ask them to whitelist you[1].<p>Frankly, I wish Facebook or Cloudflare offered their previewer as a free service, since most websites have them whitelisted.<p>1. <a href="https://community.cloudflare.com/t/attention-required-message-when-sharing-link/88999" rel="nofollow">https://community.cloudflare.com/t/attention-required-messag...</a>
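For what it's worth, pulling the <title> out of a page is the easy half of a link previewer. Here is a minimal sketch using only Python's standard library; `TitleParser` and `extract_title` are names made up for this example, and it assumes you already have the HTML as a string. Getting that HTML past a captcha page is the part this does not solve.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest for a preview.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title, or None if the document has no usable <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

If the fetch returns a challenge page instead of the real site, this will happily return the challenge page's title, which is exactly the problem described above.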
I've used Yahoo's YQL. I would hit rate limits and other crap when trying to scrape data off some sites directly, but YQL would give me nicely structured data without these stupid limits, since many sites don't see Yahoo's bot as a scraper.
In the best-case scenario, Google has a monopoly on scraping. Imagine trying to create a global search engine: how can you possibly even crawl sites that are behind Cloudflare or that only allow Google/FB/Bing bots?<p>Can you crawl Twitter in real time? Pretty sure they have a special deal with Google to ping it instantly on new tweets.<p>How many websites actually ping Google on new content?
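To make the "only allow the big bots" pattern concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents, the `MyCrawler` user agent, and the `allowed` helper are all invented for illustration, not taken from any real site. It only covers the polite robots.txt case; an active bot check in front of the site is a separate obstacle that no robots.txt parsing gets you around.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, `allowed("Googlebot", "https://example.com/page")` is true, while the same call for a new crawler such as `"MyCrawler"` is false, so a well-behaved newcomer is locked out before it fetches a single page.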
Would just like to give an honorable mention to Google Translate, the most accessible HTTP proxy of all time. It’s especially good for bypassing corporate access controls. I’ve used it many times for accessing solution threads on technical subreddits at work.
That's pretty interesting, Facebook as a "web scale / hundreds of pages per second" batch web page summarizer. I imagine you could build a pretty decent general purpose search engine that way...free crawler.
Someone who created a scraping API that site owners could embed in their projects, getting paid for feeding crawlers their data, could make billions.
datadome.co is blocked for me:<p>> <i>datadome.co is being blocked by AdGuard DNS filter, AdGuard Tracking Protection filter, EasyPrivacy, Goodbye Ads and oisd.</i><p>Dunno what they do, but it can't be good.