TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Facebook was used as a proxy by web scraping bots

285 points by avastel over 4 years ago

13 comments

Ombudsman over 4 years ago
I came from a third world country and internet was pretty expensive. For some reason, my provider made Facebook completely free. So in my free college days I used the Facebook developer echo API to make an HTTP proxy so I could browse the internet for free. It was terrible: it was only HTTP 1, so no web sockets, videos stopped randomly, etc. But hey, I could read Reddit.
cblconfederate over 4 years ago
You can't create your own link previewer; Cloudflare will put a captcha in front of every website. All I want is a freaking <title> tag. They don't seem eager to fix it either: their proposed solution is to contact every website owner (seriously) to ask them to whitelist you [1].

Frankly, I wish Facebook or Cloudflare offered their previewer as a free service, since most websites have them whitelisted.

[1] https://community.cloudflare.com/t/attention-required-message-when-sharing-link/88999
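
For reference, the "all I want is a <title> tag" part of a link previewer can be sketched with the Python standard library alone. This is a minimal illustration, not anyone's actual previewer: the names `TitleParser` and `extract_title` are made up here, fetching the page is left out, and nothing in it gets around a captcha page served in place of the real HTML.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    page = "<html><head><title>Example page</title></head><body></body></html>"
    print(extract_title(page))
```

If the response is a challenge page, the title extracted this way would be the challenge page's title, which is exactly the problem described above.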
jedberg over 4 years ago
So this was a very long web page to say: Facebook forgot to rate limit their web scraper on a per user basis, but we told them and they fixed it.
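
To make "rate limit on a per user basis" concrete, here is a rough sliding-window sketch. It is not how Facebook fixed their scraper (the thread does not say); the class name `PerUserRateLimiter`, its parameters, and the in-memory storage are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Allows at most max_requests per user within a sliding time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id):
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerUserRateLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("user-1") for _ in range(3)])
```

The point is that the counter is keyed by user, so one account hammering the scraper gets cut off without affecting anyone else.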
sschueller over 4 years ago
I've used Yahoo's YQL. While I would hit rate limits and other crap when trying to scrape data off some sites directly, YQL would provide me nicely structured data without these stupid limits, as many don't see Yahoo's bot as a scraper.
ddorian43 over 4 years ago
In the best-case scenario, Google has a monopoly on scraping. Imagine trying to create a global search engine: how can you possibly even crawl sites that are behind Cloudflare or that only allow the Google/FB/Bing bots?

Can you crawl Twitter in real time? Pretty sure they have a special deal with Google to instantly ping on new tweets.

How many websites actually ping Google on new content?
AmericanChopper over 4 years ago
Would just like to give an honorable mention to Google Translate, the most accessible HTTP proxy of all time. It’s especially good for bypassing corporate access controls. I’ve used it many times for accessing solution threads on technical subreddits at work.
tyingq over 4 years ago
That's pretty interesting: Facebook as a "web scale / hundreds of pages per second" batch web page summarizer. I imagine you could build a pretty decent general-purpose search engine that way... free crawler.
jtsiskin over 4 years ago
Why can’t you just make your web crawler look like FB or Googlebot (via user agent)? Do website owners actually check the IP?
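
On the "check the IP" question, one way a site could do it is to trust a crawler's user agent only when the request also comes from an address range associated with that crawler. A hedged sketch follows: the bot name `examplebot` and the range below are placeholders (192.0.2.0/24 is a documentation-only range from RFC 5737, not a real crawler range), and `is_verified_crawler` is a made-up helper, not any vendor's API.

```python
import ipaddress

# Hypothetical allowlist. The range is a documentation-only block (RFC 5737),
# used here purely as a stand-in for whatever ranges a real crawler publishes.
CLAIMED_CRAWLER_RANGES = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def is_verified_crawler(user_agent, remote_ip):
    """Return True only if the user agent names a known bot AND the IP matches."""
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for name, networks in CLAIMED_CRAWLER_RANGES.items():
        if name in ua:
            # The user agent alone is trivially spoofable; the IP check is
            # what makes the claim hard to fake.
            return any(ip in net for net in networks)
    return False


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
    print(is_verified_crawler(ua, "192.0.2.10"))    # address inside the range
    print(is_verified_crawler(ua, "198.51.100.7"))  # spoofed user agent, wrong IP
```

A crawler that only copies the user agent string would fail the second check, which is why spoofing the header alone does not get past sites that verify the source address.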
intricatedetail over 4 years ago
Someone who created a scraping API that site owners could embed in their projects, so that they get paid for feeding crawlers their data, could make billions.
AznHisoka over 4 years ago
How is DataDome different from Cloudflare? The latter offers bot protection for free if you are already a Cloudflare customer.
notRobot over 4 years ago
datadome.co is blocked for me:

> datadome.co is being blocked by AdGuard DNS filter, AdGuard Tracking Protection filter, EasyPrivacy, Goodbye Ads and oisd.

Dunno what they do, but it can't be good.
martimarkov over 4 years ago
The website doesn’t open for me.
MDinBk over 4 years ago
interesting!