TechEcho

13 comments

Ombudsman, over 4 years ago

I came from a third world country where internet was pretty expensive. For some reason, my provider made Facebook completely free. So in my free college days I used the Facebook developer echo API to make an HTTP proxy so I could browse the internet for free. It was terrible: it was HTTP 1 only, so no web sockets, videos stopped randomly, etc. But hey, I could read Reddit.

cblconfederate, over 4 years ago

You can't create your own link previewer: Cloudflare will put a captcha in front of every website. All I want is a freaking <title> tag. They don't seem eager to fix it either; their proposed solution is to contact every website owner (seriously) to ask them to whitelist you [1].

Frankly, I wish Facebook or Cloudflare offered their previewer as a free service, since most websites have them whitelisted.

[1] https://community.cloudflare.com/t/attention-required-message-when-sharing-link/88999
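Once the HTML is in hand, pulling out the title is the easy part; getting past the captcha is the hard part, and the sketch below does not address it. This is a minimal illustration using only Python's standard library. The names `TitleParser` and `extract_title` are made up for this example, and it assumes the page source has already been fetched as a string.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; return None when no title text was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Calling `extract_title(page_source)` on a document without a `<title>` element returns `None`, so a previewer can fall back to showing the bare URL.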

jedberg, over 4 years ago

So this was a very long web page to say: Facebook forgot to rate limit their web scraper on a per-user basis, but we told them and they fixed it.
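For readers unfamiliar with the term, per-user rate limiting can be sketched as one token bucket per user. This is only an illustration of the general technique, not a description of what Facebook actually deployed. The class name `PerUserRateLimiter` and its `rate`, `burst`, and `clock` parameters are invented for this example.

```python
import time


class PerUserRateLimiter:
    """Token bucket per user: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._buckets = {}  # user_id -> (tokens_remaining, last_seen_timestamp)

    def allow(self, user_id):
        """Return True and spend one token if the user is under their limit."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[user_id] = (tokens - 1, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

Because each user has their own bucket, one account hammering the scraper runs out of tokens without affecting anyone else.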

sschueller, over 4 years ago

I've used Yahoo's YQL. While I would hit rate limits and other crap when trying to scrape data off some sites directly, YQL would provide me nicely structured data without these stupid limits, as many don't see Yahoo's bot as a scraper.

ddorian43, over 4 years ago

In the best-case scenario, Google has a monopoly on scraping. Imagine trying to create a global search engine: how can you possibly even crawl sites that are behind Cloudflare or that only allow Google/FB/Bing bots?

Can you real-time crawl Twitter? Pretty sure they have a special deal with Google to instant ping on new tweets.

How many websites actually ping Google on new content?

AmericanChopper, over 4 years ago

Would just like to give an honorable mention to Google Translate, the most accessible HTTP proxy of all time. It’s especially good for bypassing corporate access controls. I’ve used it many times for accessing solution threads on technical subreddits at work.

tyingq, over 4 years ago

That's pretty interesting, Facebook as a "web scale / hundreds of pages per second" batch web page summarizer. I imagine you could build a pretty decent general purpose search engine that way...free crawler.

jtsiskin, over 4 years ago

Why can’t you just make your web crawler look like FB or Googlebot (via user agent)? Do website owners actually check the IP?
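Checking the IP is at least feasible. One known approach, which Google documents for verifying Googlebot, is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it yields the same IP. The sketch below is a minimal version under stated assumptions. `is_verified_crawler` and its injectable `reverse_lookup` and `forward_lookup` parameters are names made up here, the default forward lookup handles IPv4 only, and whether a given site actually performs this check is not something the thread establishes.

```python
import socket

# Assumption: hostname suffixes from Google's public guidance on verifying Googlebot.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None,
                        suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    # Default lookups use the system resolver; tests can inject fakes instead.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)  # step 1: IP -> hostname
    except OSError:
        return False

    # Step 2: the hostname must belong to a trusted crawler domain.
    if not host.lower().rstrip(".").endswith(suffixes):
        return False

    try:
        # Step 3: hostname -> IPs must include the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A spoofed user agent alone fails this check, because the requester does not control the reverse DNS records for the real crawler's address space.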

intricatedetail, over 4 years ago

Someone who created a scraping API that site owners could embed in their projects, and get paid for feeding crawlers with their data, could make billions.

AznHisoka, over 4 years ago

How is DataDome different from Cloudflare? The latter offers bot protection for free if you are already a Cloudflare customer.

notRobot, over 4 years ago

datadome.co is blocked for me:

> datadome.co is being blocked by AdGuard DNS filter, AdGuard Tracking Protection filter, EasyPrivacy, Goodbye Ads and oisd.

Dunno what they do, but it can't be good.

martimarkov, over 4 years ago

The website doesn’t open for me.

MDinBk, over 4 years ago

interesting!

Facebook was used as a proxy by web scraping bots