Say you have a listings/classifieds app with public entries served via an API. How do you prevent bots from scraping and stealing your data by hitting your API directly?<p>If you build your own backend, you could put a gateway such as Kong in front of it to detect and throttle or ban robotic usage patterns.<p>But how do you achieve this if you use Firebase, Graphcool, or another Backend as a Service (BaaS)?<p>You could deploy a proxy/gateway, but that would incur an extra hop (and therefore extra latency) on every single call.<p>EDIT: Actually, this question applies to any API, not just public ones. For private APIs restricted by login, the bot would simply have to create a user first.
I don't know that there's a generally applicable answer if the API calls go directly from the end user to a public API on infrastructure you don't control.<p>The answers would be highly dependent on the specific service and whatever capabilities it offers. Firebase, for example, has a concept of custom tokens, with which you could implement rules on a per-API-consumer basis.<p>There does seem to be an opportunity for CDN companies to offer an API gateway with throttling, scripting, OAuth, conditional caching, bot blocking, etc. I don't know why they haven't offered one yet. A CDN-hosted Tyk or Kong instance would likely be popular.
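To make the per-consumer idea concrete, here is a rough sketch of a signed token that carries a tier claim which the backend's rules can then check. This is not Firebase's API; it is a standard-library stand-in, and `SECRET`, `issue_token`, `read_claims`, `max_page_size`, and the tier names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; in practice this never leaves your server.
SECRET = b"server-side-secret"

def _sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def issue_token(consumer_id: str, tier: str) -> str:
    """Mint a signed token carrying per-consumer claims."""
    payload = json.dumps({"sub": consumer_id, "tier": tier}).encode()
    body = base64.urlsafe_b64encode(payload)
    return body.decode() + "." + _sign(body)

def read_claims(token: str):
    """Return the claims dict, or None if the signature does not match."""
    body, _, sig = token.rpartition(".")
    expected = _sign(body.encode())
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))

def max_page_size(token: str) -> int:
    """Example rule: how many listings one request may return, by tier."""
    claims = read_claims(token)
    if claims is None:
        return 0  # forged or corrupted token gets nothing
    return {"partner": 100, "free": 10}.get(claims["tier"], 0)
```

The point is only that the limits ride along with the consumer's identity, so a scraper has to burn an identifiable account to get data.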
You could try an API key policy. Any calls to the API without a key would be throttled down to normal usage levels.<p>People can apply for an API key and you can monitor for any abuse.
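As a minimal sketch of that policy, assuming a token-bucket limiter: callers with a known key get a generous bucket, while everyone else shares a small per-IP bucket. The `Throttle` class and the limits are made up for illustration, not taken from any particular gateway.

```python
import time

# Hypothetical limits: (burst capacity, tokens refilled per second).
ANON_LIMIT = (5, 1.0)
KEYED_LIMIT = (50, 10.0)

class Throttle:
    def __init__(self, valid_keys, clock=time.monotonic):
        self.valid_keys = set(valid_keys)
        self.clock = clock      # injectable so the limiter can be tested
        self.buckets = {}       # bucket id -> (tokens left, last update time)

    def allow(self, client_ip, api_key=None):
        """Return True if the call may proceed, False if it is throttled."""
        if api_key in self.valid_keys:
            bucket_id, (capacity, rate) = "key:" + api_key, KEYED_LIMIT
        else:
            # Missing or unknown key: fall back to the per-IP anonymous limit.
            bucket_id, (capacity, rate) = "ip:" + client_ip, ANON_LIMIT
        now = self.clock()
        tokens, last = self.buckets.get(bucket_id, (capacity, now))
        # Refill in proportion to elapsed time, capped at the burst capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        allowed = tokens >= 1
        self.buckets[bucket_id] = (tokens - 1 if allowed else tokens, now)
        return allowed
```

Usage would be something like `Throttle({"k1"}).allow(request_ip, request_key)` in front of each handler. Keying the buckets by API key also gives you a natural place to count calls per key when monitoring for abuse.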