Why a service and not a library?

It looks like a great way for you to discover URLs, but a terribly slow way for people to avoid implementing robots.txt rules themselves.
While this looks good, I don't think it's feasible for a web crawler in most cases. Crawlers want to crawl a ton of URLs, so they would have to make a request to your service for each and every URL.

What's the plan here? Check for a sitemap.xml (which generally only contains crawlable URLs anyway), or crawl the index page, collect all the links, and send a request to your service for every URL before crawling it?

I personally think this would be better suited as a library: you pass it a robots.txt file and it tells you whether a given URL may be crawled.
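For what it's worth, here's a minimal sketch of that library approach using Python's standard urllib.robotparser (the domain, URL, and user agent string are just placeholders):

    import urllib.robotparser

    # Fetch and parse the site's robots.txt once per domain.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # placeholder domain
    rp.read()

    # Each per-URL check is then a local call, with no extra network round trip.
    if rp.can_fetch("MyCrawler/1.0", "https://example.com/some/page"):
        print("allowed to crawl")
    else:
        print("disallowed by robots.txt")

Parsing happens once per domain, so checking millions of URLs doesn't require millions of requests to an external service.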
The service is very nice and I understand your reason for developing it. That said, I see it having more value in helping companies find all the web pages rather than just the allowed ones.

I understand that the above method is unethical; however, I see it happening quite a lot in practice.
I created this project to use in my own projects. It is open source, and you can use it if you are implementing SEO tools or a web crawler. Note that this is a first alpha release.

Give me some feedback!