This looks like something I could use. Maybe not revolutionary, but I write scrapers from time to time, and even if only for organizational purposes it makes sense to store that stuff as a bunch of configuration files for some external tool, rather than a bunch of Python scripts that I implement somewhat differently every time.<p>Right now I'm just wrapping my head around how this works and haven't tried it hands-on yet, but I struggle to evaluate from the existing documentation how useful this actually is. All examples in the repository right now are ultimately one-page scrapers, which, honestly, would be quite useless to me. Pretty much every scraper I write has at least 2-3 logical layers. Like, consider your HN example, but say you want to include the top 10 comments for each post. Is it even possible? Well, I guess for HN you could get by using allowedURLs and treating the default function as a parser for the comment page, but this isn't generic enough. Consider some internet shop. That would be (1) a product category tree, which is sometimes much easier to hard-code than to scrape every time; the hard-coding is often generative (e.g. example.com/X/A-B-C, where X is a string from a list and A, B and C are padded numbers, each with a different range); (2) you go into each category and retrieve either a sub-category list (possibly JS-rendered, across multiple pages) or a product list (same applies); (3) you open each product URL and do the actual parsing (name, price, specification, etc). Each JSON object from (3) often has to include some minimal parsed data from level (2) (like the category name).<p>More advanced, but also way too popular to imagine a generic web scraper without it: in addition to some JSON metadata you download pictures, or PDF files, etc. (Sometimes you don't even need metadata.)
Maybe just text files, but the result is several GB and isn't suitable to be handled as a single JSON object; it needs to be a file/directory tree.<p>Is any of this possible with this tool?<p>Also, regardless of whether it's useful for my cases, some minor comments:<p>1. Links in docs/readme.md#configuration don't work (but the .md files they point to actually exist).<p>2. I would suggest making "url" in the configuration either a list or string|list. I suppose that hardly changes the logic, but it would make a lot of basic use cases much easier to implement.
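<p>To make the "generative hard-coding" and string|list points concrete, here is a rough sketch in plain Python. This is not the tool's API or config format; the function names (normalize_urls, generate_category_urls), the section names, the ranges and the padding width are all made up for illustration.

```python
from itertools import product
from typing import Iterable, Iterator, List, Sequence, Union


def normalize_urls(url: Union[str, Iterable[str]]) -> List[str]:
    """Accept either a single URL string or an iterable of URLs (string|list)."""
    if isinstance(url, str):
        return [url]
    return list(url)


def generate_category_urls(
    base: str,
    sections: Sequence[str],
    ranges: Sequence[Iterable[int]],
    width: int = 2,
) -> Iterator[str]:
    """Yield URLs shaped like base/X/A-B-C.

    X comes from `sections`; each of A, B, C comes from its own range
    and is zero-padded to `width` digits (hypothetical layout).
    """
    materialized = [list(r) for r in ranges]
    for section in sections:
        for numbers in product(*materialized):
            suffix = "-".join(f"{n:0{width}d}" for n in numbers)
            yield f"{base}/{section}/{suffix}"


# Hypothetical shop: two sections, A in 1..2, B fixed at 1, C in 5..6.
category_urls = list(
    generate_category_urls(
        "https://example.com",
        ["shoes", "hats"],
        [range(1, 3), range(1, 2), range(5, 7)],
    )
)
```

With a list-valued "url", the output of something like generate_category_urls could be fed straight into the config as the starting points for layer (2), instead of scraping the category tree on every run.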