Hi,<p>In one of my projects, I need to run scrapers against tens of websites to collect the rows and columns of tabular data (<table>, <ul>, <div>). The tables are well formatted. I have written several scrapers in Python; they basically use CSS selectors and then apply some simple transformations with regular expressions. Is there a scraper generator that takes a URL and a sample of the target output as input, and produces a scraper automatically?<p>Any suggestions are welcome. Thanks in advance.
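For context, the kind of hand-written scraper described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the sample HTML, the `TableRows` class, and the `parse_price` and `scrape` helpers are all hypothetical names, and the standard-library `html.parser` stands in for a CSS selector library so that the example runs without third-party packages. A generator would have to infer both the extraction step and the regex clean-up step from a URL plus sample output.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would fetch this from a URL.
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$1,250.00</td></tr>
  <tr><td>Gadget</td><td>$99.50</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_price(text):
    """Regex clean-up step: pull the number out of a string like '$1,250.00'."""
    match = re.search(r"[\d,]+(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def scrape(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [{header[0]: name, header[1]: parse_price(price)} for name, price in body]


if __name__ == "__main__":
    for row in scrape(SAMPLE_HTML):
        print(row)
```

Each target site needs its own variant of the extraction and clean-up steps, which is the repetitive part a scraper generator would be expected to automate.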
Have you looked at phantomjs?<p>The webintro example here (<a href="https://github.com/ariya/phantomjs/wiki/Examples" rel="nofollow">https://github.com/ariya/phantomjs/wiki/Examples</a>) scrapes a specific element.
I would take a look at the Mac app FakeApp. It does a lot of what you are describing, especially with regard to CSS and XPath selectors. I have been using it and have been able to do some really great stuff.
If you don't want to build it yourself, check out import.io. They turn any website into an API. They did a demo at SV Newtech a couple of months ago.