I have been working on a tool to 'train' a crawler to extract specific elements of a page, and I could do with some advice on where to take it from here. Here's how it currently works:

1) It has a queue of domains that I have pre-processed. For the initial version I've restricted it to pages that I think are ecommerce, based on $ signs, "add to cart"/"add to basket" type links, etc.

2) There is a visual tool that I use to select certain parts of the page - e.g. price, product, image - and save those selections out as XPaths.

3) Once I have done one URL, I send a crawler to that domain, find other pages that fit the profile of an ecommerce page, and apply the same mapping from step 2 to extract the data.

I have made a short video showing it in action:

http://www.screencast.com/t/riB3iiVMiSk

I'm not sure if I'm doing this the right way. If a site/page changes structure then I may have to re-map the data, and I've also had some problems with Javascript-heavy sites. I was hoping someone would have some pointers for me on other ways to approach this.

If anyone has any knowledge of screen scraping, particularly where it can be done more automatically, I'd really appreciate a steer!

Thanks

Ade
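For anyone curious about the mechanics, steps 1 and 3 can be sketched roughly like this. This is just an illustration, not my actual code: the field names, XPaths, and sample page are made up, and I'm using the standard library's ElementTree (which only supports a limited XPath subset) to keep it self-contained - a real crawler would more likely use lxml for full XPath support.

```python
# Sketch of the approach: a crude ecommerce heuristic (step 1) plus
# applying a saved field -> XPath mapping to another page (step 3).
# All names and XPaths here are hypothetical examples.
import xml.etree.ElementTree as ET

def looks_like_ecommerce(root):
    # Rough version of the step-1 heuristic: a $ sign in the page text
    # plus an add-to-cart/basket style link.
    text = "".join(root.itertext())
    links = " ".join("".join(a.itertext()) for a in root.iter("a")).lower()
    return "$" in text and ("add to cart" in links or "add to basket" in links)

def extract(root, mapping):
    # Apply the saved mapping; a field whose XPath no longer matches
    # comes back as None, so a site redesign shows up as gaps in the
    # data rather than a crash.
    out = {}
    for field, xpath in mapping.items():
        node = root.find(xpath)
        out[field] = node.text.strip() if node is not None and node.text else None
    return out

# A toy, well-formed product page standing in for a crawled URL.
page = """<html><body>
  <h1 class="product">Blue Widget</h1>
  <span class="price">$19.99</span>
  <a href="/cart">Add to cart</a>
</body></html>"""

root = ET.fromstring(page)

# The "mapping" saved out by the visual tool in step 2.
mapping = {
    "product": ".//h1[@class='product']",
    "price":   ".//span[@class='price']",
}

if looks_like_ecommerce(root):
    print(extract(root, mapping))
```

The None-for-missing-fields behaviour is also one way to detect the re-mapping problem I mentioned: if a crawl of a known domain suddenly returns mostly empty fields, the stored XPaths have probably gone stale.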