Great read!<p>In the past, I have successfully used HtmlUnit to fulfill my admittedly limited scraping needs.<p>It runs headless, but it has a virtual head designed to pretend it's a user visiting a web application to be tested for QA purposes. You just program it to go through the motions of a human visiting a site to be tested (or scraped). E.g., click here, get some response. For each item of interest in the response, click it and aggregate the results in your output (at whatever granularity you need).<p>Alas, it's in Java. But if you use JRuby, you can avoid most of the nastiness that implies. (You do need to <i>know</i> Java, but at least you don't have to <i>write</i> Java.)<p>Hartley, what is your recommended toolkit?<p>I note you mentioned the problem of dynamically generated content. You develop your plan of attack using the browser plus Chrome Inspector or Firebug. So far, so good. But what if you want to be headless? Then you need something that generates a DOM as if it were presenting a real user interface, but instead simply returns a reference to the DOM tree, which you are free to scan and react to.
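<p>To make the "click here, get some response, then click each item and aggregate" loop concrete, here is a minimal sketch of its shape. It is not HtmlUnit: it uses only Python's standard library as a stand-in, and it does not execute JavaScript, so it says nothing about the dynamic-content problem. The names `scrape`, `parse`, `LinkAndTitleParser` and `FAKE_SITE` are hypothetical, and the in-memory "site" is an assumption made so the example runs without a network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the href of every <a> tag and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(html):
    """Parse an HTML string and return the populated parser."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser


def scrape(start_url, fetch):
    """Load start_url, follow each distinct link on it once, collect titles.

    `fetch` is any callable mapping a URL to an HTML string (an assumption
    of this sketch; a real scraper would do an HTTP request here).
    """
    index = parse(fetch(start_url))
    results = {}
    for href in index.links:
        url = urljoin(start_url, href)  # resolve relative links
        if url in results:              # skip links already visited
            continue
        results[url] = parse(fetch(url)).title.strip()
    return results


# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": (
        "<html><head><title>Index</title></head><body>"
        '<a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a>'
        "</body></html>"
    ),
    "http://example.test/a": "<title>Page A</title>",
    "http://example.test/b": "<title> Page B </title>",
}

if __name__ == "__main__":
    print(scrape("http://example.test/", FAKE_SITE.__getitem__))
```

<p>With HtmlUnit the same outline would presumably be built on `WebClient.getPage` to load a page and `HtmlPage.getAnchors` plus `HtmlAnchor.click` to follow links, with the library building the DOM (and running scripts) for you; treat that mapping as a rough guide, not a tested recipe.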