What is the benefit of using PhantomJS in this case? I understand that it is very useful if the content depends on JS running.<p>But that doesn't seem to be the case here. With Python I would have used a parser like lxml or BeautifulSoup (and I'm sure there is something comparable for JS) coupled with Requests' async methods. That would probably not only end up with shorter, more concise code, but also be a lot faster.
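A minimal sketch of the "plain parser + concurrent HTTP" approach described above. To keep it dependency-free, it swaps in the standard library's `html.parser` for lxml/BeautifulSoup and a `ThreadPoolExecutor` over `urllib` for Requests' async methods. `LinkCollector`, `extract_links`, `fetch` and `crawl` are hypothetical names for illustration. Like any non-browser scraper, it only sees the raw HTML, so it won't work if the links are generated by JavaScript.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a name=...> and valueless href attributes
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url=""):
    """Parse an HTML string and return its absolute link URLs in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download one page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(urls, workers=8):
    """Fetch pages concurrently and return {page_url: [links on that page]}."""
    urls = list(urls)  # materialize so we can zip it with the results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch, urls)
        return {url: extract_links(html, url) for url, html in zip(urls, pages)}
```

Usage would be something like `crawl(["http://example.com/books/page1", "http://example.com/books/page2"])` (placeholder URLs). Threads are used because the work is I/O-bound; the parsing itself is cheap compared with the network round-trips.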
This technique is certainly useful in a variety of instances, and I've done the same thing with both HTMLUnit and JWebUnit in Java. The "great site you know of" appears to be filled with books that are copyrighted and for-profit, so I'm not sure you'd really want to publicize what you're doing on your blog.
You could also use the CasperJS wrapper and have the script automatically download those files for you.<p>See <a href="http://casperjs.org/api.html#casper.download" rel="nofollow">http://casperjs.org/api.html#casper.download</a>
If you like PhantomJS, be sure to also check out CasperJS. I use it with jQuery, Underscore and Underscore.string.<p>I just wish that jQuery had support for XPath style selectors as well. Chainable XPath would be hella sweet.
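For comparison on the Python side: the standard library's `xml.etree.ElementTree` supports a limited subset of XPath, and since every match is itself an element, expressions can be chained. This is a minimal sketch, assuming well-formed (XHTML-like) markup, because ElementTree is a strict XML parser rather than a forgiving HTML one. `xpath_chain` is a hypothetical helper, not a library API, and this says nothing about jQuery itself.

```python
import xml.etree.ElementTree as ET


def xpath_chain(root, *paths):
    """Apply each ElementTree XPath expression to every node matched so far.

    Each step runs relative to the previous step's matches, so
    xpath_chain(doc, ".//div[@class='book']", "a") selects the <a>
    children of every <div class="book">.
    """
    nodes = [root]
    for path in paths:
        next_nodes = []
        for node in nodes:
            next_nodes.extend(node.findall(path))
        nodes = next_nodes
    return nodes
```

Only the subset ElementTree documents (tag names, `.//`, `..`, `[@attr='value']`, positional indexes, and a few others) is available; full XPath 1.0 needs a library such as lxml.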
PhantomJS + CasperJS make crawling easy. I built <a href="http://sp.iderman.info" rel="nofollow">http://sp.iderman.info</a> to make scraping easier.