TechEcho

7 comments

inDesperateZone almost 13 years ago

What is the benefit of using PhantomJS in this case? I understand that it is very useful when the content depends on JavaScript running.

But that doesn't seem to be the case here. In Python I would have used a parser like lxml or BeautifulSoup (and I'm sure there is something comparable for JS) together with Requests' async methods. That would probably give not only shorter, more concise code but also a much faster crawler.
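A minimal sketch of the parser-based approach described above, in Python. It swaps in the standard library's `html.parser` for lxml/BeautifulSoup and a `concurrent.futures` thread pool for Requests' async methods, so it runs without third-party packages. The function names, the `.pdf` suffix filter and the UTF-8 decoding are assumptions for illustration, not anything from the article. Like any plain HTTP + parser approach, it only sees links present in the server's HTML, not ones added by JavaScript.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, suffix=""):
    """Return absolute links found in html whose URL ends with suffix."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [link for link in parser.links if link.endswith(suffix)]


def fetch(url, timeout=10):
    """Download one page; assumes it is UTF-8 encoded."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, suffix=".pdf", workers=8, fetcher=fetch):
    """Fetch listing pages concurrently, then collect matching links from each."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetcher, urls))
    results = []
    for url, html in zip(urls, pages):
        results.extend(extract_links(html, url, suffix))
    return results
```

For example, `crawl(["http://example.com/books/"])` would fetch that (hypothetical) listing page and return the absolute URLs of its `.pdf` links. The `fetcher` parameter keeps the network layer swappable, so a Requests-based function could be passed in instead of `urllib`.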

smoyer almost 13 years ago

This technique is certainly useful in a variety of instances, and I've done the same thing with both HtmlUnit and JWebUnit in Java. The "great site you know of" appears to be filled with books that are copyrighted and for-profit, so I'm not sure you'd really want to publicize what you're doing on your blog.

veverkap almost 13 years ago

You could also use the CasperJS wrapper and have the script automatically download those files for you.

See http://casperjs.org/api.html#casper.download
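The linked `casper.download` does this from inside a CasperJS script. For comparison, here is a hypothetical Python standard-library version of the same step, assuming the file URLs are already known and need no cookies or JavaScript: stream each file to disk, naming it after the last segment of the URL path. `filename_from_url` and `download` are illustrative names, not part of any library.

```python
import os
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_url(url, default="download.bin"):
    """Use the last (percent-decoded) path segment of the URL as the file name."""
    name = os.path.basename(unquote(urlparse(url).path))
    # Fall back to a fixed name for empty or directory-like segments.
    if name in ("", ".", ".."):
        return default
    return name


def download(url, dest_dir, timeout=30):
    """Stream url into dest_dir without holding the whole file in memory.

    Returns the path of the written file.
    """
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    with urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for large ebooks; query strings are ignored when choosing the file name.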

malandrew almost 13 years ago

If you like PhantomJS, be sure to also check out CasperJS. I use it with jQuery, Underscore and Underscore.string.

I just wish that jQuery had support for XPath-style selectors as well. Chainable XPath would be hella sweet.
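Outside the browser, the idea of chainable XPath can be sketched in Python. This hypothetical `Chain` class (not a real library) wraps the standard library's `xml.etree.ElementTree`, whose `findall` accepts only a limited XPath subset and needs well-formed markup; for messy real-world HTML, `lxml.html` offers full XPath 1.0 through its `.xpath()` method. Each `.xpath()` call runs the query against every element in the current selection and returns a new selection, much like chaining `.find()` in jQuery.

```python
import xml.etree.ElementTree as ET


class Chain:
    """Hypothetical jQuery-style selection over ElementTree elements."""

    def __init__(self, elements):
        self.elements = list(elements)

    @classmethod
    def parse(cls, markup):
        # ET.fromstring needs well-formed XML/XHTML; messy HTML raises ParseError.
        return cls([ET.fromstring(markup)])

    def xpath(self, path):
        # Run the (limited) XPath query relative to each selected element,
        # keeping matches in the order found and dropping repeats.
        found, seen = [], set()
        for el in self.elements:
            for match in el.findall(path):
                if id(match) not in seen:
                    seen.add(id(match))
                    found.append(match)
        return Chain(found)

    def attr(self, name):
        """Value of attribute `name` for each selected element (None if absent)."""
        return [el.get(name) for el in self.elements]

    def text(self):
        """Stripped direct text of each selected element."""
        return [(el.text or "").strip() for el in self.elements]
```

For example, `Chain.parse(markup).xpath(".//ul[@class='books']").xpath("li/a").attr("href")` collects the `href` of every link inside the books list.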

zdwalter almost 13 years ago

PhantomJS + CasperJS make crawling easy. I built http://sp.iderman.info to make scraping easier.

radagaisus almost 13 years ago

Phantom is awesome. I tried to use it for testing, but it's too slow (10 seconds for one test). Anyone else tried it? Any tips?

er354yerty almost 13 years ago

Wouldn't Node be helpful here?

Web crawling and downloading ebooks with phantomJS
