Web crawling and downloading ebooks with phantomJS

61 points by gillyb almost 13 years ago

7 comments

inDesperateZone almost 13 years ago
What is the benefit of using PhantomJS in this case? I understand that it is very useful if the content depends on JavaScript running.

But that doesn't seem to be the case here. With Python I would have used a parser like lxml or BeautifulSoup (and I'm sure there is something comparable for JS) coupled with Requests' async methods. That would probably not only end up with shorter and more concise code, but also be a lot faster.
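
For pages that do not depend on JavaScript, the parser-based approach described in the comment above might look roughly like the sketch below. It is not the original poster's script: to stay self-contained it uses the standard library's `html.parser` in place of lxml or BeautifulSoup, and it parses an inline sample string instead of fetching pages with Requests. The `example.com` URLs, the `.pdf` suffix, and the names `LinkCollector` and `extract_links` are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that end in a given suffix."""

    def __init__(self, base_url, suffix=".pdf"):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.suffix = suffix      # hypothetical file type of interest
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare case-insensitively so "two.PDF" is also matched.
        if href and href.lower().endswith(self.suffix):
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, suffix=".pdf"):
    """Return matching links from static HTML, in document order."""
    parser = LinkCollector(base_url, suffix)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup standing in for a downloaded listing page.
SAMPLE = """
<ul>
  <li><a href="/books/one.pdf">One</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="http://example.com/two.PDF">Two</a></li>
</ul>
"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE, "http://example.com/catalog/"):
        print(url)
```

This only covers link extraction from HTML that is already in hand. Fetching the pages, and anything rendered by client-side scripts, is where a headless browser such as PhantomJS would still be needed.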
smoyer almost 13 years ago
This technique is certainly useful in a variety of instances, and I've done the same thing with both HTMLUnit and JWebUnit in Java. The "great site you know of" appears to be filled with books that are copyrighted and for-profit, so I'm not sure you'd really want to publicize what you're doing on your blog.
veverkap almost 13 years ago
You could also use the CasperJS wrapper and have the script automatically download those files for you.

See http://casperjs.org/api.html#casper.download
malandrew almost 13 years ago
If you like PhantomJS, be sure to also check out CasperJS. I use it with jQuery, Underscore and Underscore.string.

I just wish that jQuery had support for XPath-style selectors as well. Chainable XPath would be hella sweet.
zdwalter almost 13 years ago
PhantomJS + CasperJS make crawling easy. I built http://sp.iderman.info to make scraping easier.
radagaisus almost 13 years ago
Phantom is awesome. I tried to use it for testing, but it's too slow (10 seconds for one test). Anyone else tried it? Any tips?
er354yerty almost 13 years ago
Wouldn't Node be helpful here?