Ask HN: Best languages or frameworks for high concurrency web scraping?

2 points by CoreSet almost 10 years ago

Hi HN. I'm doing research that requires me to take as tight a temporal snapshot of various websites as I possibly can (i.e. grabbing content from them all simultaneously). I've been playing around with PhantomJS and various Python solutions, but neither is very performant.

Any suggestions on where to start looking for a more rigorous answer?

1 comment

philbritton almost 10 years ago

If you don't need to execute JS, then maybe try a simple HTTP GET to retrieve the contents, then process them separately. If you're looking to parse and extract while on the page, I'd recommend Beautiful Soup. If you're interested in trying a Node alternative, check out Cheerio.
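To make the "fetch first, process separately" idea concrete, here is a rough sketch using only the Python standard library. The names `fetch` and `snapshot`, the worker count, and the example.com URLs are all placeholders I made up for illustration, and a thread pool is just one way to fire the requests at roughly the same time. Treat it as a starting point, not a tuned solution.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Plain HTTP GET. Returns (url, start time, body bytes or the exception)."""
    started = time.time()
    try:
        with urlopen(url, timeout=timeout) as resp:
            return url, started, resp.read()
    except Exception as exc:
        # One failing site should not abort the whole snapshot.
        return url, started, exc


def snapshot(urls, fetcher=fetch, max_workers=32):
    """Fetch all URLs concurrently; results come back in input order."""
    if not urls:
        return []
    # The pool size is an assumed default; tune it for your URL count.
    with ThreadPoolExecutor(max_workers=min(max_workers, len(urls))) as pool:
        return list(pool.map(fetcher, urls))


if __name__ == "__main__":
    # Placeholder URLs; swap in the sites you actually want to capture.
    for url, started, body in snapshot(["https://example.com/", "https://example.org/"]):
        status = "error" if isinstance(body, Exception) else f"{len(body)} bytes"
        print(f"{started:.3f} {url} {status}")
```

The raw bodies are kept untouched, so the parsing step (Beautiful Soup or anything else) can run afterwards without stretching the time window of the snapshot. The recorded start times let you check how tight the window actually was.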