This makes me just a bit nervous. You're scraping bank websites using a headless WebKit browser, which is presumably vulnerable to future exploits. You have my username and password (and probably verification questions) either stored on or accessible from that same server. Who's to say that one of the sites you crawl won't get compromised and used as a vector to compromise your crawler box and--potentially--your customers' banking credentials?
This article should have mentioned node.io (https://github.com/chriso/node.io) for completeness. It hasn't been updated in a while and I'm not sure whether other frameworks have popped up since, but I've had the pleasure of using it for some big scraping tasks.
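For a flavor of what it looks like (a rough sketch from memory; the project is unmaintained, so details may have drifted), a job that pulls the link text off a page goes something like this:

    var nodeio = require('node.io');

    // A minimal scraping job: fetch a page, collect anchor text, emit the results.
    exports.job = new nodeio.Job({
        input: ['https://news.ycombinator.com/'],
        run: function (url) {
            this.getHtml(url, function (err, $) {
                if (err) return this.skip();
                var titles = [];
                $('a').each(function (a) {
                    titles.push(a.text);
                });
                this.emit(titles);
            });
        }
    });

Save that as a job file and run it with node.io's CLI, or require it and run it programmatically.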
I wonder how they get around the two-factor authentication problem. Even if I give my password to the scraper, an extra credential would still be required. How do you work around that?
> There’s no way to download resources with phantomjs – the only thing you can do is create a snapshot of the page as a png or pdf. That’s useful but meant we had to resort back to request() for the PDF download.

That's not a "problem"; you shouldn't be using WebKit to download files in the first place.
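For what it's worth, the request() fallback the article mentions is a few lines: stream the file straight to disk over plain HTTP, forwarding whatever session cookies the PhantomJS login produced. A minimal sketch; the URL and cookie value below are placeholders, not anything from the article:

    var fs = require('fs');
    var request = require('request'); // npm install request

    // Hypothetical values; in practice these would come from the PhantomJS session.
    var pdfUrl = 'https://bank.example.com/statements/2013-01.pdf';
    var sessionCookie = 'SESSIONID=abc123';

    request
        .get({ url: pdfUrl, headers: { Cookie: sessionCookie } })
        .on('response', function (res) {
            // Sanity-check that we actually got a PDF back, not a login page.
            console.log(res.statusCode, res.headers['content-type']);
        })
        .on('error', console.error)
        .pipe(fs.createWriteStream('statement.pdf'));

Streaming via pipe() keeps memory flat even for large files, which is exactly the sort of job a rendering engine is the wrong tool for.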