I don't quite understand why you would use a full-blown browser like PhantomJS for crawling (I've seen a lot of projects take this approach recently, so this critique isn't aimed at Apifier specifically).

Yes, I get that in some specific circumstances it's nice to be able to execute the JavaScript on the page, but think about the trade-off here.

In the vast majority of cases a simple HTTP GET request plus a DOM parser is all you need -- in fact, not a single one of the examples on the Apifier homepage needs PhantomJS.

Wouldn't it be much cheaper, simpler and faster to ditch PhantomJS? Or is there something I'm missing here?
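For comparison, here's a minimal sketch of the GET-plus-parser approach, assuming Node.js with the request and cheerio packages (the URL and selectors are made up):

    // Fetch a page and parse its static HTML without a headless browser.
    // Assumes `npm install request cheerio`; URL and selectors are hypothetical.
    var request = require('request');
    var cheerio = require('cheerio');

    request('http://example.com/products', function (err, res, body) {
        if (err) throw err;
        var $ = cheerio.load(body);       // jQuery-like API over the raw HTML
        $('.product').each(function () {
            console.log($(this).find('.name').text());
        });
    });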
Hello HN! Today we're launching what we've been building for the past couple of months. Apifier is a hosted web crawler for developers that lets them extract data from any website using a few simple lines of JavaScript. We built it because we realized that many existing web scrapers trade away their ability to scrape complex websites for the "simplicity" of their user interface. We thought: we're programmers and we already use JavaScript for client-side development, so why not use it for scraping?

Please have a look at the service, play with the examples and maybe set up your own crawl. My co-founder jakubbalada and I will be around to answer your questions. We'd love to hear what you guys think!
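To give a quick taste, a page function is roughly this shape (a simplified sketch -- the exact form of the context argument here is illustrative):

    // Simplified sketch of a page function; the actual `context` object
    // is richer than what is shown here.
    function pageFunction(context) {
        var $ = context.jQuery;           // jQuery injected into the crawled page
        return {
            title: $('h1').text(),
            price: $('.price').first().text()
        };
    }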
Looks really cool! Pricing is the big sticking point for me. I've been burned too many times to build any critical piece of my app on it without knowing how much it'll cost if it gets popular.
I'm hoping this could save me some work.

A few questions, if the founders are still around:

- Can you cache pages / download entire sites?

- If caching, can you detect changes on a given schedule, trigger the extraction "pageFunction" and save versioned data?

- How do you handle errors?

- Will you handle database extractions and other sites that require multiple levels of what you have as pseudo-URLs? For example, something like the sketch below.
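(Entirely hypothetical syntax, just to illustrate what I mean by multiple levels:)

    // Hypothetical two-level crawl: listing pages lead to detail pages.
    // The bracketed-regex pseudo-URL syntax is an illustration, not Apifier's.
    var crawlConfig = {
        startUrls: ['http://example.com/catalog'],
        pseudoUrls: [
            'http://example.com/catalog?page=[\\d+]',    // level 1: pagination
            'http://example.com/item/[(\\w|-)+]'         // level 2: detail pages
        ]
    };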
As a website owner, is it easy to block a rude crawler by contacting you? (How would I identify in the first place that the crawler is operated by you? Would my server log file have enough data to point back to you?)

Nice & useful demo. I'll give it a try.
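In the meantime, here's roughly how I'd block it myself, assuming the crawler sends a distinctive User-Agent header (the "Apifier" token below is just my guess):

    // Sketch: reject requests from a crawler by User-Agent in an Express app.
    // Assumes the crawler identifies itself; the "Apifier" token is a guess.
    var express = require('express');
    var app = express();

    app.use(function (req, res, next) {
        var ua = req.get('User-Agent') || '';
        if (/apifier/i.test(ua)) {
            return res.status(403).send('Crawling not permitted');
        }
        next();
    });

    app.listen(3000);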
I love the demos, and that you can use them without registering. One thing I couldn't find without making an account was what happens after you've used a gigabyte. That would be a helpful addition, I think.
Also similar to https://morph.io/, which leans more towards open data sets.