I don't quite understand why you would use a full-blown browser like PhantomJS for crawling (I've seen a lot of projects take this approach recently, so this critique isn't aimed at Apifier specifically).

Yes, I get that in some specific circumstances it's nice to be able to execute the JavaScript on the page, but think about the trade-off here.

In the vast majority of cases a simple HTTP GET request plus a DOM parser is all you need -- in fact, not a single one of the examples on the Apifier homepage needs PhantomJS.

Wouldn't it be much cheaper, simpler and faster to ditch PhantomJS? Or is there something I'm missing here?
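For comparison, here's a minimal sketch of the GET-plus-parser approach, assuming Node.js with the request and cheerio packages (the URL and selectors are made up):

    // Fetch a page and parse its static HTML without a headless browser.
    // Assumes `npm install request cheerio`; URL and selectors are hypothetical.
    var request = require('request');
    var cheerio = require('cheerio');

    request('http://example.com/products', function (err, res, body) {
        if (err) throw err;
        var $ = cheerio.load(body);       // jQuery-like API over the raw HTML
        $('.product').each(function () {
            console.log($(this).find('.name').text());
        });
    });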
Hello HN! Today we're launching what we've been building for the past couple of months. Apifier is a hosted web crawler for developers that lets them extract data from any website using a few simple lines of JavaScript. We built it because we realized that many existing web scrapers trade away their ability to scrape complex websites for the "simplicity" of their user interface. We thought: we're programmers and we already use JavaScript for client-side development, so why not use it for scraping?

Please have a look at the service, play with the examples and maybe set up your own crawl. My co-founder jakubbalada and I will be around to answer your questions. We'd love to hear what you guys think!
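To give a quick taste, a page function is roughly this shape (a simplified sketch -- the exact form of the context argument here is illustrative):

    // Simplified sketch of a page function; the actual `context` object
    // is richer than what is shown here.
    function pageFunction(context) {
        var $ = context.jQuery;           // jQuery injected into the crawled page
        return {
            title: $('h1').text(),
            price: $('.price').first().text()
        };
    }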
Looks really cool! Pricing is the big sticking point for me. I've been burned too many times to build any critical piece of my app on it without knowing how much it'll cost if it gets popular.
I'm hoping this could save me some work.

A few questions, if the founders are still around:

- Can you cache pages / download entire sites?

- If caching, can you detect changes on a given schedule, trigger the extraction "pageFunction" and save versioned data?

- How do you handle errors?

- Will you handle database extractions and other sites that require multiple levels of what you have as pseudo-URLs? For example, something like the sketch below.
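(Entirely hypothetical syntax, just to illustrate what I mean by multiple levels:)

    // Hypothetical two-level crawl: listing pages lead to detail pages.
    // The bracketed-regex pseudo-URL syntax is an illustration, not Apifier's.
    var crawlConfig = {
        startUrls: ['http://example.com/catalog'],
        pseudoUrls: [
            'http://example.com/catalog?page=[\\d+]',    // level 1: pagination
            'http://example.com/item/[(\\w|-)+]'         // level 2: detail pages
        ]
    };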
As a website owner, is it easy to block a rude crawler by contacting you? (How would I identify in the first place that the crawler is operated by you? Would my server log file have enough data to point back to you?)

Nice & useful demo. I'll give it a try.
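In the meantime, here's roughly how I'd block it myself, assuming the crawler sends a distinctive User-Agent header (the "Apifier" token below is just my guess):

    // Sketch: reject requests from a crawler by User-Agent in an Express app.
    // Assumes the crawler identifies itself; the "Apifier" token is a guess.
    var express = require('express');
    var app = express();

    app.use(function (req, res, next) {
        var ua = req.get('User-Agent') || '';
        if (/apifier/i.test(ua)) {
            return res.status(403).send('Crawling not permitted');
        }
        next();
    });

    app.listen(3000);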
I love the demos, and that you can use them without registering. One thing I couldn't find without making an account was what happens after you've used a gigabyte. That would be a helpful addition, I think.
Also similar to https://morph.io/, which leans more towards open data sets.