Show HN: Apifier – hosted web crawler for developers

101 points by jancurn over 9 years ago

13 comments

thecodemonkey over 9 years ago
I don't quite understand why you would use a full-blown browser like phantomjs for crawling (I've seen a lot of projects recently taking this approach, so this critique is not directly towards Apifier).

Yes, I get that in some specific circumstances it would be nice to be able to execute the JavaScript on the page but think about the trade-off here.

In the vast majority of cases a simple HTTP GET request with a DOM parser is all you need -- actually not a single one of the examples on the Apifier homepage has any need for phantomjs.

Wouldn't it be much much cheaper, simpler and faster to ditch phantomjs? Or is there something I'm missing here?
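
For illustration, the "simple HTTP GET request with a DOM parser" approach mentioned in the comment above could look roughly like the following. This is only a sketch using the Python standard library, not anything from Apifier; the names `LinkExtractor`, `extract_links` and `fetch` are hypothetical, and it assumes the target page serves its content as static HTML (no JavaScript rendering needed).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return a list of (href, text) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Plain HTTP GET; returns the response body decoded to text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Static sample so the sketch runs without network access.
    sample = '<ul><li><a href="/item?id=1">First story</a></li></ul>'
    print(extract_links(sample))
```

In real use one would call `extract_links(fetch(url))`. The trade-off raised in the comment still applies: this works only when the data is present in the HTML the server returns, which is exactly the case where a headless browser is unnecessary.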
jancurn over 9 years ago
Hello HN! Today we're launching what we have been building for the past couple of months. Apifier is a hosted web crawler for developers that enables them to extract data from any website using a few simple lines of JavaScript. We built it because we realized that many existing web scrapers trade off their ability to scrape complex websites for the "simplicity" of their user interface. We thought: we are programmers and we already use JavaScript for client-side development, so why not use it for scraping?

Please have a look at the service, play with the examples and maybe set up your own crawl. My co-founder jakubbalada and I will be around here to answer your questions. We'd love to hear what you guys think!
rgbrgb over 9 years ago
Looks really cool! Pricing is the big stickler for me. I've been burned too many times to build any critical piece of my app with it without knowing how much it'll cost if it gets popular.
necrodome over 9 years ago
How do you access the latest crawling results programmatically? I hope you are not expecting me to click a results link in a developer's tool.
danielharan over 9 years ago
I'm hoping this could save me some work.

A few questions, if founders are still around:

- Can you cache pages / download entire sites?
- If caching, can you detect changes on a given schedule, trigger the extraction "pageFunction" and save versioned data?
- How do you handle errors?
- Will you handle database extractions and other sites that require multiple levels of what you have as pseudo-URLs?
benjmn over 9 years ago
As a website owner, is it easy to block a rude crawler by contacting you? (How would I identify in the first place that the crawler is operated by you? Would my server logfile have enough data to point back to you?)

Nice & useful demo. I'll give it a try.
Eridrus over 9 years ago
Why should I use this instead of just firing up some spot instances with phantomjs?
bentpins over 9 years ago
I love the demos, and that you can use them without registering. One thing I couldn't find without making an account was what happens after you've used a gigabyte. That would be a helpful addition, I think.
aakilfernandes over 9 years ago
Cool! How do you stop users from trying to run malicious code?
thomasfromcdnjs over 9 years ago
Also similar to https://morph.io/ which has more of a trend towards open data sets.
misiti3780 over 9 years ago
I like the idea - would probably use it in the future - can you talk a little bit about what technologies you are using?
asterfield over 9 years ago
I was just thinking yesterday of creating a similar service. I'm glad to see someone else has already made it :D
Raphmedia over 9 years ago
Exactly what I was looking for in order to efficiently improve my searches for a new home. Thanks!