Import.io – Structured Web Data Scraping

100 points by steeples about 11 years ago

15 comments

mlandauer about 11 years ago
If you're concerned about using a hosted scraping platform because it might disappear, check out https://morph.io - it's open source as well: https://github.com/openaustralia/morph
uuid_to_string about 11 years ago
As a PoC, I would be willing to "turn the web into data", i.e., produce one of the formats offered by these "services": CSV.

I will use only standard UNIX utilities, no Python, etc. As such, you "own" the code. No SaaS. The result will be portable and run on any UNIX.

I believe I can deliver in fewer words of code and that the result will be easier to modify when sites change.

You pay nothing. Post your scraping "challenges" to HN.

I enjoy turning web into data.

Some people enjoy working with HTML, CSS, Javascript, etc. I prefer working with raw data.

It is interesting to hear that some people are willing to pay to have the HTML, CSS, Javascript, etc. stripped out.
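
For readers unfamiliar with what "turning the web into data" involves, here is a minimal sketch of extracting an HTML table into CSV. It uses Python's standard library rather than the UNIX utilities the comment above proposes, and the names `TableToRows` and `html_table_to_csv`, along with the sample markup, are assumptions made for illustration, not anything from the thread.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; it ignores nested tables,
    colspan/rowspan, and anything outside table rows.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments and collapse runs of whitespace to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "<table>"
        "<tr><th>Name</th><th>Price</th></tr>"
        "<tr><td>Widget, large</td><td>9.99</td></tr>"
        "</table>"
    )
    print(html_table_to_csv(sample), end="")
```

Real pages are rarely this tidy, so a sketch like this would likely need per-site adjustments, which is the maintenance cost several commenters in this thread weigh against a hosted service.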
ycmike about 11 years ago
HN,

So who do you guys use more? Import.io or Kimono? I have heard good things about both.
RaphiePS about 11 years ago
There are a bunch of comments about rolling your own scraper instead of relying upon a possibly unreliable SaaS app.

That makes me think -- would it be viable to run a service that, instead of running the scraping on their own servers, simply gave you a custom binary to run?

Assuming that you trusted the executable, you would never have to worry about the company failing. It'd just be a one-time fee, and yours to use in perpetuity. Presumably updates would be free.
robotfelix about 11 years ago
Great to see these guys are now out of Beta!

While their real-time Extractors aren't quite as quick as doing it yourself, we've found them to be particularly useful for sites requiring JavaScript and/or cookies to use.

It's also worth mentioning that it's quick to get started. You can start playing around with real data without having to dig into a site's URL structure, and then write your own scraper later if needed.
chrisherring about 11 years ago
Isn't it illegal to scrape without permission? How would import.io handle the case when a large site comes back with legal threats because a user of their service has scraped the wrong site? Can they claim non-responsibility?

Also, what happens when sites start blocking their IPs due to repeated scraping, or is this unlikely to happen?
seivan about 11 years ago
Heads up, the application is placed in ~/Desktop and not /Applications
th0br0 about 11 years ago
They presented last year at Yahoo!'s Hack Europe: London hackathon. It's an interesting concept. They've come far since their initial presentation, and while the app has its quirks, I have come to use it occasionally for some tasks.

I hope that they'll manage to properly monetize this - I don't see why I should pay for using a scraping rule if I can just write the scraper myself, which doesn't cost me that much more time.
fibertera about 11 years ago
What kind of legitimate uses are there for something like this? This is not a sarcastic question. It seems like an obvious spam magnet, but if people are using it legitimately wouldn't their sources already be providing an API or RSS key?
thom about 11 years ago
I suspect the real, top-secret business behind import.io is in training a system to crawl the web and see structured data, and/or in gathering over time a very rich crowd-sourced database of structured data.
jmethvin about 11 years ago
We've posted answers to some of your questions on our blog: http://blog.import.io/post/you-ask-we-answer
pmtarantino about 11 years ago
Can someone tell me more about the law and scraping websites?
late2part about 11 years ago
Unfortunately, this doesn't seem to work too well on my Mac. And why do you want to know who my friends on Facebook are?
notduncansmith about 11 years ago
Reminds me of https://www.kimonolabs.com/
notastartup about 11 years ago
I wrote http://scrape.ly if you wanna have a look; it's a URL-based API for web scraping.