Using Google Refine to Clean a Data Set

69 points by craig552uk, almost 13 years ago

6 comments

richardv, almost 13 years ago
I found this really helpful. Not so much to do with the actual article, but to do with actually making me aware of Google Refine.

Installation was a breeze. I couldn't find any instructions, but it was as simple as downloading for Linux, extracting, then running the shell script.

http://code.google.com/p/google-refine/downloads/detail?name=google-refine-2.5-r2407.tar.gz&can=1&q=

The application automatically opens in a new Chrome window.

From here, I grabbed a data dump from one of our external providers.

We work with a lot of providers who are *really* technologically challenged. I'd love to be able to say, "Here you are, here is our API, start pushing your content to us." But in practice they don't even know what their XML feeds do. We need their data, but getting a consistent dataset from them when they seem to change their format regularly is a pain! And when importing only 10 or so items at a time, it's excruciatingly painful.

Today I learnt how easy that can be with Google Refine!
danso, almost 13 years ago
As a data analyst-type person, I can't recommend the use of Google Refine enough. When someone told me about it, I thought "that's dumb, I would just write a cleaning/regex script and connect to my DB"... but I tried it out anyway, because my colleague is a much better power programmer than I am.

That's how good Refine is... it adds an extra, GUI-driven step to the workflow, but it's so well executed that it makes data exploration (and cleaning) effortless.

I wrote a tutorial a while back about how I used it in an investigative reporting project: http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning
frankc, almost 13 years ago
Is this worth looking into for someone who already knows Perl, R and the Unix zoo? Or is it more targeted at people who don't deal with data on a regular basis?
guard-of-terra, almost 13 years ago
I wonder why they won't let you open local files without passing their content via the browser. That would be very useful when run locally.
dpcx, almost 13 years ago
This seems, on the surface at least, very similar to what ScraperWiki is trying to do: converting messy publicly available data into a more structured format.

Am I correct in that understanding, or did I miss the boat?
chucknelson, almost 13 years ago
Not very impressive for people who work with data sets often and probably have tools like SAS or Excel, but good to know it exists as a free alternative.