TechEcho

10 comments

frankc over 12 years ago

I use unix in the same way and for the same purpose as described in the blog, but I have come to the opinion that once you get into the describe and visualize phase, it's much easier to just drop into R. Reading in the kind of file being worked on here is often as simple as

    foo <- read.csv("foo.csv")

Getting summary descriptive statistics, item counts, scatter plots and histograms is often as easy as

    summary(foo)
    table(foo$col)
    plot(foo$xcol, foo$ycol)
    hist(foo$col)

I think that is a lot simpler than a 4 or 5 command pipeline that can be mistake-prone to edit when you want to change column names or things like that. I still do these kinds of things in the shell sometimes, and I don't know if I can put my finger on when exactly I would drop into R vs write out a pipeline, but there IS a line somewhere...
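For comparison, the item-count step (table(foo$col) in R) might look roughly like this as a shell pipeline. This is only a sketch: the sample file, its two-column layout, and the naive comma splitting (which would break on quoted fields) are assumptions made for illustration, not anything taken from the article.

```shell
# Hypothetical sample data: a header row followed by four records.
tmp=$(mktemp)
printf 'name,col\na,x\nb,y\nc,x\nd,x\n' > "$tmp"

# Skip the header, keep the second column, then count each distinct value,
# most frequent first.
counts=$(tail -n +2 "$tmp" | cut -d, -f2 | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

rm -f "$tmp"
```

Changing the column means editing the -f2 field number by hand, which is the kind of fiddly edit the comment is describing.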

lutusp over 12 years ago

A quote: "... As if this wasn't enough, he [i.e. Tukey] also invented what is probably the most influential algorithm of all time." (emphasis added)

No, Tukey did not "invent" the FFT. He rediscovered it, as did a number of others over the years since -- who else? -- Gauss first created it.

http://en.wikipedia.org/wiki/Fast_Fourier_transform

A quote: "This method (and the general idea of an FFT) was popularized by a publication of J. W. Cooley and J. W. Tukey in 1965,[2] but it was later discovered (Heideman & Burrus, 1984) that those two authors had independently re-invented an algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times in limited forms)."

mpyne over 12 years ago

I almost skipped this because I figured it would be another introductory article on how to use bash and coreutils, but it was actually very good.

fcatalan over 12 years ago

Hits close to home. I do a lot of data conversion, arrangement and manipulation on the CLI. When some coworker inherits any of those tasks and I explain how to do it, the answer tends to be "Aaaaallright, I'll use Excel".

piqufoh over 12 years ago

Up for unix and "EDA is the lingua franca of data science". What you can do and discard on the unix CLI takes many times longer on certain GUI-based OSes.

nipunn1313 over 12 years ago

    head -3 data* | cat

has the same result as

    head -3 data*

A pipe sends stdout to the stdin of the next process, and cat sends stdin back to stdout. Piping to cat is rarely eventful (unless you use a flag like cat -n).
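The point can be checked with a quick sketch like the one below. The two sample files are hypothetical, and head -n 3 is used as the option spelling for the same three-line request.

```shell
# Hypothetical sample files named data1 and data2 in a scratch directory.
tmpdir=$(mktemp -d)
printf '1\n2\n3\n4\n' > "$tmpdir/data1"
printf 'a\nb\nc\nd\n' > "$tmpdir/data2"

# Capture the output with and without the trailing cat.
direct=$(head -n 3 "$tmpdir"/data*)
piped=$(head -n 3 "$tmpdir"/data* | cat)
numbered=$(head -n 3 "$tmpdir"/data* | cat -n)

# A plain cat passes its input through unchanged; cat -n adds line numbers.
[ "$direct" = "$piped" ] && echo "plain cat: output identical"
[ "$direct" != "$numbered" ] && echo "cat -n: output differs"

rm -rf "$tmpdir"
```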

ralph over 12 years ago

He writes

    (head -5; tail -5) <data

but that's a bit misleading. These don't work:

    seq 20 | (head -5; tail -5)
    (head -5; tail -5) < <(seq 20)

Both give just the first five lines.
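A likely explanation, offered as an assumption rather than something stated in the thread: head reads its input in blocks, so on a pipe it can consume everything before tail starts, while on a regular file an implementation can seek back to just after the lines it printed. The sketch below compares the two cases using a hypothetical temp file. The results depend on the head implementation, so it reports line counts instead of assuming them.

```shell
# Hypothetical scratch file holding the numbers 1..20, one per line.
tmp=$(mktemp)
seq 20 > "$tmp"

# Case 1: input is a regular (seekable) file.
from_file=$( (head -n 5; tail -n 5) < "$tmp" )

# Case 2: input is a pipe, which cannot be rewound.
from_pipe=$(seq 20 | (head -n 5; tail -n 5))

echo "lines from file: $(printf '%s\n' "$from_file" | wc -l)"
echo "lines from pipe: $(printf '%s\n' "$from_pipe" | wc -l)"

rm -f "$tmp"
```

If the pipe case reports fewer lines than the file case, that matches the behavior described above.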

keithpeter over 12 years ago

rs and lam look interesting. Are these commands really only available on BSD (i.e. 'proper' Unix derivatives)? Hoping for Linux compilable code.

mturmon over 12 years ago

"describe" is a nice idea. Just knowing the range, the mean, and the second moment can be helpful.
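Those three quantities can be computed in a single pass over a column of numbers. The function below is a hypothetical stand-in written with awk, not the article's own describe. It assumes one numeric value per line on stdin and reports the raw second moment (the mean of the squares).

```shell
# Hypothetical helper: reads one number per line and prints count, range,
# mean, and raw second moment.
describe() {
  awk '
    NR == 1 { min = $1; max = $1 }
    {
      if ($1 < min) min = $1
      if ($1 > max) max = $1
      sum += $1
      sumsq += $1 * $1
    }
    END {
      if (NR > 0)
        printf "n=%d min=%g max=%g mean=%g second_moment=%g\n", NR, min, max, sum / NR, sumsq / NR
    }
  '
}

# Example: summarize the integers 1 through 5.
result=$(seq 1 5 | describe)
echo "$result"
```

The variance follows from these as second_moment minus mean squared, if that is the form wanted.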

_hnwo over 12 years ago

I'd be interested in what Seth's setup/theme/os of choice is .. :)

Explorations in Unix
