Explorations in Unix

239 points by telemachos over 12 years ago

10 comments

frankc over 12 years ago
I use Unix in the same way and for the same purpose as described in the blog post, but I have come to the opinion that once you get into the describe-and-visualize phase, it's much easier to just drop into R. Reading in the kind of file being worked on here is often as simple as

    foo <- read.csv("foo.csv")

Getting summary descriptive statistics, item counts, scatter plots and histograms is often as easy as

    summary(foo)
    table(foo$col)
    plot(foo$xcol, foo$ycol)
    hist(foo$col)

I think that is a lot simpler than a four- or five-command pipeline that can be mistake-prone to edit when you want to change column names or things like that. I still do these kinds of things in the shell sometimes, and I don't know if I can put my finger on when exactly I would drop into R vs write out a pipeline, but there IS a line somewhere...
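For comparison, here is a rough sketch of the kind of shell pipeline that stands in for table(foo$col). The helper name count_col is made up for illustration, and it assumes a simple CSV with a header row and no quoted commas; it is not taken from the article.

```shell
# count_col FILE N: hypothetical helper that tallies the values in column N.
# Assumes a header row and no commas inside quoted fields.
count_col() {
  tail -n +2 "$1" |   # skip the header row
    cut -d, -f"$2" |  # keep only column N
    sort |            # group identical values together
    uniq -c |         # count each group
    sort -rn          # most frequent first
}
```

Called as count_col foo.csv 2, it prints one "count value" line per distinct value. Changing the column means editing a field number rather than a name, which is the fragility being described.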
lutusp over 12 years ago
A quote: "... As if this wasn't enough, he [i.e. Tukey] also *invented* what is probably the most influential algorithm of all time." (emphasis added)

No, Tukey did not "invent" the FFT. He rediscovered it, as did a number of others over the years since -- who else? -- Gauss first created it.

http://en.wikipedia.org/wiki/Fast_Fourier_transform

A quote: "This method (and the general idea of an FFT) was popularized by a publication of J. W. Cooley and J. W. Tukey in 1965,[2] but it was later discovered (Heideman & Burrus, 1984) that those two authors had independently re-invented an algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times in limited forms)."
mpyne over 12 years ago
I almost skipped this because I figured it would be another introductory article on how to use bash and coreutils, but it was actually very good.
fcatalan over 12 years ago
Hits close to home. I do a lot of data conversion, arrangement and manipulation on the CLI. When some coworker inherits any of those tasks and I explain how to do it, the answer tends to be "Aaaaallright, I'll use Excel".
piqufoh over 12 years ago
Up for Unix and "EDA is the lingua franca of data science". What you can do and discard on the Unix CLI takes many times longer on certain GUI-based OSes.
nipunn1313 over 12 years ago
head -3 data* | cat has the same result as head -3 data*

Pipe sends stdout to stdin of the next process. cat sends stdin back to stdout. Piping to cat is rarely eventful (unless you use a flag like cat -n).
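A quick way to check this for yourself, as a minimal sketch. The file names and contents here are invented for illustration and live in a throwaway directory; they are not the article's data files.

```shell
# Hypothetical sample files, created in a temporary directory for illustration.
tmpdir=$(mktemp -d)
seq 1 5 > "$tmpdir/data1"
seq 6 10 > "$tmpdir/data2"

plain=$(head -3 "$tmpdir"/data*)
piped=$(head -3 "$tmpdir"/data* | cat)

# The extra cat only copies stdin to stdout, so the two outputs match.
[ "$plain" = "$piped" ] && echo "identical"

# With a flag such as -n, cat does change the stream: it numbers each line.
numbered=$(head -3 "$tmpdir"/data* | cat -n)

rm -r "$tmpdir"
```

The comparison holds for head because its output does not depend on whether stdout is a terminal; programs that do check for a terminal can behave differently when piped.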
ralph over 12 years ago
He writes

    (head -5; tail -5) <data

but that's a bit misleading. These don't work.

    seq 20 | (head -5; tail -5)
    (head -5; tail -5) < <(seq 20)

Both giving just the first five lines.
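A likely explanation, offered as an assumption rather than something stated in the article: on a regular file, head can seek back to just after the lines it printed, so tail picks up from there, whereas on a pipe head may read all of the input into its buffer and cannot put it back. One pipe-safe alternative is to do both jobs in a single awk process. The helper name first_last is made up for this sketch.

```shell
# first_last N: hypothetical helper that prints the first N and last N lines
# of stdin in one pass, so it works on pipes. N must be a positive integer.
first_last() {
  awk -v n="${1:-5}" '
    NR <= n { print; next }          # head part: emit immediately
    { buf[NR % n] = $0 }             # tail part: ring buffer of the last n lines
    END {
      start = NR - n + 1
      if (start <= n) start = n + 1  # do not repeat lines already printed
      for (i = start; i <= NR; i++) print buf[i % n]
    }'
}

seq 20 | first_last 5   # prints 1 to 5, then 16 to 20
```

If the input has fewer than 2N lines, each line is printed exactly once instead of being repeated.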
keithpeter over 12 years ago
rs and lam look interesting. Are these commands really only available on BSD (i.e. 'proper' Unix derivatives)? Hoping for Linux compilable code.
mturmon over 12 years ago
"describe" is a nice idea. Just knowing the range, the mean, and the second moment can be helpful.
_hnwo over 12 years ago
I'd be interested in what Seth's setup/theme/os of choice is .. :)