About a year ago, I wrote a blog post about command-line tools for data science [1]. Thanks to HN, I received a lot of valuable comments and pointers to other great command-line tools! In the past 10 months, I have been writing a book titled Data Science at the Command Line [2]. Ever since that blog post, I've been discovering new tools. On the one hand, that's quite frustrating because it's difficult to keep up and include everything in the book. On the other hand, it's fantastic to see that the command line is still very popular!<p>In order to gain a better overview of what's available, I thought it'd be nice to ask on HN what your favorite tools are to work with data. Many new tools have been developed in the past year, but your favorite one may just be 10 years old. You may think that I'm too late with this question because the book is already finished, but fortunately the book also discusses the underlying concepts which haven't changed too much in the past forty years.<p>I'm very much looking forward to hearing about your favorite command-line tools. Bonus points if you reply in CSV format "command,url,reason\n", so I can easily scrape the comments :)<p>Thanks!<p>PS. For those who are interested, next Wednesday, I'll be doing a webcast about this topic [3], where I might share the outcome of this discussion.<p>[1] https://news.ycombinator.com/item?id=6412190<p>[2] http://shop.oreilly.com/product/0636920032823.do<p>[3] http://www.oreilly.com/pub/e/3115
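For anyone who wants to collect the replies the same way, here is a minimal sketch of parsing lines in the requested "command,url,reason" format. The function name `parse_replies` is made up for illustration, and the sketch assumes the command and URL contain no commas; it is not the author's actual scraping code.

```python
def parse_replies(text):
    """Parse lines shaped like 'command,url,reason' into dictionaries.

    Assumption: neither the command nor the URL contains a comma, so
    everything after the second comma is treated as the reason.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(",", 2)  # split on the first two commas only
        if len(parts) != 3:
            continue  # skip lines that are not in the requested format
        command, url, reason = (part.strip() for part in parts)
        rows.append({"command": command, "url": url, "reason": reason})
    return rows
```

Lines that do not have at least two commas are skipped, so free-form comments mixed in with CSV-style replies do not break the parse.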
GNU sed 4.2.1, awk 3.0.4, and grep 2.4.2, <a href="http://www.git-scm.com/" rel="nofollow">http://www.git-scm.com/</a>, Bundled with Windows Git (which needs an updated find)<p>Python with pandas, <a href="http://pandas.pydata.org/" rel="nofollow">http://pandas.pydata.org/</a>, If I need HDF5 or time series<p>ffmpeg, ffmpeg.org, If I'm generating animations<p>* I look forward to your book :)
jq, <a href="http://stedolan.github.io/jq/" rel="nofollow">http://stedolan.github.io/jq/</a>, best tool to handle JSON files on the command line
The tool that immediately comes to my mind is jq, a tool to transform and process JSON objects. It's one of those powerful tools that is super easy to learn, and once I started using it I just couldn't live without it. The only negative thing I have to say is that it does not have good native support for converting between JSON and CSV.
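As a workaround for the JSON-to-CSV gap mentioned above, a small script can do the conversion. This is a minimal sketch, assuming the input is a JSON array of flat objects (no nested values); the function name `json_to_csv` is hypothetical and it uses only the Python standard library.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order across all records.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # keys missing from a record become empty cells
    return buffer.getvalue()
```

In a pipeline this could be wired up by reading `sys.stdin` and writing the result to `sys.stdout`, so jq does the filtering and the script only handles the final format change.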
After all your command-line data munging (possibly in a Unix pipeline), if you want to convert the resulting text to PDF (without leaving the command line :-), check this post:<p>[xtopdf] PDFWriter can create PDF from standard input:<p><a href="http://jugad2.blogspot.in/2013/12/xtopdf-pdfwriter-can-create-pdf-from.html" rel="nofollow">http://jugad2.blogspot.in/2013/12/xtopdf-pdfwriter-can-creat...</a><p>It needs xtopdf and ReportLab (use v1.17) and Python (use 2.2 or higher).<p>Online overview of xtopdf: <a href="http://slid.es/vasudevram/xtopdf" rel="nofollow">http://slid.es/vasudevram/xtopdf</a><p>xtopdf on Bitbucket:<p><a href="https://bitbucket.org/vasudevram/xtopdf" rel="nofollow">https://bitbucket.org/vasudevram/xtopdf</a>
CSVKit: <a href="https://github.com/onyxfish/csvkit" rel="nofollow">https://github.com/onyxfish/csvkit</a>
and
The R Project for Statistical Computing: <a href="http://www.r-project.org/" rel="nofollow">http://www.r-project.org/</a>
Jeroen, I'm just reading your blog post [1] for the first time now. Are you aware of Dirk Eddelbuettel's `littler`? I believe that might overlap with your Rio tool to some degree.
histogram, <a href="https://github.com/ole-tange/tangetools/blob/master/histogram/histogram" rel="nofollow">https://github.com/ole-tange/tangetools/blob/master/histogra...</a>, So you've got this table and you are not really in the mood to fire up gnuplot, a spreadsheet, or R, but you would like a quick bar chart right here in the terminal: `cat data | histogram`
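To illustrate the idea of a quick bar chart in the terminal, here is a minimal sketch that counts how often each value occurs and draws one text bar per value. It is not how the linked `histogram` script works; the function name `bar_chart` and its `width` and `symbol` parameters are assumptions made for this example.

```python
from collections import Counter


def bar_chart(values, width=40, symbol="#"):
    """Return one text bar per distinct value, scaled to at most `width` symbols."""
    counts = Counter(values)
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(str(value)) for value in counts)
    lines = []
    for value, count in sorted(counts.items()):
        # Scale relative to the most frequent value; always draw at least one symbol.
        bar = symbol * max(1, round(count * width / largest))
        lines.append(f"{str(value).ljust(label_width)} {bar} {count}")
    return "\n".join(lines)
```

Fed with the stripped lines of standard input, this gives a rough frequency chart without leaving the shell.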
txr, <a href="http://nongnu.org/txr" rel="nofollow">http://nongnu.org/txr</a>, Use it all the time and like it a lot! That keeps me interested in working on it. Started five years ago and still at it today, more than 1500 commits and 27000 LOC later.
jq has proven useful for dealing with JSON. A nice way to reduce or reformat your data.<p><a href="http://stedolan.github.io/jq/" rel="nofollow">http://stedolan.github.io/jq/</a>