OK, so the title of the blog is 'Data ops in the Linux command line'. Sounds fun, but in the same way that 'paintball with blindfolds' would be fun.<p>Here's one of the examples: a kind of sanity check on two fields in a tab-separated file, counting the rows where field 31 is less than field 33 (skipping the header row and blank values). The fields are identified only by their position (31, 33, etc.)<p><pre><code> awk -F"\t" 'NR>1 && $31!="" && $33!="" && $33>$31' fish | wc -l
</code></pre>
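To make the one-liner concrete, here is a minimal sketch that runs the same filter on a tiny, made-up file. The file name `fish_sample.tsv`, the column names and all values are hypothetical. The two depth columns sit in positions 2 and 3 instead of 31 and 33, with column 2 treated as MaxDepth and column 3 as MinDepth, the same pairing the SQL in this comment uses.

```shell
# Hypothetical sample: id / MaxDepth / MinDepth, tab-separated, with a header.
# Column 2 stands in for $31 and column 3 for $33 in the original command.
printf 'id\tMaxDepth\tMinDepth\n' > fish_sample.tsv
printf 'a\t10\t5\n' >> fish_sample.tsv   # fine: min < max
printf 'b\t5\t10\n' >> fish_sample.tsv   # suspicious: min > max
printf 'c\t\t7\n'   >> fish_sample.tsv   # skipped: MaxDepth is blank
printf 'd\t8\t12\n' >> fish_sample.tsv   # suspicious: min > max

# NR>1 skips the header row; the != "" tests drop rows with a blank depth
# before the two fields are compared; wc -l counts the matching lines.
awk -F"\t" 'NR>1 && $2!="" && $3!="" && $3>$2' fish_sample.tsv | wc -l
```

On this sample it should print 2 (rows `b` and `d`); row `c` never reaches the comparison because its MaxDepth field is empty. Some `wc` implementations pad the number with leading spaces.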
Surely it's much better to just import it into a database and do the analysis in SQL. The SQL equivalent of the above would be something like:<p><pre><code> SELECT COUNT(*)
FROM FishSpecimenData
WHERE MinDepth > MaxDepth
AND MinDepth IS NOT NULL AND MaxDepth IS NOT NULL
</code></pre>
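If you're worried about type conversions while importing into SQL, just import everything as a varchar. Comparing the numbers is still fairly easy (SQL Server syntax):<p><pre><code> SELECT *
FROM FishSpecimenData
-- TRY_CAST (SQL Server 2012+) returns NULL instead of raising an error when a
-- value can't be converted, and a comparison with NULL is never true, so those
-- rows simply drop out. Casting to int would reject decimal depths, and
-- IsNumeric() checks in the WHERE clause aren't guaranteed to run before the casts.
WHERE TRY_CAST(MinDepth AS decimal(10, 2)) > TRY_CAST(MaxDepth AS decimal(10, 2))
</code></pre>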
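The same "make sure both values are numbers before comparing" idea matters in awk too. awk compares two fields numerically only when both look like numbers; if either is text such as `n/a`, it falls back to a string comparison, and `"n/a" > "10"` is true because letters sort after digits. The sketch below assumes a hypothetical file `fish_dirty.tsv` (column 2 = MaxDepth, column 3 = MinDepth) and a made-up helper function `isnum`, whose pattern only accepts plain decimals such as `12` or `-3.5` (no exponents or thousands separators).

```shell
# Hypothetical sample: id / MaxDepth / MinDepth, with one non-numeric value.
printf 'id\tMaxDepth\tMinDepth\n' > fish_dirty.tsv
printf 'a\t10\t5\n'   >> fish_dirty.tsv   # fine: min < max
printf 'b\t5\t10\n'   >> fish_dirty.tsv   # genuine problem: min > max
printf 'c\t10\tn/a\n' >> fish_dirty.tsv   # MinDepth is not a number

# Unguarded check: row c is compared as strings ("n/a" > "10" is true),
# so it is counted alongside row b.
awk -F"\t" 'NR>1 && $2!="" && $3!="" && $3>$2' fish_dirty.tsv | wc -l

# Guarded check: both fields must match a simple decimal pattern (a rough
# numeric test that also rejects blanks), then +0 forces a numeric comparison.
awk -F"\t" '
  function isnum(x) { return x ~ /^-?[0-9]+(\.[0-9]+)?$/ }
  NR>1 && isnum($2) && isnum($3) && $3+0 > $2+0
' fish_dirty.tsv | wc -l
```

The first command should print 2 and the second 1: the guarded version drops row `c` instead of flagging it as a depth inversion.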
edit: To be fair, on this page <a href="https://www.polydesmida.info/cookbook/index.html" rel="nofollow">https://www.polydesmida.info/cookbook/index.html</a> the author explains the rationale for using command line tools:<p><i>I'm a retired scientist and I've been mucking around with data tables for nearly 50 years. I started with printed columns on paper (and a calculator) before moving to spreadsheets and relational databases (Microsoft Access, Filemaker Pro, MySQL, SQLite). In 2012 I discovered the AWK language and realised that every processing job I'd ever done with data tables could be done faster and more simply on the command line. Since then my data tables have been stored as plain text and managed with GNU/Linux command-line tools, especially AWK</i><p>So I guess the point of the blog is to promote that approach. Fair enough.
Oh, I thought the article would be about scientific fields, not data fields. I became increasingly irritated by the pedantry before I realized that this was the topic.<p>Once the confusion lifted I could enjoy the read.