Awk: The Power and Promise of a 40-Year-Old Language

251 points by jangid over 3 years ago

22 comments

asicsp over 3 years ago

HN discussion threads for some of the links mentioned in the article:

* Using AWK and R to parse 25TB - https://news.ycombinator.com/item?id=20293579
* Command-line Tools can be 235x Faster than a Hadoop Cluster - https://news.ycombinator.com/item?id=17135841
* The State of the AWK - https://news.ycombinator.com/item?id=23240800

For alternative awk implementations, I'm keeping an eye on frawk [0]. It aims to be faster, supports CSV, etc.

[0] https://github.com/ezrosent/frawk

vyuh over 3 years ago

"A good programmer uses the most powerful tool to do a job. A great programmer uses the least powerful tool that does the job." I believe this, and I always try to find the combination of simple, lightweight tools that does the job at hand correctly.

Awk sometimes proves surprisingly powerful. Just look at the concision of this awk one-liner doing a fairly complex job:

  zcat large.log.gz | awk '{print $0 | "gzip -v9c > large.log-"$1"_"$2".gz"}' # Breakup compressed log by syslog date and recompress. #awksome

Taken from: https://mobile.twitter.com/climagic/status/614153897230397440
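
The trick in that one-liner is that awk keeps one output pipe open per distinct command string, so every line with the same first two fields flows into the same gzip process. Below is a minimal sketch of the same idea as a reusable shell function, assuming syslog-style lines whose first two fields are the month and day; `split_by_date` and the output file naming are hypothetical. The explicit close() calls in END make sure every gzip has finished before the function returns. With many distinct dates you may also run into limits on how many pipes a single awk process can keep open.

```shell
# split_by_date (hypothetical helper): assumes syslog-style lines whose
# first two fields are month and day, e.g. "Jun 25 12:00:01 host msg".
split_by_date() {
  awk -v prefix="${1:-large.log}" '
    {
      # One command string per date; awk reuses the open pipe whenever
      # the same string comes up again.
      cmd = "gzip -9c > \"" prefix "-" $1 "_" $2 ".gz\""
      pipes[cmd] = 1
      print $0 | cmd
    }
    END {
      # Close every pipe so each gzip flushes and exits before awk ends.
      for (c in pipes) close(c)
    }'
}
```

Usage: `zcat large.log.gz | split_by_date large.log` would write files such as large.log-Jun_25.gz into the current directory.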

phkahler over 3 years ago

I never used Awk until last year. I wanted to monitor an embedded device with little more than BusyBox and Python on it. There was quite a bit of information in the log files (I had already written a custom log-file viewer with some highlighting), but I wanted to monitor in real time. Somehow I decided to use Awk to watch the tail of the log file and draw real-time bar graphs by generating the appropriate cursor-control sequences. In the end, all it took was about 50 lines of Awk uploaded to the board and one command to pipe the log into it: minimally invasive and very informative.

I would recommend learning Awk through some real-world use of your own. BTW, it reminded me of using XSLT, which I think is another often-overlooked "good thing".
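
The script itself isn't shown, so this is only a minimal sketch of the general idea: redraw a single terminal line with a bar for each new log line. It assumes the last field of each line is a number from 0 to 100 and a terminal that understands the ANSI escape ESC[2K (erase line); the function name and bar scale are hypothetical.

```shell
# bar_graph (hypothetical helper): assumes the last field of each input
# line is a number from 0 to 100 and an ANSI-capable terminal.
bar_graph() {
  awk '
    function bar(n,    s, i) {   # extra parameters act as local variables
      s = ""
      for (i = 0; i < n; i++) s = s "#"
      return s
    }
    {
      v = $NF + 0
      if (v < 0) v = 0
      if (v > 100) v = 100
      # ESC[2K erases the current line and \r moves back to column 0,
      # so each new value redraws the same line (one "#" per 2 units).
      printf "\033[2K\r%3d %s", v, bar(int(v / 2))
      fflush()
    }
    END { printf "\n" }'
}
```

For example, `tail -f app.log | bar_graph`, where app.log stands in for whatever log you are watching. Because each bar is printed without a trailing newline, it could sit in awk's output buffer indefinitely, which is why fflush() is called after every line.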

zeveb over 3 years ago

> Very few people still code with the legacies of the 1970s: ML, Pascal, Scheme, Smalltalk.

Arguably, the software world would be better off if more people did code with those 1970s languages than with the ones we are stuck with now.

And that applies to Awk, too. As the author quotes Neil Ormos stating, Awk is well suited for personal computing, something we have moved further and further away from even as computers have become more distributed. At what point in history has such a large fraction of the human race had the ability to calculate on such an amazing scale, and at what point in history has such a large fraction of the same human race not bothered with calculation?

Awk is a great tool precisely because it puts quite a lot of expressive power in the hands of an average user on a Unix system. Sure, on a Lisp machine or Smalltalk machine there really isn't the same need for Awk: the systems languages on such machines are safe enough and expressive enough to do what Awk does. But in the Unix context (which is basically what we're all living in, with even the VMS-derived Windows more or less adhering to the Unix model), Awk is a godsend.

edit: correct typo

tyingq over 3 years ago

Gawk's ability to be extended with C code is interesting as well, and pretty straightforward.

Here's the source for the fork() extension that ships with gawk; it's ~150 lines or so: https://git.savannah.gnu.org/cgit/gawk.git/tree/extension/fork.c

I was able to make a (terrible/joke/but-it-kinda-works) web server with gawk using the extensions that ship with it: https://gist.github.com/willurd/5720255#gistcomment-3143007

dugmartin over 3 years ago

My first and only real use of awk was around 1995. I was working at a new job doing embedded software at GE, and we had a lot of documentation in SGML, written and viewed using Interleaf. Interleaf was super slow on the HP-UX workstations we had, and IIRC search was even slower. I got the idea to convert all the SGML files into a single HTML file, and I reached for awk since I had used it for some one-liners previously. I ended up writing an awk script that generated a frameset: one sidebar frame held a tree-ish table of contents, and the other held the mondo HTML file with anchors for the table of contents. It loaded pretty fast in the HP-UX browser, and search was really fast.

dekhn over 3 years ago

I've used Python for almost my entire career, but started out with the UNIX tools. I never found awk interesting, then took a peek at it recently and understood: this was the pre-Perl! It had scripting-language hash tables!
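
Those hash tables are awk's associative arrays: any string can be an index, and an element that doesn't exist yet behaves as an empty string or zero the first time it is used. A small word-frequency sketch (the function name is hypothetical); `for (w in count)` visits keys in an unspecified order, which is why the result goes through sort.

```shell
# word_freq (hypothetical helper): count whitespace-separated words with an
# awk associative array, then sort by count (descending) and word.
word_freq() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }   # unseen keys start at 0
    END { for (w in count) print count[w], w }  # key order is unspecified
  ' | sort -k1,1nr -k2,2
}
```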

zeteo over 3 years ago

My company mandates Windows, but Git Bash has been a backdoor into Unix tools, and I've recently learned sed and awk to take full advantage of it. You need to think a bit about your one-liners, and they'll always feel very hacky, but sed/awk (with a bit of sort thrown in) are an amazingly powerful combination for dealing with all sorts of messy data dumps. In 10 minutes I can craft a one-liner that replaces a two-hour C# console app and runs just as fast. And, surprisingly, I often find it easier to go back months later and understand the messy-looking one-liner than the nicely formatted, well-commented, unit-tested console app.
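
As an illustration of that sed/awk/sort division of labor, here is a sketch assuming a hypothetical dump of `key, amount` rows with a header line, Windows CRLF line endings, and stray spaces: sed normalizes the text, awk aggregates, and sort orders the result. The bracket class `[[:space:]]` also matches the carriage return, which keeps the sed part portable across GNU and BSD sed.

```shell
# totals_by_key (hypothetical helper): input is a header line followed by
# "key,amount" rows, possibly with CRLF endings and spaces around commas.
totals_by_key() {
  sed -e 's/[[:space:]]*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*,[[:space:]]*/,/g' |   # trim and tidy the separators
  awk -F, 'NR > 1 && NF >= 2 { sum[$1] += $2 }  # skip header, total per key
           END { for (k in sum) print k "," sum[k] }' |
  sort -t, -k2,2nr                              # largest total first
}
```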

jrochkind1 over 3 years ago

My first job getting paid to program was in awk, processing log files.

In the middle of that job, my supervisor said: you know what, we're doing increasingly complicated things with awk and it's getting increasingly hacky... I've heard that Perl is like awk but better; do you want to learn Perl and switch to that?

And so we did. My thought then was that there was little that was easier in awk than in Perl: you could use Perl very much like awk if you wanted, and you could even use the right command-line args to give Perl an "implied loop" like awk's... but then you could do a lot more with Perl too.

I don't use Perl anymore. Or awk.

melling over 3 years ago

I no longer use it, but Perl was always the better solution when one thought AWK was the answer.

Perl will do the things where AWK really shines, and if the problem got bigger, Perl was easier to deal with.

arendtio over 3 years ago

Learning awk is actually pretty simple. For years I just used the '{print $2}' version to extract fields, but after reading a short book I felt pretty confident that I understood the basics.

Sadly I don't remember which book it was, but this page looks like a good start: https://ferd.ca/awk-in-20-minutes.html
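
Going one step past '{print $2}', the basics fit in a few lines: each rule is `pattern { action }`, fields are $1 through $NF, NR counts input records, and an END rule runs after the last line. A small sketch, assuming hypothetical `name bytes` input lines; `sum_bytes_over` is a made-up name. Adding `+ 0` to the -v variable forces a numeric comparison.

```shell
# sum_bytes_over (hypothetical helper): input lines look like "name bytes".
sum_bytes_over() {
  awk -v min="$1" '
    # pattern { action }: the action runs only for lines whose 2nd field
    # exceeds min; "+ 0" forces min to be compared as a number.
    $2 > min + 0 { total += $2; n++ }
    # END runs once after the last line; NR still holds the line count.
    END { printf "%d of %d lines, %d bytes\n", n, NR, total }'
}
```

For example, `printf 'ann 100\nbob 5\ncid 50\n' | sum_bytes_over 10` reports 2 of 3 lines and 150 bytes.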

ketanmaheshwari over 3 years ago

My own shameless plug: https://ketancmaheshwari.github.io/posts/2020/05/24/SMC18-Data-Challenge-4.html

nesuse over 3 years ago

There's a free awk course here for anyone interested: https://www.udemy.com/course/awk-tutorial/

cb321 over 3 years ago

When you have a standardized problem setting like the implicit loop in awk, an alternative to a whole new programming language is a simple program generator of under 100 lines of code [1].

This design lets you retain easy access to large sets of pre-existing libraries, as well as have a "compiled/statically typed" situation if you want. It also leverages familiarity with your existing programming languages. I adapted a similar small program like this to emit a C program, but anything else is obviously pretty easy. Easy is good. Familiar is good.

Interactivity-wise, with a fast TinyC/tcc compiler backend, my `rp` programs run sub-second from ENTER to completion on small data. Even with non-optimizing tcc, they still run faster than byte-compiled/VM-interpreted mawk/gawk on a per-input-byte basis. If you take the time to do an optimized build with gcc -O3 etc., they can run much faster.

And I leave the source code around if you want to just use the program generator as a way to save keystrokes or get a fast start on a row-processing program.

Anyway, I'm not trying to start a language holy war, just to show how, if you rotate the problem (or your head looking at the problem) ever so slightly, another answer exists in this space and is quite easy. :-)

[1] https://github.com/c-blake/cligen/blob/master/examples/rp.nim

torcete over 3 years ago

I use awk constantly in bioinformatics. For many of the file formats designed to store genomic data, awk is the easiest tool you can use for processing.
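
Many of those formats are tab-separated text, which is exactly awk's home ground. As a sketch, assuming BED input (chrom, chromStart, chromEnd in the first three columns, with 0-based, half-open coordinates), this sums interval lengths per chromosome. Overlapping intervals are counted twice, and the rule that skips header lines is an assumption about what your files contain; the function name is hypothetical.

```shell
# bed_coverage (hypothetical helper): assumes tab-separated BED input with
# chrom, chromStart, chromEnd (0-based, half-open) in columns 1-3.
bed_coverage() {
  awk '
    BEGIN { FS = "\t" }
    /^(#|track|browser)/ { next }      # skip comment and header lines
    { len[$1] += $3 - $2 }             # interval length is end - start
    END { for (c in len) print c "\t" len[c] }
  ' "$@" | sort
}
```

Usage: `bed_coverage peaks.bed` (or pipe a BED stream into it); peaks.bed is a placeholder name.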

linuxlizard over 3 years ago

I use awk to auto-generate C header files from other header files. I work with $vendor's huge complicated kernel driver codebase. I need small pieces of $vendor's interconnected header files in order to make kernel calls to their drivers without pulling in all their code.
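
The script isn't shown, so this is only a guess at the general shape of such a generator: copy selected `#define` lines from a vendor header into a small standalone header with an include guard. The macro prefix and guard name are hypothetical parameters, and this sketch ignores multi-line macros continued with a backslash as well as directives written as `# define`.

```shell
# extract_defines (hypothetical helper): usage
#   extract_defines PREFIX GUARD [vendor_header ...] > small_header.h
# Copies only single-line "#define PREFIX..." lines.
extract_defines() {
  prefix=$1; guard=$2; shift 2
  awk -v prefix="$prefix" -v guard="$guard" '
    BEGIN {
      print "/* Generated by extract_defines; do not edit. */"
      print "#ifndef " guard
      print "#define " guard
    }
    $1 == "#define" && index($2, prefix) == 1 { print }  # macro name match
    END { print "#endif /* " guard " */" }' "$@"
}
```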

justin_oaks over 3 years ago

I only recently learned Awk well enough to be useful, but I still don't reach for it when I probably should.

What are the most common cases where you reach for Awk instead of some other tool?

I recently used it to parse and recombine data from the OpenVPN status file. That file has a few differently formatted tables in the same file. Using Awk, I was able to change a variable as each table was encountered, so I could change the Awk program's behavior depending on which table it was operating on.
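
A sketch of that section-switching pattern, assuming the comma-separated version-1 layout of the OpenVPN status file (OpenVPN CLIENT LIST, ROUTING TABLE and GLOBAL STATS sections, with column-header rows at the top of the first two tables); check the layout your OpenVPN version actually writes. The `section` variable changes whenever a table title appears, and later rules test it to decide how to read each row. The function name and output format are hypothetical.

```shell
# vpn_clients (hypothetical helper): assumes the comma-separated
# version-1 status layout; prints "name virtual-ip bytes-in bytes-out".
vpn_clients() {
  awk -F, '
    # Table titles switch the section variable and are not data rows.
    /^OpenVPN CLIENT LIST/ { section = "clients"; next }
    /^ROUTING TABLE/       { section = "routes";  next }
    /^GLOBAL STATS/        { section = "stats";   next }
    /^END$/                { section = "";        next }
    # Skip the column-header rows inside the tables.
    /^(Updated|Common Name|Virtual Address),/ { next }
    section == "clients" { rx[$1] = $3; tx[$1] = $4 }
    section == "routes"  { vip[$2] = $1 }
    END {
      for (cn in rx) {
        v = (cn in vip) ? vip[cn] : "-"
        print cn, v, rx[cn], tx[cn]
      }
    }' "$@" | sort
}
```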

gompertz over 3 years ago

And let's not forget about the amazing commercial offering of Awk, known as Tawk (by Thompson Automation). To this day some features from Tawk cannot be found in Gawk.

mukundesh over 3 years ago

awk is great for data analysis. Usually I start with cut, then move to awk as complexity increases, and finally to Python.
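
A sketch of that progression on a hypothetical `date,product,amount` CSV with no header row: cut handles plain column extraction, and awk takes over once you need a condition or arithmetic. Neither tool understands quoted fields that contain commas. The function names are hypothetical.

```shell
# Hypothetical CSV layout: date,product,amount with no header row and
# no quoted fields containing commas.
products() {
  cut -d, -f2                                   # plain column extraction
}
big_sale_products() {
  awk -F, '$3 + 0 > 100 { print $2 }'           # filter on another column
}
avg_amount() {
  awk -F, '{ s += $3 } END { if (NR > 0) printf "%.2f\n", s / NR }'
}
```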

SjorsVG over 3 years ago

I find it very unpleasant to read Awk code. It looks as bad as regex to me.

forinti over 3 years ago

sed is pretty ancient too. I've used it a lot with Docker to alter parameters during builds.
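
A sketch of the kind of parameter edit described, assuming a simple `key=value` config file; `set_param` is a hypothetical name, and KEY and VALUE must not contain characters that are special to sed, such as `/`, `&` or regex metacharacters. It reads stdin and writes stdout, which is portable; in a Dockerfile RUN step the same expression is often applied in place to the config file with GNU sed's -i option.

```shell
# set_param (hypothetical helper): in a key=value config read from stdin,
# rewrite the line for KEY (commented out or not) as KEY=VALUE.
# Assumes KEY and VALUE contain no characters special to sed (/ & . * [ etc).
set_param() {
  sed -e "s/^#*[[:space:]]*$1[[:space:]]*=.*/$1=$2/"
}
```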

shp0ngle over 3 years ago

awk is fast and really useful. It's also generally unreadable.
