TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

CLI text processing with GNU awk

419 points by asicsp over 1 year ago

19 comments

mplanchard over 1 year ago
I love awk, and I find myself reaching for it a fair bit. One of the main things I use it for is “sed with state”: matching on a line, but only if it was preceded by some other line. I find this really useful for creating one-off linters. For example, I recently made one to check all our migration files for CREATE INDEX without CONCURRENTLY on a particular set of very large tables where it would cause issues. Since SQL statements can be spread over multiple lines, it was difficult to write a straightforward match, but awk can track state like “I’m in a create statement” or “I’m creating an index” across multiple lines, which allowed me to cobble together something that has worked well for about a year now.
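A minimal sketch of that kind of "sed with state" linter, assuming POSIX awk, statements terminated by ";", no semicolons inside string literals or comments, and a hypothetical list of large tables ("events", "audit_log"). It is not the commenter's actual script: it buffers lines until a statement ends, then reports CREATE INDEX statements on those tables that lack CONCURRENTLY, along with the line where the statement started.

```shell
# Hypothetical linter (not the commenter's script). Assumptions:
#  - every statement ends with ';' and no ';' appears in strings/comments
#  - the table names passed via -v tables=... are placeholders
lint_create_index() {
  awk -v tables="events audit_log" '
    BEGIN {
      n = split(tables, t, " ")
      for (i = 1; i <= n; i++) big[tolower(t[i])] = 1
    }
    FNR == 1 { stmt = "" }                      # never carry state across files
    stmt == "" && $0 ~ /^[ \t]*$/ { next }      # skip blank lines between statements
    {
      if (stmt == "") start = FNR               # line where this statement began
      stmt = stmt " " $0
      if (index($0, ";") == 0) next             # statement not finished yet
      s = toupper(stmt)
      if (s ~ /CREATE[ \t]+(UNIQUE[ \t]+)?INDEX/ && s !~ /CONCURRENTLY/) {
        tbl = ""
        n = split(s, w, /[ \t]+/)
        for (i = 1; i < n; i++)
          if (w[i] == "ON") { tbl = w[i + 1]; break }
        sub(/\(.*/, "", tbl)                    # drop a column list glued to the name
        gsub(/"/, "", tbl)                      # drop identifier quotes
        tbl = tolower(tbl)
        if (tbl in big)
          printf "%s:%d: CREATE INDEX on %s without CONCURRENTLY\n", FILENAME, start, tbl
      }
      stmt = ""                                 # reset state for the next statement
    }
  ' "$@"
}
```

Usage: lint_create_index migrations/*.sql. The state here is just two variables (the buffered statement and its start line), which is the kind of thing a single sed expression struggles to express.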
swores over 1 year ago
I suspect that anyone reading this thread is likely to be equally interested in "Ask HN: Share a shell script you like" from a fortnight ago (though at 78 comments, it didn't get as much traction / comments as I hoped it would when I saw it).

https://news.ycombinator.com/item?id=37112991
kazinator over 1 year ago
I maintain a minor side interest in Awk, alongside Lisp and other things.

I developed cppawk in 2022: https://www.kylheku.com/cgit/cppawk/about/

cppawk extends Awk with preprocessing.

There is a loop macro that supports a vocabulary of clauses. Clauses can be combined for parallel and cross-product iteration, and they are user-extensible: by writing five simple macros, you can define a new clause.

Something potentially useful if you use Awk.

cppawk is documented with multiple man pages, and covered by unit tests which run with gawk and mawk.
e63f67dd-065b over 1 year ago
Perhaps my old sysadmin hat is showing through, but I don’t quite see what the advantage of awk is over just writing the same thing in perl. I’ve seen my fair share of horrendous shell scripts from junior sysadmins, and every time I think to myself “the text processing portion would be so much cleaner in Perl”.
tyingq over 1 year ago
One lesser-known thing about gawk is that it typically ships with some useful extensions that give you access to things like readdir(), ord(), chr(), gettimeofday(), sleep(), etc.

https://www.gnu.org/software/gawk/manual/html_node/Extension-Samples.html
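A small sketch of loading two of those sample extensions with gawk's @load directive. It assumes GNU awk is installed together with its bundled extensions, which distributions usually package but not always; the shell function names are hypothetical.

```shell
# Requires GNU awk plus its sample extensions (packaging varies by distro).

# ord() and chr() come from the "ordchr" extension.
ord_chr_demo() {
  gawk '
    @load "ordchr"
    BEGIN { print ord("A"), chr(66) }
  '
}

# The "readdir" extension lets gawk read a directory as if it were a file.
# Each record looks like inode/name/type, so with FS = "/" the name is $2.
# Note that "." and ".." are listed too.
list_dir() {
  gawk '
    @load "readdir"
    BEGIN { FS = "/" }
    { print $2 }
  ' "${1:-.}"
}
```

With the extension available, ord_chr_demo prints "65 B". gettimeofday() and sleep() are loaded the same way from the "time" extension.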
nologic01 over 1 year ago
awk one-liners are a slam dunk. The tough question is whether to invest in more complex awk programming. Invariably some processing task requires more complex logic, and awk provides that, but in the terse and arcane ways of early computing. Yet reaching for a modern alternative also has overhead, may not be particularly intuitive either (hello pandas) and may even have performance issues...
asicsp over 1 year ago
Hello! Author here.

I am pleased to announce a new version of my "CLI text processing with GNU awk" ebook.

Learn the GNU awk command step-by-step from beginner to advanced levels with hundreds of examples and exercises. This book will dive deep into field processing, show examples for filtering features, multiple file processing, how to construct solutions that depend on multiple records, how to compare records and fields between two or more files, how to identify duplicates while maintaining input order and so on. Regular expressions will also be discussed in detail.

Links:

* PDF/EPUB versions: https://learnbyexample.gumroad.com/l/gnu_awk (free till 31-August-2023)
* Web version: https://learnbyexample.github.io/learn_gnuawk/
* Markdown source, example files, etc.: https://github.com/learnbyexample/learn_gnuawk
* Interactive TUI app for exercises: https://github.com/learnbyexample/TUI-apps/blob/main/AwkExercises

Bundle offers:

* Magical one-liners (https://learnbyexample.gumroad.com/l/oneliners/new_awk_release) is $5 (normal price $15): grep, sed, awk, perl and ruby one-liners bundle
* All Books Bundle (https://learnbyexample.gumroad.com/l/all-books/new_awk_release) is $12 (normal price $32): all my 13 programming ebooks

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors. Happy learning :)

---

Previous discussions:

* Learn to use Awk with hundreds of examples (https://news.ycombinator.com/item?id=15549318): 478 points, Oct 2017, 116 comments
* Show HN: An eBook with hundreds of GNU Awk one-liners (https://news.ycombinator.com/item?id=22758217): 539 points, April 2020, 48 comments
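A minimal sketch of two of the idioms the announcement lists: identifying duplicates while keeping input order, and comparing records between two files. It is written as plain POSIX awk, so it should run under gawk and mawk alike; the function names are hypothetical and the code is not taken from the book.

```shell
# Print each line only the first time it appears, preserving input order.
# seen[$0]++ evaluates to 0 (false) on a line's first occurrence, so
# !seen[$0]++ is true exactly once per distinct line.
dedupe() {
  awk '!seen[$0]++' "$@"
}

# Print lines of the second file whose first field also appears as a first
# field in the first file. NR == FNR only holds while reading the first file.
# Caveat: if the first file is empty, NR == FNR holds for the second file too.
common_keys() {
  awk 'NR == FNR { keys[$1]; next } $1 in keys' "$1" "$2"
}
```

For example, printf 'b\na\nb\nc\na\n' | dedupe prints b, a and c on separate lines, in that order.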
auselen over 1 year ago
I did this golfing a while back. Drawing a heart with AWK: https://gist.github.com/auselen/906a53b47a7d616b080dbef85eb8f776
Galanwe over 1 year ago
99.9% of my awk use is splitting a line (a la cut -d' ' -f) while discarding successive spaces.

e.g.:

    $ echo "key:    value" | awk '{print $2}'
    value

Open to a simpler replacement :-)
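A short sketch of why awk wins over cut here, plus one common non-awk alternative (squeezing spaces with tr -s before cut). It assumes input with runs of plain spaces; the function names are hypothetical.

```shell
# cut treats every single space as a delimiter, so consecutive spaces
# produce empty fields.
second_field_cut() { cut -d' ' -f2; }

# awk's default field splitting treats any run of blanks (spaces or tabs)
# as one separator and ignores leading blanks.
second_field_awk() { awk '{print $2}'; }

# Alternative without awk: squeeze repeated spaces first, then cut.
# Unlike awk, this does not handle tabs or leading spaces.
second_field_tr() { tr -s ' ' | cut -d' ' -f2; }
```

On "key:    value", second_field_cut prints an empty line, while the other two print "value". Only the awk version also copes with tabs or leading indentation.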
rottc0dd over 1 year ago
confession: plug

I once wrote a diff2html script, ported from bash, and it was much, much faster (for obvious reasons). Awk also makes it much more readable than the bash script, and I could learn the language, debug, understand the bugs and fix them in a night.

Not sure if it is the idiomatic way to write awk, but I have to say it is a really nice language.

https://github.com/berry-thawson/diff2html/blob/master/diff2html.sh
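To illustrate why this kind of job suits awk better than a bash read loop, here is a hypothetical minimal sketch. It is not the linked script: it assumes unified diff input, escapes HTML special characters, and wraps each line in a span whose class depends on the line's first character. One awk process handles the whole stream, instead of bash running a loop iteration (and often subprocesses) per line.

```shell
# Hypothetical sketch, not the linked diff2html.sh.
diff_to_html() {
  awk '
    BEGIN { print "<pre>" }
    {
      line = $0
      gsub(/&/, "\\&amp;", line)   # escape & first so later escapes stay intact
      gsub(/</, "\\&lt;", line)
      gsub(/>/, "\\&gt;", line)
      if      (line ~ /^(\+\+\+|---) /) cls = "file"   # file headers
      else if (line ~ /^@@/)           cls = "hunk"    # hunk headers
      else if (line ~ /^\+/)           cls = "add"
      else if (line ~ /^-/)            cls = "del"
      else                             cls = "ctx"
      printf "<span class=\"%s\">%s</span>\n", cls, line
    }
    END { print "</pre>" }
  ' "$@"
}
```

Usage: git diff | diff_to_html > diff.html. The "\\&" in the gsub replacements yields a literal "&", since a bare "&" would insert the matched text.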
healeycodes over 1 year ago
Awesome! I've been meaning to replace my use of Python/JavaScript for tasks that (I believe) are more awk-shaped.
mpalmer over 1 year ago
A few years back I decided to just get as capable as I could with jq, which is fast and functional enough to cover 99% of awk/sed use cases, plus cases you'd never want to touch with awk/sed.

No regrets!
cb321 over 1 year ago
Of possible interest: instead of making a whole new programming language like awk, you can also just systematize generating code for an existing one with a command-line harness.

This can even stay terse & keep a fairly fast edit-test turnaround in a fully statically typed language like Nim: https://github.com/c-blake/bu/blob/main/doc/rp.md
ra1231963 over 1 year ago
Never learned awk or committed esoteric CLI incantations to memory. Don’t get me wrong, I can get around on the CLI, but sed, awk, etc. just didn’t seem like a good cost/benefit investment. I’m also not a sysadmin.

Thankfully I waited long enough and LLMs can write them for me better than I ever could.
ymgch over 1 year ago
Which is better to start with: awk or sed?
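A minimal side-by-side sketch of where the two tools overlap and where they diverge; the function names are hypothetical. Plain substitution looks much the same in both, while anything that needs fields or state carried across lines is natural in awk and awkward in sed.

```shell
# Plain substitution: the same job in sed and in awk.
swap_sed() { sed 's/cat/dog/g'; }
swap_awk() { awk '{ gsub(/cat/, "dog"); print }'; }

# Field-oriented work with state across lines: natural in awk, painful in sed.
# Sum the second whitespace-separated column of the input.
sum_col2() { awk '{ total += $2 } END { print total + 0 }'; }
```

For example, printf 'a 1\nb 2\nc 3\n' | sum_col2 prints 6, whereas sed has no arithmetic at all.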
thangngoc89 over 1 year ago
I have been using ChatGPT to generate small CLI one-liners like these. My prompts look like this:

    - use jq to count a nested array "a.b.c.d"
    - find and delete empty folders using `find`
    - find and replace text using sed/awk

I found that using ChatGPT for these purposes boosted my productivity tremendously.
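For reference, a hedged sketch of commands those three prompts typically lead to. The function names are hypothetical. jq must be installed for the first one, and -empty/-delete are GNU and BSD find extensions rather than POSIX.

```shell
# Count the elements of the array at .a.b.c.d (needs jq).
count_nested() { jq '.a.b.c.d | length'; }

# Remove empty directories under a path. -delete implies -depth, so
# children are visited first and directories that become empty along
# the way are removed too.
delete_empty_dirs() { find "${1:-.}" -type d -empty -delete; }

# Find and replace in a stream. $1 is treated as a regex in both tools,
# and "&" in the replacement means "the matched text".
replace_sed() { sed "s/$1/$2/g"; }   # breaks if $1 or $2 contains "/"
replace_awk() { awk -v from="$1" -v to="$2" '{ gsub(from, to); print }'; }
```

Both replace functions read stdin and write stdout. Note that awk -v also interprets backslash escapes in the values it is given.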
qalmakka over 1 year ago
Awk is fine and dandy, but, like sed, I think it's almost always replaceable with Perl, which is way nicer to use and ubiquitous. Every OS (except Windows) I've laid my hands on in the last 15 years has had Perl installed, either in its default install or pulled in as a dependency almost immediately (a LOT of stuff depends on Perl in any Unix system).

That is, unless you are running in an embedded environment, but in that case you are stuck with something like busybox's awk, which is way more limited than gawk...
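A small sketch of how common awk one-liners map to Perl, using Perl's standard -n, -a and -l switches; the function names are hypothetical. Perl's -a autosplits each line into @F with the same whitespace rules as awk's default field splitting, but @F is 0-indexed, so awk's $2 becomes $F[1].

```shell
# Print the second whitespace-separated field.
f2_awk()  { awk '{ print $2 }'; }
# -n: loop over input lines; -a: autosplit into @F; -l: strip and re-add newlines.
f2_perl() { perl -lane 'print $F[1]'; }

# Drop duplicate lines, keeping the first occurrence and input order.
dedupe_awk()  { awk '!seen[$0]++'; }
dedupe_perl() { perl -ne 'print unless $seen{$_}++'; }
```

The two versions of each pair are near-literal translations, which is part of why Perl is often pitched as a drop-in replacement.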
rubicks over 1 year ago
I love awk. Enough to shill for this:

https://www.oreilly.com/library/view/effective-awk-programming/9781491904930/

If TFA is an excerpt from a book forthcoming on dead-tree media, then I'll be buying that one as well.
ycombinete over 1 year ago
Last week ChatGPT spat out some awk for me for a generic Linux request. It was quite a pleasant surprise!