Every time I see things like this, I feel like the person must be unaware of awk.<p><pre><code> # the original one-liner to get the 10 most frequent IP addresses
cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
# turns into this with GNU awk
gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ < 10) print a[i], i}' access.log
</code></pre>
It's also far, far faster on larger files (base-spec M1 Air):<p><pre><code> $ wc -lc fake_log.txt
1000000 218433264 fake_log.txt
$ hyperfine "gawk '{PROCINFO[\"sorted_in\"] = \"@val_num_desc\"; a[\$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt"
Benchmark 1: gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt
Time (mean ± σ): 1.250 s ± 0.003 s [User: 1.185 s, System: 0.061 s]
Range (min … max): 1.246 s … 1.254 s 10 runs
$ hyperfine "cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
Benchmark 1: cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
Time (mean ± σ): 4.844 s ± 0.020 s [User: 5.367 s, System: 0.087 s]
Range (min … max): 4.817 s … 4.873 s 10 runs
</code></pre>
Interestingly, GNU cut is significantly faster than BSD cut on the M1:<p><pre><code> $ hyperfine "gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
Benchmark 1: gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
Time (mean ± σ): 3.622 s ± 0.004 s [User: 4.149 s, System: 0.078 s]
Range (min … max): 3.616 s … 3.629 s 10 runs</code></pre>
I don't do a lot of shell scripting type things in Go because it's not a great language for it, but when I do, I take another approach, which is just to panic. Generics offer a nice little<p><pre><code> func Must[T any](x T, err error) T {
     if err != nil {
         panic(err)
     }
     return x
 }
</code></pre>
which you can wrap around any standard "x, err :=" function to just make it panic, and even prior to generics you could wrap a "PanicOnErr(justReturnsErr())".<p>In the event that you want to handle errors in some other manner, you trivially can, and you're not limited to just the pipeline design patterns, which are cool in some ways, but limiting when that's all you have. (It can also be tricky to ensure the pipeline is written in a way that doesn't generate a ton of memory traffic with intermediate arrays; I haven't checked to see what the library they show does.) Presumably if I'm writing this in Go I have some other reason for wanting to do that, like having some non-trivial concurrency desire (using concurrency to handle a newline-delimited JSON file was my major use case, doing non-trivial though not terribly extensive work on the JSON).<p>While this may make some people freak, IMHO the real point of "errors as values" is not to force you to handle the errors in some very particular manner, but to make you <i>think</i> about the errors more deeply than a conventional exceptions-based program typically does. As such, it is perfectly legal and moral to think about your error handling and decide that what you really want is the entire program to terminate on the first error. Obviously this is not the correct solution for my API server blasting out tens of thousands of highly heterogeneous calls per second, but for a shell script it is quite often the correct answer. As something I have thought about and chosen deliberately, it's fine.
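To make the "just panic" approach concrete, here is a minimal sketch of Must wrapped around a standard "x, err :=" function. The sumInts helper and the choice of strconv.Atoi are my own illustration, not something from the comment above; any function returning (T, error) would work the same way.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Must is the helper from the comment above: it returns x, or panics if err is non-nil.
func Must[T any](x T, err error) T {
	if err != nil {
		panic(err)
	}
	return x
}

// sumInts is a hypothetical example function: it parses each argument as an
// int and adds them up, panicking on the first argument that fails to parse.
func sumInts(args []string) int {
	total := 0
	for _, a := range args {
		// strconv.Atoi returns (int, error); Must collapses that to an int.
		total += Must(strconv.Atoi(a))
	}
	return total
}

func main() {
	// A bad argument terminates the whole program with a panic and stack
	// trace, which is the deliberate "die on first error" choice.
	fmt.Println(sumInts(os.Args[1:]))
}
```

If one call site later needs real handling, you can drop Must there and go back to a plain "x, err :=" without touching the rest of the script.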
If you're not familiar with Go there is one detail missing from this post (though it's in the script README) - what a complete program looks like. Here's the example from <a href="https://github.com/bitfield/script#a-realistic-use-case">https://github.com/bitfield/script#a-realistic-use-case</a><p><pre><code> package main
 import (
     "github.com/bitfield/script"
 )
 func main() {
     script.Stdin().Column(1).Freq().First(10).Stdout()
 }</code></pre>
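For comparison, here is a rough standard-library-only sketch of the same kind of top-10 count on the first column of stdin. This is not how the script library implements Column or Freq; the entry type, the firstField and topN helpers, the tie-break on key, and the output format are all my own assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// entry pairs a first-column value with the number of lines it appeared on.
type entry struct {
	key   string
	count int
}

// firstField returns the first whitespace-separated field of line,
// or "" if the line has no fields.
func firstField(line string) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// topN returns up to n entries, highest count first. Ties are broken by key
// so the output is deterministic (an assumption, not the library's behaviour).
func topN(counts map[string]int, n int) []entry {
	entries := make([]entry, 0, len(counts))
	for k, c := range counts {
		entries = append(entries, entry{k, c})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].count != entries[j].count {
			return entries[i].count > entries[j].count
		}
		return entries[i].key < entries[j].key
	})
	if len(entries) > n {
		entries = entries[:n]
	}
	return entries
}

func main() {
	counts := make(map[string]int)
	// Note: bufio.Scanner rejects lines longer than its default 64 KiB limit.
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if f := firstField(sc.Text()); f != "" {
			counts[f]++
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range topN(counts, 10) {
		fmt.Printf("%7d %s\n", e.count, e.key)
	}
}
```

It is a lot more typing than the one-line pipeline, which is more or less the trade-off this thread is arguing about.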
Inspired by comments in this thread, I threw together a Bash script that lets you do this:<p><pre><code> cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
</code></pre>
You can also use it as a shebang line to write self-contained scripts.<p>Details here: <a href="https://til.simonwillison.net/bash/go-script" rel="nofollow noreferrer">https://til.simonwillison.net/bash/go-script</a>
I like Go, but its insistence on not permitting unused imports and unused variables makes it unsuitable for scripting, imo.<p>For scripting I want something that I can be fast and messy in. Go is the opposite of that.<p>It's ok, a language doesn't have to be good at everything.
From Sanjay Ghemawat, 9 years ago<p><a href="https://github.com/ghemawat/stream">https://github.com/ghemawat/stream</a>
Shell scripting is quite fine up until a certain complexity (say 500-1000 lines), after which adding even a single small feature becomes a huge drag. We're talking hours for something that would take me 10 minutes in Golang and 15 in Rust.<p>Many people love to smirk and say "just learn bash properly, duh" but that's missing the point that we never do big projects in bash so our muscle memory of bash is always kind of shallow. And by "we" I mean "a lot of programmers"; I am not stupid, but I have to learn bash's intricacies every time almost from scratch and that's not productive. It's very normal for things to slip from your memory when you're not using them regularly. To make this even more annoying, nobody will pay me to work exclusively with bash for 3 months until it gets etched deep into my memory. So there's that too.<p>I view OP as a good reminder that maybe universal-ish tools to get most of what we need from shell scripting exist even today but we aren't giving them enough attention and energy and we don't make them mainstream. Though it doesn't help that Golang doesn't automatically fetch dependencies when you just do `go run random_script.go`: <a href="https://github.com/golang/go/issues/36513">https://github.com/golang/go/issues/36513</a><p>I am not fixating on Golang in particular. But IMO <i>next_bash_or_something</i> should be due Soon™. It's not a huge problem to install a single program when provisioning a new VM or container either so I am not sure why people are so averse to it.<p>So yeah, nice article. I like the direction.<p>EDIT: I know about nushell, oilshell and fish but admittedly never gave them a chance.
This is satire, right? I think commenters are completely missing the point.<p><a href="https://en.m.wikipedia.org/wiki/A_Modest_Proposal" rel="nofollow noreferrer">https://en.m.wikipedia.org/wiki/A_Modest_Proposal</a>
The unix philosophy of having small programs that take in input, process it and return a result has proven to be a success; I just never understood why the next logical step, having these programs in library form, never became a thing. I guess shells are a bit useful but not as useful as a decent repl (common-lisp or the jupyter repl) where these programs can be used as if they were functions.
I ended up using this for my cli scripting needs. <a href="https://github.com/google/zx">https://github.com/google/zx</a>
Would love to use more golang- amazing build system and cross compiler built in. "All in one" binaries are the best thing ever. I adore most of the ideas in the language.<p>.... but there are just soooo many little annoyances / inconveniences which turn me off.<p>- No Optional Parameters. No Named Parameters. Throw us a bone Rob Pike, it's 2023. Type inferred composite literals may be an OK compromise.. if we ever see them: <a href="https://github.com/golang/go/issues/12854">https://github.com/golang/go/issues/12854</a><p>- Unused import = will not compile. Unused variable = Will not compile. Give us the ability to turn off the warning.<p>- No null safe or nullish coalescing operator. (? in rust, ?? in php, etc.)<p>- Verbosity of if err != nil { return err; }<p>- A ternary operator would be nice, and could bring if err != nil to 1 line.<p>- No double declarations. “no new variables on left side of :=” .. For some odd reason “err” is OK here... Would be highly convenient for pipelines, so each result doesn't need to be uniquely named.<p>I'd describe Go as a "simple" language- Not an "easy" language. 1-2 lines in Python is going to be 5-10 lines in golang.<p>Note: Nim has most of these..
<p><pre><code> export LC_ALL=C
awk '!a[$1]++' access.log|head
</code></pre>
If access.log is large enough, awk will fail.<p>When this happens, one can split access.log into pieces, process separately then recombine.<p>But that's more or less what sort(1) does with large files, creating temporary files in $TMPDIR or other user-specified directory after -T if using GNU sort.<p>There was a way to eliminate duplicate lines from an unordered list using k/q, without using temporary files but I stopped using it after Kx, Inc. was sold off and I started using musl exclusively. q requires glibc.<p>For example, something like<p><pre><code> #!/bin/sh
# usage: $0 file
echo "k).Q.fs[l:0::\`:$1];l:?:l;\`:$1 0:l"|exec q >null;
</code></pre>
Can this be done in ngn k?<p>The other approach I use to avoid temporary files is to just put the list in an SQL database, add a UNIQUE constraint, and update the database.
I put together a go "sh-bang" line so you can just chmod +x your .go file and run it (and it works with go fmt unlike other options).<p><pre><code> /*usr/bin/env go run "$0" "$@"; exit $? #*/
</code></pre>
It's fun, try it out! Just make this the first line of the file.
I have been thinking that JS template literals could be a great replacement for shell programming, allowing you to make more powerful syntax to emulate a lot of useful bash things while still having much of the power of a proper programming language<p>for example:<p><pre><code> import { jsh, cat, grep, PipeOutput } from 'jsh';
 // type PipeOutput = { stdout: ReadableStream, toString: () => Promise<string>, extra: Record<string,any> }
 function countLines(input: PipeOutput, argv: string[]): PipeOutput {
     // ...
 }
 // process.argv[0] is the node binary and [1] is the script path, so the first user argument is [2]
 const textToLookFor = process.argv[2]
 const output: PipeOutput = jsh`${cat} file.txt | ${grep} ${textToLookFor} |
 ${countLines}`
 // toString() returns a Promise, so await it before printing
 console.log(await output.toString())</code></pre>
Tangentially related: I posted a shebang for scripting in rust some years ago, if anyone is interested: <a href="https://neosmart.net/blog/self-compiling-rust-code/" rel="nofollow noreferrer">https://neosmart.net/blog/self-compiling-rust-code/</a>
Interesting. I do something similar with my task <a href="https://github.com/kardianos/task">https://github.com/kardianos/task</a> package, which is in turn loosely based off of another package from 10-15 years ago.
Discussed at the time:<p><i>Scripting with Go</i> - <a href="https://news.ycombinator.com/item?id=30641883">https://news.ycombinator.com/item?id=30641883</a> - March 2022 (66 comments)