I highly recommend ShellCheck[0] if you're writing any bash. With the warnings and stylistic advice it provides, I feel like I can actually be confident that my scripts are doing what I think they're doing.<p>[0]: <a href="https://github.com/koalaman/shellcheck" rel="nofollow">https://github.com/koalaman/shellcheck</a>
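To give a flavor of what it catches, here is a hypothetical snippet (not from the original comment) that looks harmless but gets flagged:<p><pre><code> dir="my files"
 rm -rf $dir/    # ShellCheck SC2086: Double quote to prevent globbing and word splitting
 rm -rf "$dir"/  # safe
</code></pre>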
This doesn't mention one of my favorite bash gotchas, which is what you get when you combine `set -e` with `pipefail` at all. Try this:<p><pre><code> set -euo pipefail
yes | head
</code></pre>
That will consistently exit, because `yes` gets SIGPIPE and quits. That's expected behavior for `yes`, but with `pipefail` it triggers a script exit. More exciting is that something like:<p><pre><code> generate_data | head
</code></pre>
will only _sometimes_ fail. It's a race that depends on whether generate_data is able to stuff all of its data into the pipe buffer before head calls close().<p>EDIT: I seemed to remember sharing this bug not too long ago, and indeed I did. pixelbeat responded with some interesting links: <a href="https://news.ycombinator.com/item?id=13940628" rel="nofollow">https://news.ycombinator.com/item?id=13940628</a>
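To make the race concrete, a sketch (generate_data is hypothetical here; `seq` just stands in as the producer):<p><pre><code> set -euo pipefail

 generate_data() {
     seq 1 "$1"
 }

 generate_data 100 | head -n 1      # output fits in the pipe buffer; usually exits 0
 generate_data 1000000 | head -n 1  # producer outlives the buffer, gets SIGPIPE; script dies
</code></pre>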
As an aside, I like reading Jane Street's blog. They're one of the few companies in our space (I'm also in "automated trading") that discusses even non-proprietary stuff openly. When you work in an information vacuum, it's comforting to know that presumably similar people face the same challenges.<p>When the author wrote "a particular production bash script (if that doesn't sound horrifying, hopefully it will by the end of this post)," I couldn't help but smile...
The error handling is one reason I prefer Tcl to Bash or POSIX sh if I need a nontrivial script that shells out to other programs. Tcl handles errors, substitution, etc. properly (unsurprisingly) by default:<p><pre><code> > set foo [exec /bin/true][exec /bin/false][exec /bin/true]
child process exited abnormally
while executing
"exec /bin/false"
> exec true | false | true
child process exited abnormally
while executing
"exec true | false | true"
> set sp "hello world"; exec echo $sp; # No need to quote $sp there.
hello world
</code></pre>
Not all shell-like things are as convenient to do in Tcl as they are in sh (the most significant difference to me is that you cannot pipe to or from functions), and it is more verbose, but because everything is a string in Tcl I find that it integrates with *nix (or Windows!) command line programs better than other scripting languages. E.g.,<p><pre><code> > lmap x [split [exec ps | tail -n +2] \n] {lindex $x 0}
4540 5767 5768 31161
</code></pre>
What happens here is that you take the output of `ps | tail -n +2`, split it on newlines, then map over it treating each line (that is something like "5814 pts/0 00:00:00 ps") as a list and taking the first element in it. The result is a list of PIDs (a string containing the PIDs separated by whitespace).<p>I can recommend trying Tcl to anyone fighting Bash who doesn't want to replace it with Python/Ruby/etc. If you try it, though, use version 8.6 or at the very least 8.5. The previous versions are EOL but are still common in the wild. If a recent Tcl is not available on your system, you can build a self-contained static binary interpreter with <a href="http://kitcreator.rkeene.org/kitcreator" rel="nofollow">http://kitcreator.rkeene.org/kitcreator</a>.
Not that people should be expected to know this, but here's an idiom that does work (the assignment's exit status is that of the command substitution, and because the `if` tests it directly, `set -e` never comes into play):<p><pre><code> if res="$(ldap-query-for-valid-users)"; then
echo "($res)" > "/tmp/all-users.sexp"
else
handle_failure
fi
</code></pre>
I second the recommendation to use <a href="https://github.com/koalaman/shellcheck" rel="nofollow">https://github.com/koalaman/shellcheck</a> – you really shouldn't be writing shell scripts without it – but in this case it doesn't seem to catch the issue (with default settings, at least).
Slightly more useful than `set -e` is `set -E` with `trap exit ERR` (`-E` makes the ERR trap inherited by shell functions and subshells).<p>But there's still basically no way to make this consistently useful.<p>Even if you religiously `set -e`/`-E` in every scope just in case, the moment you're in a scope inside a non-final operand of a chain of `&&`/`||`s, or inside the conditional expression of a control structure, `-e`/`-E` simply do nothing. You can't turn them any more on; you just don't get early termination on errors, no matter how many nested function calls removed you are from the original `||`. It's not great.
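A sketch of that suppression (function name made up): once a call sits in an `if` condition, neither errexit nor the ERR trap fires anywhere inside it:<p><pre><code> set -eE
 trap 'echo "ERR trap fired" >&2; exit 1' ERR

 deep() {
     false                 # would normally trip errexit/ERR...
     echo "still running"  # ...but this prints anyway
 }

 if deep; then             # condition context disables -e/-E for the whole call
     echo "condition reported success"
 fi
</code></pre>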
I was just thinking how cool it would be to have a Python module that really eases writing bash-style scripts, with less overhead than the usual `subprocess` methods... And there it was, just a Google search away: <a href="https://pypi.python.org/pypi/sh/1.12.13" rel="nofollow">https://pypi.python.org/pypi/sh/1.12.13</a>
I was in a meeting with a major vendor and a bunch of fintech leads recently, and the vendor's techie said '...no-one likes bash scripts...'<p>After the meeting I said to colleagues, 'I quite like bash scripts, actually', and they all said 'I thought that too...'
Another fun one is that<p><pre><code> set -e
export x=$(false)
echo ok
</code></pre>
prints ok, but<p><pre><code> set -e
export x
x=$(false)
echo ok
</code></pre>
exits early because of the `false`. (In the first form, the exit status of the `export` builtin, which succeeds, masks the failing command substitution; a bare assignment takes on the substitution's status. ShellCheck flags this pattern as SC2155.)
<p><pre><code> echo ... > "/tmp/all-users.sexp"
</code></pre>
No, this is not a secure way to create temporary files.<p>Please use mktemp(1).
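Roughly, a sketch of the safer pattern (the template name is arbitrary):<p><pre><code> tmpfile=$(mktemp /tmp/all-users.XXXXXX)  # atomically creates a unique file, mode 0600
 echo "($res)" > "$tmpfile"
</code></pre>
The fixed, predictable path is the problem: anyone can pre-create or symlink /tmp/all-users.sexp before the script runs.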
What this really boils down to is that you should avoid subshells where possible and handle this with either variable assignment or streams (pipes). E.g.:<p><pre><code> set -euo pipefail

 foo() {
     false
     echo "hello world"
 }

 variable=$(foo)  # fails
 foo | do_stuff   # fails
</code></pre>
My preference is to handle things as streams.
Yes, subshells can be tricky. Don't rely on stuff like pipefail; rather, check return codes and react accordingly. Bash also provides a way to read the return codes of piped commands: see PIPESTATUS in man bash.
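A minimal sketch (reusing the hypothetical generate_data from upthread); note that PIPESTATUS must be captured immediately, since the next command overwrites it:<p><pre><code> generate_data | head -n 10
 status=("${PIPESTATUS[@]}")
 if (( status[0] != 0 )); then
     echo "producer exited with ${status[0]}" >&2
 fi
</code></pre>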
<p><pre><code> echo ($(ldap-query-for-valid-users)) > /tmp/all-users.sexp
</code></pre>
should be something like<p><pre><code> x=$(ldap-query-for-valid-users);
test ${#x} -gt 0||exec echo no valid users >&2;
echo \("$x"\) > /tmp/all-users.sexp;
</code></pre>
This way they would get the message "no valid users" to stderr and the script would exit. According to the blog post that is what they wanted.<p>Alternatively,<p><pre><code> x=$(ldap-query-for-valid-users);
test ${#x} -gt 0||exit 100
echo \("$x"\) > /tmp/all-users.sexp;
</code></pre>
if they prefer a nonzero exit code to a message to stderr.
A related annoyance is that when you write a for-loop in a Makefile recipe, every iteration gets run even if one of them fails, because the shell's exit status for a for-loop is just that of the last command it ran.<p>So most of the time you should write "set -e; for XXX". Otherwise your Makefile loops will "succeed" incorrectly.
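Concretely, something like this (target and directory names made up):<p><pre><code> build-all:
 	set -e; for d in lib app tests; do \
 		$(MAKE) -C $$d; \
 	done
</code></pre>
Without the `set -e;`, a failure in `lib` is masked as long as the final iteration succeeds.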
How about just writing safe code?<p>Really... check your damn return values. `set -e` is a crutch for a sloppy programmer.<p>OK, yes; you can do this really slick thing in one line by stringing together a bunch of commands. But just because you can doesn't mean you should.<p>Bash makes it simple with `if ! <command>; then <failure commands>; fi`. Try not to string ten things together. Keep the success path in your script's main execution flow.
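Spelled out (the command and the handling are placeholders):<p><pre><code> if ! ldap-query-for-valid-users > "$out"; then
     echo "query failed" >&2
     exit 1
 fi
</code></pre>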
I use Xonsh[0] for scripting now. I explain why in my blog[1], basically because we are in 2017 and I like python.<p>[0]: <a href="http://xon.sh/" rel="nofollow">http://xon.sh/</a><p>[1]: <a href="https://william-droz.com/xonsh-a-modern-shell-that-enable-python-in-your-terminal.html" rel="nofollow">https://william-droz.com/xonsh-a-modern-shell-that-enable-py...</a>
I don't think the problem is (ba)sh the language, but rather the idioms it implements.<p>It's usually a bad idea to begin writing the result before knowing that all input is there. That's like a server announcing a 200 OK and beginning to stream, only later detecting I/O error. It's difficult to deal with such a server as a client.<p>More generally, in pipelines we deal with pairs of programs that are only connected by a text stream, with no possibility to communicate out-of-band conditions. In `PRODUCER | CONSUMER`, PRODUCER can't tell from a sigpipe whether CONSUMER crashed or has read all the data it needs. And CONSUMER can't know whether PRODUCER crashed or if it should take action on the results.<p>Most scripts don't really need the take-action-immediately level of concurrency. It's sometimes nice to be concurrent (use multiple CPUs at once). But the cases where the processed data can't be buffered at least in a temporary file before taking action are really rare.
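In that spirit, a buffer-then-act sketch (PRODUCER and CONSUMER are placeholders): the consumer only sees data once the producer has finished successfully:<p><pre><code> tmp=$(mktemp)
 trap 'rm -f "$tmp"' EXIT
 if PRODUCER > "$tmp"; then
     CONSUMER < "$tmp"
 else
     echo "producer failed; discarding partial output" >&2
     exit 1
 fi
</code></pre>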
Some more edge cases: <a href="http://mywiki.wooledge.org/BashFAQ/105" rel="nofollow">http://mywiki.wooledge.org/BashFAQ/105</a>
When I see very long bash scripts I get angry. There is a reason Perl and, later, Python were invented, and in 98% of cases both are available on Unix-compatible systems. Not that I'm not OK with short bash scripts, or with systems where you only have bash... but if your Unix-compatible system only has bash, there is something else wrong with it, from my point of view.
Yeah, naively using errexit won't get you the whole way. BashFAQ, ShellCheck and StackOverflow have quite a bit of information, and you can often write things that are pretty robust if you take it all in. Particular cases that come to mind are:<p>1) Command substitution. You can catch this with an explicit error handler inherited by child processes.
2) pipefail SIGPIPE false positives. Pretty hairy; whether this is a "real" error depends on the command. Often it isn't, so you can work around it by ignoring SIGPIPE (see the sketch at the end of this comment).
3) Process substitution. As far as I know, there is no way to work around this while still using the convenience syntax. You have to use explicit named pipes and carefully `wait` on the PIDs (carefully!). Maybe still with races...<p>In my experience, you can write moderately robust shell scripts if you care enough and use all these flags and linters. But by the time you're at this stage, you probably shouldn't be using shell scripting. It's more like training to spot problems in other people's code.
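For case 2, a sketch of one workaround that whitelists the producer's SIGPIPE exit status (128+13 = 141) rather than ignoring the signal itself:<p><pre><code> set -euo pipefail
 generate_data | head -n 10 || [ "${PIPESTATUS[0]}" -eq 141 ]
</code></pre>
The PIPESTATUS expansion happens before `[` runs, so it still refers to the pipeline; note this also masks a consumer failure whenever the producer happened to get SIGPIPE.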
This works, but it gets ugly when you have to use `set -e` everywhere:<p><pre><code> set -e

 foo() {
     /bin/false
     echo "foo"
 }

 echo "$(set -e; foo)"
</code></pre>
The use of global values within programs is generally deprecated, for good reasons. The use of global values that modify the semantics of their interpreter probably deserves even more skepticism.<p>The inheritance of these flags by subshells, which may have been written assuming different semantics, would be potentially even more problematic, so I think bash is right in this case, though arguably the inheritance could be limited to subshell code defined in the same file.
<pre><code> set -euo pipefail
</code></pre>
should NEVER be used in a bash script. If you feel tempted to do so, then you are trying to make the shell scripting language into something that it is not.<p>This is the time for a full-scale programming language such as Python, or perhaps Groovy on the JVM, or Go. When you need to write robust code, use the tools that were created for writing robust code.<p>Bash just has too many quirks.<p>Note that this is related to the most common way that people build a Big Ball of Mud. You have a simple app and you need a couple of features, so you add them on. Rinse and repeat. Before too long you have an app that does too much and was never designed/architected to do that much stuff. You are probably ignoring a number of techniques for integrating functionality in large apps, such as message queueing, microservices, separate libraries or packages, or multiple languages.<p>Shell scripts follow the same trajectory toward too much complexity. When you see it happening, and before the task gets too complex, replace the script with an app and apply all the normal software engineering techniques to make it robust.
Use `sh -c`:<p><pre><code> -c   Read commands from the command_string operand instead of from the
      standard input. Special parameter 0 will be set from the
      command_name operand and the positional parameters ($1, $2, etc.)
      set from the remaining argument operands.
</code></pre>
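A quick illustration of how the operands map (names made up):<p><pre><code> sh -c 'echo "$0 got $1"' myname hello
 # prints: myname got hello
</code></pre>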