Errors happen. Even if we write a perfect program, it might be provided with invalid input, or unexpectedly disconnected from the network. The key to writing reliable software isn't to prevent all errors, but to ensure that all errors are handled in a predictable way.
Bash does not enforce precise error handling. In fact, by default it simply ignores most errors. However, with a bit of care it is possible to write robust, reliable scripts that keep you and your users happy. Here are some error handling practices to keep in mind.
What do we mean by errors?
Bash doesn't have exceptions or error types as we might be used to in other languages. However, every command, whether it's built-in to Bash or an external program, returns an "exit status code" between 0
and 255
when it finishes executing. Successful commands return 0
, while commands that fail return a code between 1
and 255
.
When I talk about "errors" in Bash in this post, I'm referring to any command which exits with a non-zero exit code in a context where it isn't explicitly expected. For example, if you had a program that started with
cat example.txt
and example.txt
did not exist, that would be an error. Nothing is handling the failure, so the program would either crash or continue in an invalid state. However if you have an if
statement like
if test -e example.txt; then
echo "Example found"
else
echo "Example not found"
fi
the command test -e example.txt
may fail, but the if
statement is expecting its condition to be a command that might fail, and it handles that case automatically. I do not consider that an "error" for the purpose of this post. The same reasoning applies to cases like while COMMAND; do ...
and COMMAND || return 0
; see the Bash manual for the full list of exceptions.
Simple errors
By default, Bash scripts will ignore most errors and continue running. The first thing we need to do in our scripts is enable Bash's basic error handling options, as follows.
set -euo pipefail
Here we enabling three options at once. Let's break them down.
set -e
(aka -o errexit
) causes most failing commands to immediately return from the enclosing function, propagating their error exit status code to the calling function. If the calling function also doesn't handle the error, it will continue up the stack, eventually exiting the script with that exit status code. (Note that there are still some cases where errors can be silently ignored, discussed below.)
set -u
(aka -o nounset
) makes it an error to refer to a variable like $X
if it hasn't been defined, either in the script or as an environment variable, instead of treating it as an empty string. Often, this is a typo and a bug. There are certainly some cases where you'll need to handle possibly-undefined variables, but they should be indicated explicitly: you can use ${X-}
instead of $X
to indicate where you'd like to use an empty string if a variable isn't defined.
set -o pipefail
prevents errors from being silently ignored in pipelines (when the output of one command is being piped to the input of another). For example, consider:
cat example.txt | grep metadata | sort
By default, the exit status of the entire pipeline will just be that of the last command, sort
. This can succeed even if example.txt
does not exist and an earlier command like cat
fails. pipefail
changes this behaviour so that the pipeline is marked as failed if any of the commands fail. (Subsequent commands in the pipeline will still be executed. If multiple fail, the exit status of the last failing command will be used.)
Setting set -euo pipefail
is a very common practice for many shells, but for Bash in particular there's one more option you should also be setting:
shopt -s inherit_errexit
By default, when you use command substitution in Bash, the -e
setting is not applied. For example, if we had
echo "Command output: $(false; date)"
the command would successfully output the result of date
, even though the failing false
command should have exited the script first. Enabling inherit_errexit
allows the command substitution to inherit our -e
setting, so date
will not be run. (However, please note that the error status of the command substitution is still ignored. Even though the parenthesized expression returned a nonzero exit status, echo
will still run successfully. This is discussed in more detail below.)
ShellCheck
Adopting those settings made my scripts much more reliable, but I was still finding some bugs in them. Many came from me misunderstanding subtleties of Bash's syntax, where my code wasn't doing what I thought it was doing. I might forget which terms need quoting in a condition like [[ $x -eq "$y" ]]
, or where I can and can't omit the $
before a variable in an expression like $(( x = y ))
. I tried to keep the rules straight, but there were too many to absorb at once and it felt hopeless, until I discovered ShellCheck.
ShellCheck is a static analysis tool/linter for Bash scripts, and it is invaluable. I use it in VS Code (extension) and run it in CI. It flags cases where your code might not be doing what you expect, with links to wiki pages explaining the problem and potential alternatives.
Most of my recent Bash learnings have started with a ShellCheck warning code making me aware of an edge case or capability that I hadn't considered. Like any linter, you may occasionally need to ignore its warnings with an annotation like # shellcheck disable=SC2034
, but I've found its advice is usually very good, even when it seemed counterintuitive at first.
Even with ShellCheck, there are still some subtle cases where you can silence errors without realizing it, but not many.
Subshells
A lot of things about Bash have surprised me, but this was the most shocking: when you use parentheses to group commands, Bash forks the entire process to create a "subshell" child process running the parenthesized code!
(false || true || echo this is a subshell) && ls
echo "$(ls also-this)" "$(ls this-too)"
my_function() (
echo this is a subshell
)
other_function() {
echo but this is NOT, because I used curly braces instead of round parentheses
}
This is why if you try to set a global variable from inside of parentheses, the change won't be visible outside: you're only setting the value in the child process.
declare x=first
(x=second)
echo "$x" # echoes "first"
This usually doesn't cause a problem for error handling—our settings are propagated to the subshell, and the exit status of the subshell is propagated back. However, there is one major exception...
The unfortunate case of command substitution
Even with every available setting enabled, failures in command substitution subshells are usually silenced/masked and do not cause a failure in the original shell process. For example:
set -euo pipefail
shopt -s inherit_errexit
echo "$(
echo >&2 "error: everything is broken"
exit 66
)"
echo "but this is still running"
error: everything is broken
but this is still running
As far as I can tell, there is no way to change this behaviour, and ShellCheck can't warn about it. If there is some way I've missed, please let me know! There are workarounds, but they're clunky.
The exit status of these subshells is returned to the parent shell, however, it's never checked before it is overwritten by the return status of the original command (echo
in the case above). If we put the command substitution in an assignment expression on its own, instead of as an argument to another command, the exit status won't be overwritten. For example:
set -euo pipefail
shopt -s inherit_errexit
declare output
output="$(
echo >&2 "error: everything is broken"
exit 66
)"
echo "$output"
error: everything is broken
This will behave properly, with the failure in output="$(...)"
exiting the script.
Bonus suggestions
The default handling of glob expressions in Bash is confusing. Consider the command
ls ./builds/bin-*/
If builds
contains one or more directories whose names start with bin-
, you'll get an argument for each, expanding to something like this:
ls ./builds/bin-windows/ ./builds/bin-linux/
However, if there's no match, the glob expression isn't replaced, it's just passed to the command as-is, typically producing an error or unexpected behaviour:
ls: cannot access './builds/bin-*/': No such file or directory
There are two more-reasonable alternative behaviours, and I strongly suggest you set one of them: shopt -s nullglob
will replace the glob expression with the empty string if it doesn't match, and shopt -s failglob
will raise an error.
Finally, you should set shopt -s compat"${BASH_COMPAT=42}"
with the minimum Bash version you want to support, to reduce the chance of breakage in later versions. For Linux I usually target 42
(February 2011) but macOS only ships with 32
(October 2006).
Conclusion
Writing robust Bash scripts is tricky, but not impossible. Start your scripts with set -euo pipefail; shopt -s inherit_errexit nullglob compat"${BASH_COMPAT=42}"
and use ShellCheck, and you'll be 90% of the way there!
Appendix 1: Bash manual description of the -e
/-o errexit
setting
Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a
while
oruntil
keyword, part of the test following theif
orelif
reserved words, part of any command executed in a&&
or||
list except the command following the final&&
or||
, any command in a pipeline but the last, or if the command's return value is being inverted with!
. If a compound command other than a subshell returns a non-zero status because a command failed while-e
was being ignored, the shell does not exit. A trap onERR
, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.If a compound command or shell function executes in a context where
-e
is being ignored, none of the commands executed within the compound command or function body will be affected by the-e
setting, even if-e
is set and a command returns a failure status. If a compound command or shell function sets-e
while executing in a context where-e
is ignored, that setting will not have any effect until the compound command or the command containing the function call completes.source:
{ COLUMNS=2048 man bash | grep -Em1 -A32 '^\s+set \[' | grep -Em1 -A32 '^\s+-e\s{4}' | grep -Em2 -B32 '^\s+-.\s{4}' | sed '$d' | grep -EoA32 '\s{4}(\S\s{0,4})+$' | grep -Eo '\S.*$' | fmt -tw$COLUMNS; }
Top comments (1)
Thank you very much for a great post!
I found following still doesn't exit with the setting suggested in the post.
I think it is because the process substitution (
<(cat notFound)
) creates a sub-shell but it doesn't involve an assignment.The example you mentioned (the command substitution without an assignment) can be avoided with an assignment to a dummy variable but I can't come up with a good workaround...