Ensuring that a shell script runs exactly once

#bash #productivity #linux #tips

We often have shell scripts that do important work, such as inserting into a database or mailing reports, and we want at most one instance of them running at any time.

Enter locks!

A simple solution is to create a "lock file" and check whether it exists when the script starts. If the file already exists, another instance of the script is running, so we can fail with a message such as "Try again later!". Once the script finishes, it cleans up by deleting the lock file.

LOCK_FILE=a.lock
if [ -f "$LOCK_FILE" ]; then
    # Lock file already exists, exit the script
    echo "An instance of this script is already running"
    exit 1
fi
# Create the lock file
echo "Locked" > "$LOCK_FILE"

# Do the normal stuff

# clean-up before exit
rm "$LOCK_FILE"

This looks promising, but there are issues with this approach. What happens if the script does not end correctly, i.e., it exits because of some failure before it reaches the clean-up part of the code? Or if it is forcibly terminated with Ctrl+C or the kill command? In both cases the lock file is not deleted, so every later run fails with the "already running" error until you delete the file manually.
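The stale lock is easy to reproduce. The following is only a sketch for bash: the temporary directory, the naive_script function and the 5 second sleep are stand-ins invented for the demo, not part of the original script.

```shell
#!/usr/bin/env bash
# Demo: a lock file left behind when the script is killed before clean-up.
workdir=$(mktemp -d)
LOCK_FILE="$workdir/a.lock"

# Stand-in for the script above: take the "lock", work for a while, clean up.
naive_script() {
    if [ -f "$LOCK_FILE" ]; then
        echo "An instance of this script is already running"
        return 1
    fi
    echo "Locked" > "$LOCK_FILE"
    sleep 5             # the "normal stuff"
    rm "$LOCK_FILE"     # never reached if the process is killed first
}

naive_script >/dev/null 2>&1 &   # first run, in the background
pid=$!
sleep 1                          # give it time to create the lock file
kill "$pid"                      # simulate a forced termination
wait "$pid" 2>/dev/null || true

# Nobody is running any more, yet the lock file is still there,
# so the next run refuses to start.
second_run_output=$(naive_script)
second_run_status=$?
echo "$second_run_output (exit status $second_run_status)"
```

The second run fails even though no instance is alive, which is exactly the manual clean-up problem described above.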

There is another, more subtle error in the above code: a race condition. The check and the creation of the lock file are two separate steps. If two instances of the script start at around the same time, both can get past if [ -f "$LOCK_FILE" ], because the second instance may reach the check before the first one has created the lock file. We then have more than one instance running.

A better lock!

Is there a way to create a lock that is robust against race conditions and abnormal termination (Ctrl+C, the kill command, etc.)? Linux offers flock, a utility for managing locks from shell scripts. Using flock, we can rewrite the above snippet as follows:

LOCK_FILE=a.lock
exec 99>"$LOCK_FILE"
flock -n 99 || exit 1
# Do stuff and exit!

The exec 99>"$LOCK_FILE" opens LOCK_FILE for writing on file descriptor 99, creating the file if it does not exist. File descriptors (fds) 0, 1 and 2 are stdin, stdout and stderr respectively. We pick a high number so that it does not clash with other numbered fds the script may open later on.

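flock -n 99 || exit 1 does two things. First, it tries to acquire an exclusive lock through file descriptor 99, which refers to our LOCK_FILE. The Linux kernel guarantees that this operation is atomic, so two instances can never both obtain the lock. The -n option makes the call non-blocking: if the lock is already held, flock fails immediately instead of waiting. Second, if it fails to acquire the lock, the script exits with return code 1. We do not need to worry about any clean-up. The kernel releases the lock when the file descriptor is closed, which happens once the script (and any child process that inherited the descriptor) has exited, regardless of how it terminates. This solves our problem!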
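Both properties can be checked with a short experiment. This is a sketch under some assumptions: bash, the flock utility from util-linux, and a made-up locked_script function in which a 5 second sleep stands in for the real work.

```shell
#!/usr/bin/env bash
# Demo: flock refuses a second holder, and the lock disappears with its owner.
workdir=$(mktemp -d)
LOCK_FILE="$workdir/a.lock"

# Stand-in for the real script. Only call it in a subshell or in the
# background, because it opens fd 99 and finally replaces its own process.
locked_script() {
    exec 99>"$LOCK_FILE"
    flock -n 99 || exit 1
    # "exec" makes sleep the lock holder itself. A plain child process would
    # inherit fd 99 and keep the lock alive after its parent is killed.
    exec sleep 5
}

locked_script >/dev/null 2>&1 &   # first instance takes the lock
holder=$!
sleep 1                           # give it time to acquire the lock

( locked_script )                 # second instance, while the lock is held
status_while_held=$?

kill -9 "$holder"                 # no chance for the holder to clean up
wait "$holder" 2>/dev/null || true

( exec 99>"$LOCK_FILE"; flock -n 99 )   # try again after the kill
status_after_kill=$?

echo "while held: $status_while_held, after kill: $status_after_kill"
```

The lock file itself is never deleted here. It can stay on disk, because what matters is the lock held through the open descriptor, not the existence of the file.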

What if we want to print a more informative message instead of exiting silently when the lock cannot be acquired? We can change the line flock -n 99 || exit 1 as follows:

flock -n 99
RC=$?
if [ "$RC" != 0 ]; then
    # Send message and exit
    echo "Already running script. Try again after sometime"
    exit 1
fi

The flock man page has an example that you can add to the start of any shell script to make it take an exclusive lock:

[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :

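This boilerplate uses the script file itself as the lock file. On the first pass, the environment variable FLOCKER is not yet equal to the script name, so the script re-executes itself under flock with FLOCKER set to the script name and with its original arguments. On the second pass, FLOCKER matches, the check is skipped, and the rest of the script runs while the lock is held. If the lock cannot be acquired, however, the script prints nothing and exits silently.

Here $0 is the name the script was invoked with, and "$@" expands to all the arguments that were passed to it.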
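If a message on failure is wanted, the one-liner can be unrolled. The sketch below is one possible variation, not the man page's version: it assumes bash and a util-linux flock that supports -E (--conflict-exit-code), and the job.sh script, the exit code 99 and the 3 second sleep are all invented for the demo.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
script="$workdir/job.sh"

# Write a small self-locking script to disk (the boilerplate relies on $0).
cat > "$script" <<'EOF'
#!/usr/bin/env bash
if [ "${FLOCKER}" != "$0" ]; then
    # First pass: re-run this script under flock, locking the script file.
    # -E 99 (an assumption of this sketch) makes flock exit with 99 when the
    # lock is already taken, so that case can be told apart from other errors.
    env FLOCKER="$0" flock -E 99 -en "$0" "$0" "$@"
    rc=$?
    [ "$rc" -eq 99 ] && echo "Already running: $0" >&2
    exit "$rc"
fi
# Second pass: the lock is held from here on.
echo "working with arguments: $*"
sleep 3
EOF
chmod +x "$script"

"$script" one two > "$workdir/first.out" 2>&1 &   # first instance
sleep 1
second_message=$("$script" three 2>&1)            # second instance
second_status=$?
wait                                              # let the first one finish
first_output=$(cat "$workdir/first.out")

echo "first:  $first_output"
echo "second: $second_message (exit status $second_status)"
```

Unlike the one-liner, this version does not exec flock, so the outer shell stays around to inspect the exit status and print the message.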

Use case for me

My team uses a test machine where we deploy multiple branches of a code base. We need to make sure that only one person is building the project at any given time. The deploy script pulls the specified branch from git, builds the project, deploys the main service and starts ancillary services. The script takes some time to execute. If someone tries to deploy another branch while a build is in progress, both can fail.

With the above snippet in place, calling the script while a build is in progress shows the branch currently being built and exits with a failure.
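One way to get that behaviour is to record the branch name in the lock file. The actual deploy script is not shown here, so what follows is a hypothetical sketch for bash: the deploy function, the deploy.lock file name, the branch names and the 3 second sleep are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
LOCK_FILE="$workdir/deploy.lock"

# Hypothetical deploy entry point: deploy <branch>
# Call it in a subshell, since it opens fd 99 and may exit.
deploy() {
    branch=$1
    # Open for append: a plain ">" would truncate the file and wipe the
    # branch name written by the instance that currently holds the lock.
    exec 99>>"$LOCK_FILE"
    if ! flock -n 99; then
        echo "Build already running for branch: $(cat "$LOCK_FILE")" >&2
        exit 1
    fi
    echo "$branch" > "$LOCK_FILE"   # we hold the lock, so record our branch
    sleep 3                          # stand-in for git pull, build, deploy
}

( deploy feature-a ) >/dev/null 2>&1 &   # first person starts a deploy
sleep 1
message=$(deploy feature-b 2>&1)         # second person tries another branch
status=$?
wait                                     # let the first deploy finish

echo "$message (exit status $status)"
```

Rewriting the file contents does not disturb the lock, because the lock belongs to the open descriptor on the file rather than to the data stored in it.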

Top comments (3)

Michael R • Apr 23 '20

flock -n 99
RC=$?
if [ "$RC" != 0 ]; then
    # Send message and exit
    echo "Already running script. Try again after sometime"
    exit 1
fi

This snippet does not work for me. I never reach the branch where RC is different from 0, so I never see the "Already running script" message, even though the lock had been acquired correctly. Any idea why?