Ensuring Only One Instance of a Shell Script Is Running

1. Introduction

In this tutorial, we’ll discuss different ways of ensuring only one instance of a bash script is running. This will be useful when our script needs to have exclusive access to some resource. One common example is when we need to run a cron job only if it is not already running.

We can prevent more than one instance from running in several ways, which fall into two main groups. One approach is to use a lock, for example with flock or lockfile. The other is to determine whether the process is already running, for instance, by using a PID file.

2. Using flock

We can use flock to create a lock on a file. The idea is that we first try to obtain the lock, and if this fails, it means there’s another instance running.

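We can rely on this approach because acquiring the lock is a single atomic operation, so there are no race conditions between two instances starting at the same time. Also, the lock is released automatically once no process holds the locked file descriptor open anymore. That normally happens when our process exits, even if it’s killed. Those advantages make flock a safe way of ensuring only one instance will run. Both properties come from the kernel: the flock command is a wrapper around the flock(2) system call.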

By default, flock blocks until the lock is released and then continues without error. We can pass -n to use flock in non-blocking mode instead. This makes flock exit immediately with an error when another process already holds the lock on the file.

We should use -n if we don’t want the script to run a second time while it’s already running. On the other hand, if we don’t use -n, flock will block, and the second invocation will run once the previous instance terminates.
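A quick way to see both modes side by side is the following sketch. It assumes util-linux flock and uses throwaway temporary files instead of a real lock path. A background subshell holding the lock stands in for a running instance.

```shell
#!/bin/bash
# Sketch: compare flock's default (blocking) mode with -n (non-blocking).
# The lock and marker files are throwaway temp files, not real lock paths.
lock=$(mktemp)
ready=$(mktemp -u)

# Stand-in for a running instance: lock descriptor 9 and hold it for 2 seconds.
( flock 9; : > "$ready"; sleep 2 ) 9>"$lock" &
holder=$!
until [ -e "$ready" ]; do sleep 0.1; done   # wait until the lock is taken

flock -n "$lock" true    # -n: give up at once because the lock is held
nonblock_status=$?

flock "$lock" true       # default: wait for the holder, then take the lock
block_status=$?

wait "$holder"
rm -f "$lock" "$ready"
echo "non-blocking: $nonblock_status, blocking: $block_status"
```

With the lock held, the non-blocking attempt returns 1 right away. The blocking attempt returns 0 after roughly two seconds, once the holder has finished.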

We’ll need to choose a file that will serve as the lock. This file needs to be unique to our script and not shared by other processes.

2.1. Executing an External Script

There are two ways we can use flock. One of them is to have the script in a separate file and use flock to call it.

Using flock this way, we don’t need to modify the script. This is useful when we want to protect any arbitrary script or binary.

We only need the path to the lock file and the path to the script:

$ flock -n <lock file> <script>

Let’s use flock when executing an external script called dobackup.sh, using the file /var/lock/dobackup.lock as the lock:

$ flock -n /var/lock/dobackup.lock ./dobackup.sh

Now, suppose that our script is currently running. Let’s see what happens if we run the above line again:

$ flock --verbose -n /var/lock/dobackup.lock ./dobackup.sh
flock: failed to get lock
$ echo $?
1

We can see that flock has informed us that it failed to acquire the lock and exited with value 1 (error). This means another instance has the lock.

When flock fails to get the lock, it doesn’t run the command it was given, which prevents more than one dobackup.sh instance from executing.
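One caveat with this wrapper is that flock -n reports a conflict with exit status 1, which a failing script could also return. util-linux flock has a -E (--conflict-exit-code) option to choose a different status for the conflict case. The sketch below assumes a flock version that supports -E. The status 75 and the temporary paths are arbitrary choices for illustration.

```shell
#!/bin/bash
# Sketch: make a lock conflict distinguishable from a script failure.
# Assumes util-linux flock with -E; 75 is an arbitrary conflict status.
lock=$(mktemp)
dir=$(mktemp -d)

# Stand-in for a running instance: hold the lock for 2 seconds.
( flock 9; : > "$dir/ready"; sleep 2 ) 9>"$lock" &
holder=$!
until [ -e "$dir/ready" ]; do sleep 0.1; done

# While the lock is held, flock must not run the command it was given.
flock -n -E 75 "$lock" touch "$dir/ran"
conflict_status=$?
if [ -e "$dir/ran" ]; then ran=yes; else ran=no; fi

wait "$holder"
rm -rf "$lock" "$dir"
echo "conflict status: $conflict_status, command ran: $ran"
```

A caller, such as a cron wrapper, could then treat 75 as "skipped, already running" and any other non-zero status as a real failure of the wrapped script.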

2.2. Using flock Inside the Script

The other way we can use flock is to add it inside our script. In this case, we call flock with a file descriptor:

$ flock -n <file descriptor>

To use flock this way, we wrap everything we need to protect in parentheses, which run it in a subshell, and redirect that subshell to the file we use as the lock. The redirection opens the lock file on a file descriptor of our choice. We call flock at the beginning of the subshell with that file descriptor. Then, if flock exits with an error, we know that there’s another instance running.

Once that sub-shell terminates, the lock file is closed and the lock is automatically released.

Now, let’s take a look at what our script dobackup.sh does:

#!/bin/bash
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

Then, let’s add flock inside the script:

#!/bin/bash
another_instance()
{
    echo "There is another instance running, exiting"
    exit 1
}
(
    # Try to lock descriptor 100, opened on the lock file by the redirection below
    flock -n 100 || another_instance
    DEST=/home/backup/`date +%s`
    mkdir -p "$DEST"
    rsync -avz root@myhost:/home/web "$DEST/."
) 100>/var/lock/dobackup.lock

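In this example, we use the file descriptor 100 in the redirection to the lock file. Also, we call another_instance if flock fails, which prints a message and exits. Since another_instance runs inside the subshell, its exit 1 only ends the subshell. Because nothing follows the subshell, the whole script then ends with status 1.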
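A common variant avoids the subshell by opening the lock file with exec, so the lock covers the rest of the script. The sketch below wraps this in a helper. The name acquire_lock and descriptor 9 are our own choices for illustration, not something flock requires.

```shell
# Sketch: hold the lock for the rest of the script instead of a subshell.
# acquire_lock is a hypothetical helper; descriptor 9 is an arbitrary choice.
acquire_lock()
{
    # exec without a command applies the redirection to the current shell,
    # so descriptor 9 stays open (and locked) until the script exits.
    exec 9>"$1" || return 1
    flock -n 9    # non-zero if another process already holds the lock
}
```

In dobackup.sh, the call would be acquire_lock /var/lock/dobackup.lock || another_instance, followed by the backup commands without any parentheses. Keep in mind that child processes inherit descriptor 9. As a result, a background job started by the script keeps the lock held until that job exits too.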

3. Using lockfile

lockfile is provided by the procmail package, and we can use it in our script in a similar way to flock. If it isn’t available on our system, we can install procmail with our package manager.

With lockfile, we specify a file that will be used as the lock, and lockfile tries to create it. If the file already exists, another process holds the lock. By default, lockfile keeps waiting and retrying in that case. With -r 0, it gives up immediately and exits with an error, which means another instance is running. If instead lockfile finishes successfully, it has created the file, so there is no other instance and we can proceed with our script.

We have to choose a file that the script will use as the lock. This file must not exist before our script runs, as lockfile interprets an existing file as a lock held by someone else. Also, this file needs to be unique to our script and not shared by other processes. Finally, we must remove the file when the script exits, which we can do with trap.

There are three parameters that can be useful depending on the script and the situation:

-l timeout: sets a timeout, in seconds; a lock file older than this will be forcibly removed
-r retries: specifies how many times to retry if it fails to acquire the lock on the first try; a value of -1 (the default) means to retry forever
-sleeptime: specifies the number of seconds to sleep before each retry (8 by default)

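In some cases, the lock file may not be removed, for example, if the script is killed with SIGKILL. We can use -l so that a lock file older than the timeout is treated as stale and removed, letting the script run. The timeout should be longer than the script’s longest expected run. Otherwise, a second instance could remove a lock that is still in use.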
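The effect of -r 0, that is, no retries, can be seen in the following sketch. It assumes procmail’s lockfile is installed and uses a throwaway temporary path instead of a real lock location.

```shell
#!/bin/bash
# Sketch: lockfile -r 0 either creates the lock file or fails immediately.
# Assumes procmail's lockfile is installed; the path is a throwaway one.
lock=$(mktemp -u)

lockfile -r 0 "$lock"              # file does not exist yet: lock acquired
first=$?

lockfile -r 0 "$lock" 2>/dev/null  # file exists: no retries, so it fails
second=$?

rm -f "$lock"                      # removing the file releases the lock
lockfile -r 0 "$lock"              # so the next attempt succeeds again
third=$?
rm -f "$lock"

echo "first: $first, second: $second, third: $third"
```

We’d expect first and third to be 0 and second to be non-zero. The sketch doesn’t rely on any specific error code.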

3.1. Example

Now, we can modify our previous dobackup.sh script to use lockfile:

#!/bin/bash
LOCK=/var/lock/dobackup.lock
remove_lock()
{
    rm -f "$LOCK"
}
another_instance()
{
    echo "There is another instance running, exiting"
    exit 1
}
lockfile -r 0 -l 3600 "$LOCK" || another_instance
trap remove_lock EXIT
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

This way, we call lockfile before executing anything else and exit if lockfile fails.

We specified -r 0 so that lockfile doesn’t retry if it fails to get the lock. If we wanted, we could have used -r 5 to retry five times before exiting. Also, we used trap to ensure that remove_lock is called whenever the script exits. And finally, we specified a 3600-second timeout for the lock, so the script can still run if a stale lock is left behind.

4. Writing the PID to a File

Another alternative is to use a PID file. The idea is to check that the file doesn’t exist yet and then write the PID of our process to it. If the file does exist, we can check whether the PID stored in it belongs to a running process with the command kill -0 $PID. This doesn’t send any signal. It only fails if no such process exists, or if we aren’t allowed to signal it.
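To see what kill -0 reports, this sketch starts a background sleep as a stand-in for a running instance. It then checks that PID before and after stopping the process.

```shell
#!/bin/bash
# Sketch: kill -0 sends no signal; its exit status only tells us whether
# a process with that PID exists (and that we are allowed to signal it).
sleep 30 &        # stand-in for a running instance
pid=$!

kill -0 "$pid"    # the process exists: exit status 0
running_status=$?

kill "$pid"                # stop it...
wait "$pid" 2>/dev/null    # ...and reap it, so the PID no longer exists

kill -0 "$pid" 2>/dev/null # no such process now: non-zero exit status
stopped_status=$?

echo "while running: $running_status, after stopping: $stopped_status"
```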

4.1. Example

Let’s use a PID file with our dobackup.sh example:

#!/bin/bash
PIDFILE=/var/lock/dobackup.pid
remove_pidfile()
{
  rm -f "$PIDFILE"
}
another_instance()
{
  echo "There is another instance running, exiting"
  exit 1
}
if [ -f "$PIDFILE" ]; then
  kill -0 "$(cat "$PIDFILE")" 2>/dev/null && another_instance
fi
trap remove_pidfile EXIT
echo $$ > "$PIDFILE"
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

In this case, we used /var/lock/dobackup.pid to store the PID.

We use trap to remove the PID file once the script finishes. However, trap can’t cover every scenario. A process killed with SIGKILL can’t run any cleanup, so the PID file stays behind.
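The following sketch illustrates this. It runs a tiny script with the same trap pattern twice: once it exits normally, and once it’s killed with SIGKILL. The script name job.sh, the FIFO that keeps it waiting, and the temporary paths are all assumptions for the demo.

```shell
#!/bin/bash
# Sketch: an EXIT trap cleans up on a normal exit, but not after SIGKILL.
# job.sh, the FIFO and all paths are hypothetical, for this demo only.
dir=$(mktemp -d)
cat > "$dir/job.sh" <<'EOF'
#!/bin/bash
trap 'rm -f "$1"' EXIT   # remove the PID file when the script exits
echo $$ > "$1"
read -r _ < "$2"         # wait until something is written to the FIFO
EOF
chmod +x "$dir/job.sh"
mkfifo "$dir/fifo"

# Run 1: let the script finish normally.
"$dir/job.sh" "$dir/run1.pid" "$dir/fifo" &
until [ -s "$dir/run1.pid" ]; do sleep 0.1; done
echo > "$dir/fifo"
wait $!
if [ -e "$dir/run1.pid" ]; then normal_exit=kept; else normal_exit=removed; fi

# Run 2: kill the script with SIGKILL, which cannot be trapped.
"$dir/job.sh" "$dir/run2.pid" "$dir/fifo" &
pid=$!
until [ -s "$dir/run2.pid" ]; do sleep 0.1; done
kill -9 "$pid"
wait "$pid" 2>/dev/null
if [ -e "$dir/run2.pid" ]; then sigkill=kept; else sigkill=removed; fi

rm -rf "$dir"
echo "normal exit: $normal_exit, SIGKILL: $sigkill"
```

The PID file disappears after the normal exit but survives the SIGKILL. That surviving file is exactly the stale PID file the next run has to deal with.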

4.2. Disadvantages

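If we’re using this method, we have to take into account that there’s a race condition. Checking for the file and writing it are two separate steps. So two processes can both run the if statement before either has created the PID file, and then both continue.

Also, a stale PID file may contain a PID that the system has since assigned to a different program. In that case, kill -0 succeeds, and the script wrongly concludes that another instance is running.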

5. Using a Directory

Another option is to use a directory to indicate that an instance is running. We only have to create the directory with mkdir. If the directory already exists, mkdir fails, meaning there is another instance. But if mkdir exits successfully, the lock directory has been created and we can run the script.

This method has the advantage that we aren’t checking for the lock in one step and then creating it in another. Such a two-step lock is prone to race conditions. Instead, mkdir checks and creates the directory in a single atomic operation.
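We can check this claim with a small experiment. We start several instances at the same moment, all trying to create the same lock directory. The temporary paths and the number of instances are arbitrary choices for this sketch.

```shell
#!/bin/bash
# Sketch: ten concurrent "instances" race to create the same lock directory.
# mkdir is atomic, so exactly one of them should succeed.
workdir=$(mktemp -d)
lockdir="$workdir/lock"

for i in 1 2 3 4 5 6 7 8 9 10; do
    # Each background instance prints a line only if its mkdir succeeded.
    ( mkdir "$lockdir" 2>/dev/null && echo "instance $i got the lock" ) &
done > "$workdir/results"
wait

winners=$(wc -l < "$workdir/results")
cat "$workdir/results"
rm -rf "$workdir"
echo "instances that got the lock: $winners"
```

Which instance wins varies from run to run, but only one line is ever printed.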

5.1. Example

To see this in action, let’s edit dobackup.sh to use a directory:

#!/bin/bash
LOCK=/var/lock/dobackup.lock
remove_lock()
{
    rm -rf "$LOCK"
}
another_instance()
{
    echo "There is another instance running, exiting"
    exit 1
}
mkdir "$LOCK" || another_instance
trap remove_lock EXIT
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

5.2. Disadvantages

We should take care of stale directories that were created but no longer have a corresponding process running. Even though we use trap, a SIGKILL (which can’t be trapped) or a power loss can leave a stale directory behind.

In this case, we don’t have information about the PID that created the directory as we did when using the pid file. This means we can’t check whether the process is still alive or not.
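One way to recover that information is to write our PID into the lock directory right after creating it, combining this method with the kill -0 check from the PID file section. The sketch below is only an illustration. The helper name try_dir_lock and the pid file name are our own. It reports a stale lock rather than deleting it, because automatically removing a stale lock would bring back a check-then-act race between two new instances.

```shell
# Sketch: store the owner's PID inside the lock directory so that a later
# run can tell a live lock from a stale one.
# try_dir_lock and the "pid" file name are hypothetical choices.
# Prints "acquired", "running" or "stale"; returns 0 only for "acquired".
try_dir_lock()
{
    if mkdir "$1" 2>/dev/null; then
        echo $$ > "$1/pid"    # record the owner for later runs
        echo acquired
        return 0
    fi
    owner=$(cat "$1/pid" 2>/dev/null)
    if [ -n "$owner" ] && ! kill -0 "$owner" 2>/dev/null; then
        echo stale      # the recorded owner is gone; directory left behind
    else
        # The owner is alive, or it has not written its PID yet.
        echo running
    fi
    return 1
}
```

In dobackup.sh, try_dir_lock "$LOCK" > /dev/null || another_instance would take the place of mkdir "$LOCK" || another_instance. The kill -0 check still inherits the PID-reuse problem described for PID files.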

6. Searching for the Process

And finally, we can search for the process to see if it is running or not. We may use pgrep or lsof to achieve this.

If we use this method, we don’t need any extra files.

6.1. Using pgrep

We can use pgrep to search for the name of our script and execute it if it isn’t found. One way of doing this is outside the script in one line:

$ pgrep dobackup.sh || ./dobackup.sh

In that case, we don’t need to modify dobackup.sh.

Another option we have is to add pgrep inside the script:

#!/bin/bash
another_instance()
{
    echo "There is another instance running, exiting"
    exit 1
}
if [ "$(pgrep dobackup.sh)" != $$ ]; then
     another_instance
fi
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

When using pgrep inside the script, pgrep always finds at least one process: the script itself. If that’s the only instance, pgrep prints just the current PID, which equals $$. If there are more instances, pgrep prints all their PIDs, so the output differs from our current PID.
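To see what pgrep actually matches, the sketch below starts a throwaway script and looks it up by name. By default, pgrep compares the pattern against the process name. On Linux, for a script executed directly (as in ./dobackup.sh), that name is the script’s file name. The -x option requires the whole name to match, which avoids one source of the false positives discussed below. The script name demo_backup.sh and the FIFO that keeps it waiting are assumptions for the demo.

```shell
#!/bin/bash
# Sketch: look up a running script by its exact process name with pgrep -x.
# demo_backup.sh is a hypothetical script name (under Linux's 15-character
# process name limit); the FIFO only keeps it running until we release it.
dir=$(mktemp -d)
cat > "$dir/demo_backup.sh" <<'EOF'
#!/bin/sh
: > "$1"           # signal that the script is running (no child process)
read -r _ < "$2"   # block until something is written to the FIFO
EOF
chmod +x "$dir/demo_backup.sh"
mkfifo "$dir/fifo"

"$dir/demo_backup.sh" "$dir/ready" "$dir/fifo" &   # executed directly
pid=$!
until [ -e "$dir/ready" ]; do sleep 0.1; done

found=$(pgrep -x demo_backup.sh)   # matches the whole name "demo_backup.sh"
echo "started: $pid, pgrep -x found: $found"

echo > "$dir/fifo"                 # let the script finish
wait "$pid"
rm -rf "$dir"
```

If the same file were started as sh demo_backup.sh, the process name would be sh, and pgrep -x demo_backup.sh would find nothing.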

6.2. Using lsof

With lsof, we check whether any process has the script file open. This works because bash keeps the script file open while it reads and runs it. If the file is open, we interpret this as another instance running.

Similar to the pgrep example, we can check for this outside the script or inside. Let’s try it first outside the script without modifying the script:

$ lsof dobackup.sh || ./dobackup.sh

As an alternative, we can use lsof inside the script:

#!/bin/bash
another_instance()
{
    echo "There is another instance running, exiting"
    exit 1
}
INSTANCES=`lsof -t "$0" | wc -l`
if [ "$INSTANCES" -gt 1 ]; then
    another_instance
fi
DEST=/home/backup/`date +%s`
mkdir -p "$DEST"
rsync -avz root@myhost:/home/web "$DEST/."

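As we’re running lsof inside the script, there will always be at least one instance: the script itself. We used the parameter -t so that lsof prints only the PIDs of the processes that have the file open, one per line and without a header. That lets us count them with wc -l.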

We also used $0 to make the script more portable. This way, we don’t have to worry about the actual script name in case it changes.
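The lsof check relies on bash keeping the script file open while it runs. On Linux, we can confirm this without lsof by listing the open file descriptors of a running script under /proc. This sketch is Linux-specific, and the script name and FIFO are assumptions for the demo.

```shell
#!/bin/bash
# Sketch (Linux only): show that a running bash script has its own file
# open, which is what lsof detects. demo_backup.sh is a hypothetical name.
dir=$(mktemp -d)
cat > "$dir/demo_backup.sh" <<'EOF'
#!/bin/bash
: > "$1"           # signal that the script is running
read -r _ < "$2"   # block until something is written to the FIFO
EOF
chmod +x "$dir/demo_backup.sh"
mkfifo "$dir/fifo"

"$dir/demo_backup.sh" "$dir/ready" "$dir/fifo" &
pid=$!
until [ -e "$dir/ready" ]; do sleep 0.1; done

# Compare the target of each open descriptor with the script's real path.
script=$(readlink -f "$dir/demo_backup.sh")
holds_script=no
for fd in /proc/"$pid"/fd/*; do
    if [ "$(readlink "$fd")" = "$script" ]; then holds_script=yes; fi
done
echo "script process has its own file open: $holds_script"

echo > "$dir/fifo"   # let the script finish
wait "$pid"
rm -rf "$dir"
```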

6.3. Disadvantages

When using these methods, we can encounter false positives and false negatives.

It’s possible that the script is opened by another process, such as an editor. Also, pgrep might find another process whose name matches the pattern we search for. Both these cases lead to false positives, thus preventing our script from running.

And we also have to consider false negatives. If the script is renamed, pgrep will still search for the old name unless we update the script. Similarly, if the script is started as bash dobackup.sh, the process name is bash, so pgrep dobackup.sh won’t find it. And when we use lsof, it won’t detect another instance if a copy of the script is running, since the copy is a different file.

7. Conclusion

In this article, we saw several ways of ensuring only one instance of a bash script is running.

We first used flock and lockfile as a way of determining if there is another instance. Then we also used a PID file, a directory, pgrep, and lsof, although these methods aren’t as safe as flock and lockfile.
