1. Introduction

Dealing with processes is a necessity when administering a Linux system. Tasks range from checking process resource usage, through streamlining running applications and services, to preventing memory leaks and beyond.

In this tutorial, we look into ways of killing a process that keeps restarting. First, we do a brief refresher on process hierarchy. Next, we define persistent processes and how they come to be. After that, we check out ways to detect and identify such processes. Finally, we explore ways to deal with a process that keeps restarting.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.

2. Process Hierarchy

In Linux, each process is created as a fork of another. Because of this, one process is an ancestor to all other user-space processes: PID 1, usually called init or, more recently, systemd.

Parenting continues downstream as well, meaning each process has its own direct parent. In fact, we can view these relationships via the -H flag of ps:

$ ps -AH
PID TTY          TIME CMD
  1 ?        00:06:56 systemd
238 ?        00:03:28   systemd-journal
269 ?        00:00:18   systemd-udevd
[...]

Showing all processes (-A) with their children (-H), we see systemd and its direct descendants. Direct means that systemd itself spawned them at one point.

3. Persistent Processes

No process can revive itself once killed. Only a different one can execute the binary of the original. Even then, that’s a different process and PID, albeit with the same code:

$ sleep 10 &
[1] 666
$ kill -9 666
$
[1]  Killed                  sleep 10
$ sleep 10 &
[1] 667

Here, we used the kill command to terminate a background job. Next, we restarted it with the same code, but it got a new process ID.

There can be many reasons for a process restarting right after it’s killed.

3.1. Watchdog Services

The concept of a watchdog is universal and easily guessable from the name. Watchdog processes monitor the system for a given event and react to it.

For example, events can be the creation of a file, resource usage around a given threshold, or the termination of a process. A very simple watchdog can be a script:

for SERVICE in ssh apache2
do
    # discard both stdout and stderr, since only the exit status matters
    service "$SERVICE" status >/dev/null 2>&1
    [ $? -ne 0 ] && service "$SERVICE" restart
done

This script checks whether the given services (ssh and apache2) are up and restarts them if not. To truly make it work like a watchdog, it’s usually best to schedule it with cron.
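
The check-and-restart step at the heart of such a watchdog can be factored into a small function. The following is only a minimal sketch: watchdog_pass and the stand-in commands passed to it are hypothetical names for illustration, not part of any standard tool.

```shell
#!/bin/sh
# watchdog_pass CHECK_CMD RESTART_CMD
# Hypothetical helper: run CHECK_CMD and, only if it fails, run RESTART_CMD.
watchdog_pass() {
    # the output of the check is discarded; only its exit status matters
    if sh -c "$1" >/dev/null 2>&1; then
        return 0
    fi
    sh -c "$2"
}

# Example with stand-in commands: 'false' simulates a failed status check
watchdog_pass 'false' 'echo "restarting service"'
```

In practice, the two arguments would be commands like service ssh status and service ssh restart from the script above.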

Another typical example is Docker restart policies. They can ensure the restart of containers that stop abnormally.

Of course, both cases above also result in process restarts.

3.2. Scheduled Restarts

Sometimes we may want a service to periodically restart regardless of its state. This may include a startup procedure or a simple time-based restart.

Either way, we end up with regular process restarts.

3.3. Malicious Processes

Critically, malicious processes may have a keep-alive mechanism similar to a watchdog. Moreover, such a process can employ other means to protect itself from termination:

  • multiple instances of the same process or a separate watchdog
  • binary replication, ensuring multiple different executables exist
  • binary infection, whereby standard tool executables are replaced or infected, running the malicious code

These cases often make the detection of a restarting process much harder. In fact, the signature of the process becomes hard to pinpoint.

4. Identifying a Persistent or Haywire Process

Indeed, the first and most critical steps of any operation with a process are identification and PID acquisition. However, these can be very difficult since the process:

  • restarts with a different PID
  • may slow down the system and any attempts at detection if it’s too resource-intensive
  • may run with higher privileges or restrictive permissions, particularly when it’s a daemon or service
  • may restart under many names, especially when malicious
  • could be elusive if restarting very frequently

With this in mind, we should try to get through any delays and (cautiously) get higher privileges or more permissions, if necessary.

Once this is done, we can try to detect the process via top or even watch ps, both with a small refresh interval to catch short-lived processes:

top - 04:05:27 up 3 days, 12:02,  1 user,  load average: 0.55, 1.06, 1.27
Tasks: 362 total,   2 running, 290 sleeping,   0 stopped,   0 zombie
%Cpu(s): 35.8 us, 10.7 sy,  0.0 ni, 52.4 id,  0.3 wa,  0.0 hi,  0.7 si,  0.0 st
KiB Mem :  8060436 total,   150704 free,  4438276 used,  3471456 buff/cache
KiB Swap:  2097148 total,  1656152 free,   440996 used.  2557604 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32081 baeldung+ 20   0  879676 198164 106096 S 102.6  2.5   0:10.16 firefox
  582 baeldung+ 20   0   51448   4088   3372 R  15.8  0.1   0:00.04 top
  875 message+  20   0   53120   5900   3204 S   5.3  0.1  10:10.14 dbus-daemon
    1 root      20   0  225840   7200   4720 S   0.0  0.1   4:51.28 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.20 kthreadd
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq

For example, the TIME+ column from the top output above shows the CPU time a process has accumulated, so a low value can hint at a recent launch, while %MEM, %CPU, and others can show heavy processes. Of course, we can determine the USER as well, which is a hint for the privileges.
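
Since a restarted process comes back under a new PID, one rough way to confirm a restart is to compare the PIDs behind a given name at two points in time. The sketch below assumes a Linux /proc filesystem; pids_by_name and has_restarted are hypothetical helper names, and badproc is just a placeholder process name.

```shell
#!/bin/sh
# pids_by_name NAME: print the PIDs whose command name is exactly NAME.
# Note: the kernel truncates the name in /proc/PID/comm to 15 characters.
pids_by_name() {
    for dir in /proc/[0-9]*; do
        # a process may vanish mid-scan, so read errors are ignored
        read -r comm 2>/dev/null < "$dir/comm" || continue
        if [ "$comm" = "$1" ]; then
            echo "${dir#/proc/}"
        fi
    done | sort -n
}

# has_restarted NAME SECONDS: succeed if the set of PIDs for NAME
# differs between two samples taken SECONDS apart.
has_restarted() {
    before=$(pids_by_name "$1")
    sleep "$2"
    after=$(pids_by_name "$1")
    [ "$before" != "$after" ]
}

if has_restarted badproc 1; then
    echo "the PIDs of badproc changed"
fi
```

A changed PID set can also mean that an instance simply started or exited in the meantime, so this is a hint rather than proof of a restart.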

Critically, once we manage to detect and identify it, we might be able to get a stable piece of information about the offending process via ps -AH: its parent. As discussed, with one exception, for any process to exist, another one must fork it.
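
To illustrate, the parent of a PID can also be read directly from the kernel. This is a minimal sketch that assumes a Linux /proc filesystem; parent_of and ancestry are hypothetical helper names.

```shell
#!/bin/sh
# parent_of PID: print the parent PID from the PPid field of /proc/PID/status.
parent_of() {
    sed -n 's/^PPid:[[:space:]]*//p' "/proc/$1/status" 2>/dev/null
}

# ancestry PID: print PID and then each ancestor, one per line,
# stopping at PID 1 or when no further parent can be read.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        echo "$pid"
        [ "$pid" -eq 1 ] && break
        pid=$(parent_of "$pid") || break
    done
}

# Example: show the chain of parents of the current shell
ancestry "$$"
```

Running ancestry with the PID of the offending process shows which process in the chain is a likely candidate for reviving it.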

With this information at hand, let’s see how we can prevent the process from restarting.

5. Blocking a Persistent or Haywire Process

Now, once we have some unique information about the problematic process, we use that to either kill the process or prevent it from starting.

5.1. Manual Termination

Naturally, we can try to just kill a process by hand:

$ kill -9 666
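
SIGKILL (-9) gives a process no chance to clean up, so a gentler sequence is often worth trying first. The sketch below is a hypothetical helper, terminate_gracefully, which sends SIGTERM, waits for a number of seconds (5 by default, an arbitrary choice), and only then falls back to SIGKILL.

```shell
#!/bin/sh
# terminate_gracefully PID [SECONDS]
# Send SIGTERM, poll until the process is gone, then fall back to SIGKILL.
terminate_gracefully() {
    target=$1
    remaining=${2:-5}
    # if the signal cannot be delivered, the process is already gone
    # or we lack the permissions to signal it
    kill -s TERM "$target" 2>/dev/null || return 0
    while [ "$remaining" -gt 0 ]; do
        # kill -0 sends no signal; it only checks whether we can signal the PID
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    kill -s KILL "$target" 2>/dev/null
    return 0
}
```

For instance, terminate_gracefully 666 10 would give the process from the example up to ten seconds to exit on its own.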

Sometimes, a process is only revived a limited number of times or only if it has already run for a given period of time. The problem with this approach is that it may take time to detect whether the offending process has stopped restarting. In essence, it’s a loop of:

  1. Identify the process and its PID
  2. Terminate the process
  3. Wait
  4. Repeat

Looking through the results of top or ps can be painstaking and tedious. Doing this by hand is probably not the best approach with persistent processes.

Still, this might be the only choice with a malicious process, as its behavior may be erratic and irregular. Further, it can infect innocent-looking binaries, making detection even harder.

5.2. Automatic Termination

Watchdogs can be used to terminate processes as well as restart them. Automating the steps we discussed in a simple script may free us from detecting and terminating by hand:

while true
do
    bppid=$(pgrep badproc)
    [ $? -eq 0 ] && kill -9 $bppid
    sleep 60
done

Here, we use several commands to monitor and manipulate the process:

  • pgrep for checking (by name) whether the process exists
  • kill to terminate each PID that pgrep returns
  • sleep to wait between checks

In this watchdog, we use the name of the process to find and kill it. Of course, we can detect processes by resource, by port, or use other criteria as needed.

While it should work in theory, there are many issues with this approach as well:

  • hard to determine the optimal sleep time
  • pgrep matches name patterns, so we may detect and kill the wrong processes
  • waste of resources in the contest between check-start and check-kill

To circumvent these problems, we can try to attack the source.

5.3. Binary Manipulation

While some are spawned scripts, many processes start with their own custom binary executable file. To find it, we can again use ps, but with the PID of our process:

$ ps 666
  PID TTY      STAT   TIME COMMAND
  666 ?        Ss     0:02 /home/baeldung/badproc

The file path is usually in the default COMMAND or CMD column (/home/baeldung/badproc). Having this information, we can do one of two things:

  • delete the executable file
  • rename the executable file

By doing so, we can effectively stop new instances of the process from being executed, although an instance that’s already running continues until we kill it. Of course, this does not prevent attempts to start it.
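
On Linux, the COMMAND column can contain a relative or altered name, so the /proc/PID/exe symbolic link is often a more dependable source for the path. The following sketch assumes a Linux /proc filesystem; exe_of and disable_binary are hypothetical helper names, and the .disabled suffix is an arbitrary choice.

```shell
#!/bin/sh
# exe_of PID: print the path of the executable that PID is running.
exe_of() {
    readlink "/proc/$1/exe"
}

# disable_binary PATH: remove all execute permissions and rename the file,
# which keeps a copy around for later inspection instead of deleting it.
disable_binary() {
    chmod a-x "$1" && mv -- "$1" "$1.disabled"
}

# Example: show the executable behind the current shell
exe_of "$$"
```

With sufficient permissions, disable_binary "$(exe_of 666)" would then rename the executable of the process from the example.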

5.4. Kill Parent Process

Unless they are malicious, persistent processes are commonly run and revived by a single parent process. Targeting that parent, we can apply any of the actions already discussed for the process being restarted:

$ ps -AH
PID TTY          TIME CMD
[...]
660 ?        00:05:56  parent-badproc
666 ?        00:01:28   badproc
[...]
$ ps 666
PID TTY      STAT   TIME COMMAND
666 ?        Ss     0:02 /home/baeldung/badproc
$ kill -9 660
$ ps 666
PID TTY STAT TIME COMMAND

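Here, we see a process (badproc, 666) and its parent (parent-badproc, 660). While the parent is alive, it keeps the child process running. In this case, the child also terminates when we kill the parent. However, this isn’t guaranteed: an orphaned child is usually adopted by PID 1 and keeps running, so we may still have to kill it separately, although with the parent gone, it should no longer be revived.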

Clearly, an important drawback is that other processes may depend on the same parent.
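
Before killing a parent, it can help to list everything else it has spawned. This is a minimal sketch that assumes a Linux /proc filesystem; children_of is a hypothetical helper name.

```shell
#!/bin/sh
# children_of PID: print the PIDs of all processes whose parent is PID.
children_of() {
    for dir in /proc/[0-9]*; do
        # processes may exit during the scan, so read errors are ignored
        ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        if [ "$ppid" = "$1" ]; then
            echo "${dir#/proc/}"
        fi
    done
}

# Example: list the direct children of PID 1
children_of 1
```

If children_of 660 printed more than 666 in the example above, other processes would also lose their parent. Where the procps tools are installed, pgrep -P 660 provides the same list.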

6. Summary

In this article, we looked at ways to kill constantly restarting processes. To enumerate, we went through manual handling, scripts, and attacking the root of the problem.

In conclusion, there are many ways to handle persistent, haywire, and malicious processes that keep restarting, but we first need to identify them and their behavior.