1. Overview

Linux provides convenient command-line utilities for reporting running processes, such as the commonly used ps, top, and htop commands. These commands can list all running processes or only the processes that belong to the current user.

In this tutorial, we’ll explore how to find a process’s information, such as its PID or the complete command line that started it, by a keyword.

2. Introduction to the Problem

As we know, the Linux operating system assigns each process a unique PID. However, after a process has been started, we often don’t know its PID. So, when required, we need to find the process, usually by some keywords, such as the application name, the command that started the process, or some significant parameters.

Depending on what we’ll do with the process, we may want different levels of detail about it. For example, if we’re going to find a particular process and kill it, knowing the process’s PID is probably sufficient. However, if we want to check the resource usage of a running process, we may need more than just its PID.

In this tutorial, we’ll discuss how to find a process by keywords using a few commonly used utilities. Further, we’ll learn what process information each approach can give us.

To make it easier to discuss, let’s create a couple of simple shell scripts to simulate two applications.

2.1. Creating Two Simple Shell Scripts

Let’s say we have two enterprise applications: powerHR and smartMarketing. For simplicity, each application is a shell script:

$ tree -f /tmp/test                      
/tmp/test
├── /tmp/test/powerHR
│   └── /tmp/test/powerHR/start.sh
└── /tmp/test/smartMarketing
    └── /tmp/test/smartMarketing/start-marketing.sh

2 directories, 2 files

As we can see in the tree output above, each application has its own directory with an executable shell script.

To simulate running applications, we make the “applications” do nothing but sleep for a relatively long time:

$ head **/*.sh 
==> powerHR/start.sh <==
#!/bin/bash
echo "Starting the powerHR platform...."
echo "powerHR platform is running..."
sleep 3600

==> smartMarketing/start-marketing.sh <==
#!/bin/bash
echo "Starting smartMarketing system..."
echo "smartMarketing system is running"
sleep 3600

Next, let’s start the two applications in two terminals and see how to find the processes.
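
To reproduce this setup, the two scripts can be created with a short sketch like the one below. It writes the same script contents shown above; the BASE_DIR variable is a hypothetical override added here for convenience and isn’t part of the original setup, which uses /tmp/test:

```shell
#!/bin/bash
# Base directory for the demo; /tmp/test matches the tree output above.
# BASE_DIR is a hypothetical override for trying this in another location.
base="${BASE_DIR:-/tmp/test}"

mkdir -p "$base/powerHR" "$base/smartMarketing"

# Quoted 'EOF' keeps the script text literal (no variable expansion)
cat > "$base/powerHR/start.sh" <<'EOF'
#!/bin/bash
echo "Starting the powerHR platform...."
echo "powerHR platform is running..."
sleep 3600
EOF

cat > "$base/smartMarketing/start-marketing.sh" <<'EOF'
#!/bin/bash
echo "Starting smartMarketing system..."
echo "smartMarketing system is running"
sleep 3600
EOF

# The scripts must be executable so they can be started by their paths
chmod +x "$base/powerHR/start.sh" "$base/smartMarketing/start-marketing.sh"
```

After that, each script can be started by its absolute path, for example /tmp/test/powerHR/start.sh, in its own terminal.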

3. Using the pidof Command

As its name implies, the pidof command finds the PID of a running program. Of course, pidof needs to know what to find, so we pass it the program name, which can be a command, script, or executable. For example, we know that on *nix systems, the init process has PID 1, so we can use pidof to verify it:

$ pidof init
1

Next, let’s try to find the process of the powerHR application (start.sh):

$ pidof start.sh
$

Surprisingly, pidof doesn’t print anything this time. This is because start.sh is a shell script. It’s executed by /bin/bash, as defined in the shebang. So, for the pidof command, the program running start.sh is /bin/bash. However, we don’t want to run “pidof bash”, as it would list all currently running bash scripts and shells. To solve this problem, we can pass the -x option to pidof so that it also reports the PIDs of shells running the named script:

$ pidof -x start.sh
2455359

We can verify whether this PID is the one we’re looking for:

$ ps -fp 2455359
UID          PID    PPID  C STIME TTY          TIME CMD
kent     2455359 2455167  0 20:11 pts/16   00:00:00 /bin/bash /tmp/test/powerHR/start.sh

The pidof command is pretty straightforward. But it reports the PID only. If we want more information about the process, we need to pass the found PID to other commands, such as ps, as shown above.
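
The two steps can be combined into one. The following is only a sketch: psof is a hypothetical helper name, and it assumes the procps versions of pidof and ps, where pidof prints space-separated PIDs and ps -p accepts a comma-separated PID list:

```shell
# Hypothetical helper: show ps details for every process whose program
# or script name is exactly "$1".
psof() {
    # pidof prints the matching PIDs separated by spaces and returns
    # a non-zero status if nothing matches
    psof_pids=$(pidof -x "$1") || return 1
    # Turn "123 456" into "123,456" so ps receives a single PID list
    ps -fp "$(printf '%s' "$psof_pids" | tr ' ' ',')"
}
```

With the demo scripts running, psof start.sh would print the same kind of ps -f table as above. If no process matches, the function returns a non-zero status and prints nothing.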

Furthermore, we must give the exact script name or program name. pidof doesn’t support glob or other pattern matching features:

$ pidof -x st*.sh
$ pidof -x start
$ pidof ini*

As the examples above show, pidof won’t work if we don’t give the exact program name.

4. Using the pgrep Command

We can understand the pgrep command as “process-grep.” Like grep, pgrep supports regex and is a handy utility to help us find processes.

4.1. Plain pgrep

By default, pgrep matches only the program name and outputs PIDs of found processes. Let’s see some examples:

$ pgrep start.sh 
2455359

$ pgrep start
2455359
2494597

In the first example, we pass the script name start.sh to pgrep. Therefore, pgrep finds the powerHR process. However, in the second command, we give the “start” keyword to the pgrep command. As pgrep treats “start” as a regex and matches it anywhere in the program name, we’ve found two processes. If we check their detailed information, we’ll see that both powerHR and smartMarketing processes have been found:

$ ps -fp 2455359 2494597
UID          PID    PPID  C STIME TTY      STAT   TIME CMD
kent     2455359 2455167  0 23:49 pts/16   S+     0:00 /bin/bash /tmp/test/powerHR/start.sh
kent     2494597 2455106  0 23:49 pts/14   S+     0:00 /bin/bash /tmp/test/smartMarketing/start-marketing.sh

The plain pgrep command supports regex. Thus, it’s more convenient than the pidof command. Next, let’s learn a few commonly used options to make it even more convenient.
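
Because the pattern is a regex, the dot in “start.sh” matches any character, and the pattern may match anywhere in the program name. When a stricter match is needed, a sketch like the following could help. The helper name pgrep_name is hypothetical; the -x (exact match) and -l (list the process name) options are assumed to be available, as they are in procps pgrep:

```shell
# Hypothetical helper: print "PID name" for processes whose whole name
# matches the regex in "$1".
pgrep_name() {
    # -x: the pattern must match the entire process name, not a part of it
    # -l: print the process name next to each PID
    pgrep -lx -- "$1"
}

# Usage with the demo scripts running (the backslash makes the dot literal):
#   pgrep_name 'start\.sh'    # only the powerHR script
#   pgrep_name 'start'        # nothing: no process is named exactly "start"
```

The same effect can be achieved without -x by anchoring the regex explicitly, as in pgrep '^start\.sh$'.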

4.2. The -f, -a, and -i Options

We’ve learned that pgrep matches only the program name by default. The -f option tells pgrep to match the full command line. In other words, the pgrep command will also match the program’s path and arguments.

As we’ve started the two shell scripts with their absolute paths, if we use pgrep with the -f option, we can search them by the application names (directory names):

$ pgrep -f powerHR
2455359

$ pgrep -f smartMarketing
2494597

In the real world, this could be pretty useful. For example, if we start some Java applications, all program names are likely the same: “$JAVA_HOME/bin/java”. However, the expected names could be in the arguments. For instance, let’s look at the full command line of an IntelliJ IDE process:

/usr/lib/jvm/default/bin/java -classpath /home/kent/javaEnv/intellij/lib/util.jar...(many other jars and arguments)... -Dsplash=true com.intellij.idea.Main

With the -f option, we can search those Java applications with their real application names, such as pgrep -f intellij.

Moreover, if we pass pgrep the -a option, it’ll print the found PIDs together with the complete command lines of the processes. This makes it easier to verify whether the found processes are really what we’re looking for.

So next, let’s test the pgrep command with both -a and -f options to find our smartMarketing process:

$ pgrep -af Marketing
2494597 /bin/bash /tmp/test/smartMarketing/start-marketing.sh

Additionally, we can ask pgrep to perform a case-insensitive match with the -i option:

$ pgrep -af powerhr  
$  # <-- no process found

$ pgrep -afi powerhr
2455359 /bin/bash /tmp/test/powerHR/start.sh

5. Filtering ps Output With Other Commands

Using pgrep with the -a option, we can find the process’s PID and the full command line. However, pgrep cannot report further details, such as the parent process ID or CPU usage.

We’ve already used the ps command to verify the results of pidof and pgrep. So next, we’ll look at how to pipe ps’s output to other commands to get the desired result.

5.1. ps | grep

The grep command is a great tool for performing text matching. In practice, we often pipe ps’s output to the grep command to find processes.

Next, let’s use this approach to find powerHR’s process:

$ ps -ef | grep -i powerhr
kent     2455359 2455167  0 Sep06 pts/16   00:00:00 /bin/bash /tmp/test/powerHR/start.sh
kent     2503000   75711  0 00:37 pts/11   00:00:00 grep -i powerhr

As the output above shows, we can see the start.sh process. However, we also notice that the grep command itself appears in the output. This is because the shell starts all commands of a pipeline at the same time, so the grep process is usually already running when ps takes its snapshot of the process list. Thus, ps lists it in the output.

Of course, we can pipe the result to another grep command with the -v (invert-match) option to discard the grep process:

$ ps -ef | grep -i powerhr | grep -v grep
kent     2455359 2455167  0 Sep06 pts/16   00:00:00 /bin/bash /tmp/test/powerHR/start.sh

However, this way, we need to type two grep commands. As grep works with regex, we can solve this problem with a regex trick:

$ ps -ef | grep -i "[p]owerhr"
kent     2455359 2455167  0 00:54 pts/16   00:00:00 /bin/bash /tmp/test/powerHR/start.sh

As we can see, the trick works. Now, let’s understand what’s going on.

The regex “[p]owerhr” matches the literal string “powerhr” and, with the -i option, “powerHR” as well. So the start.sh process matches. However, the command line of the grep process is grep -i [p]owerhr. We should note that this command line comes from ps’s output. In other words, it’s a literal string that contains the square brackets. Therefore, the regex “[p]owerhr” doesn’t match the literal string “[p]owerhr”. Thus, the grep process gets filtered out.
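
The filtering effect can be checked without ps by feeding grep two sample lines directly. In this small sketch, the variable names are made up for illustration, and the two lines imitate the CMD column that ps would print for the application and for grep itself:

```shell
pattern='[p]owerhr'

# CMD text as ps would show it for the application process
app_line='/bin/bash /tmp/test/powerHR/start.sh'
# CMD text as ps would show it for the grep process (brackets included)
grep_line='grep -i [p]owerhr'

# grep -c prints the number of matching lines; "|| true" only hides
# grep's non-zero exit status when the count is 0
app_hits=$(printf '%s\n' "$app_line" | grep -ci "$pattern" || true)
grep_hits=$(printf '%s\n' "$grep_line" | grep -ci "$pattern" || true)

echo "application line matches: $app_hits"   # prints 1
echo "grep line matches: $grep_hits"         # prints 0
```

The second count is 0 because, in the grep line, the letter “p” is followed by “]” rather than “o”, so the sequence “powerhr” never occurs in it.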

5.2. ps | awk

The ps command allows us to control the output flexibly, for example, to print memory usage, parent PID, CPU usage, and so on. However, the ps | grep approach drops the title line, which makes the result harder to read. For example, we can’t tell which column holds the CPU usage values.

awk is a powerful command-line text processing tool. Next, let’s use awk to filter ps’s output and preserve the title line:

$ ps -ef | awk 'NR==1 || /[p]owerHR/'  
UID          PID    PPID  C STIME TTY          TIME CMD
kent     2455359 2455167  0 00:54 pts/16   00:00:00 /bin/bash /tmp/test/powerHR/start.sh

As the output above shows, the title line is preserved. This gives the output a table-like structure, which makes it easier to understand.
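
The same idea extends to custom columns. The sketch below is an illustration rather than part of the approaches above: ps_keyword is a hypothetical helper name, the column list assumes procps ps, and instead of the bracket trick it passes the keyword to awk through an environment variable (named KEYWORD here). The keyword never appears on awk’s own command line, so awk doesn’t match itself:

```shell
# Hypothetical helper: print the title line plus every process whose
# command line contains the fixed string "$1".
ps_keyword() {
    # -e: all processes; -o: choose the columns to print
    # index() does a plain substring search, so "$1" is not a regex here
    ps -eo pid,ppid,pcpu,pmem,args |
        KEYWORD="$1" awk 'NR==1 || index($0, ENVIRON["KEYWORD"])'
}

# Usage with the demo scripts running:
#   ps_keyword powerHR
```

One caveat: if the helper is called from a script or command whose own command line contains the keyword, that calling process is listed too.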

6. Conclusion

In this article, we’ve learned three ways to find processes by keywords:

Approach                  Summary
------------------------  ------------------------------------------------
pidof                     only matches the program name
                          needs the -x option when searching for scripts
                          outputs PIDs only

pgrep                     supports regex search
                          supports full command line match (-f)
                          outputs PIDs and, with -a, the full command line

ps -ef | grep (or awk)    pretty flexible to control the output
                          regex trick to exclude the grep/awk process:
                          ps … | grep “[p]attern”