1. Overview
We know that the ps command is a handy utility for listing the currently running processes on the system. Also, the grep command is good at filtering text.
In this tutorial, we’ll explore how to combine the two commands to find desired processes and preserve the header line of ps’s output.
2. Introduction to the Problem
The ps command with the -ef option lists all running processes on the system. In practice, we rarely need the entire list. Instead, we want to inspect particular processes. Therefore, we usually pipe ps’s output to the grep command to do some filtering.
Next, let’s see an example. Let’s say we want to check the processes with the keyword “vim”:
$ ps -ef | grep vim
kent 774919 44545 0 00:01 pts/7 00:00:00 vim /home/kent/.vimrc
kent 775664 1 2 00:04 ? 00:00:00 gvim /tmp/test/hello.txt
kent 775798 44504 0 00:04 pts/6 00:00:00 grep vim
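As the output above shows, one Vim process is editing the file /home/kent/.vimrc, and a gVim process is editing /tmp/test/hello.txt. Apart from that, the “grep vim” command itself also appears in the output. This is because the shell starts both sides of the pipeline at about the same time, so the grep process is usually already running when ps takes its snapshot, and its command line contains “vim”.
To suppress the grep command from the output, we can use a Regex trick. The bracket expression [v] matches the single character “v”, so the pattern [v]im still matches “vim”. However, the grep process’s own command line now contains the literal text “[v]im”, which the pattern doesn’t match: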
$ ps -ef | grep '[v]im'
kent 774919 44545 0 00:01 pts/7 00:00:01 vim /home/kent/.vimrc
kent 775664 1 0 00:04 ? 00:00:00 gvim /tmp/test/hello.txt
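To see why the bracket trick hides grep itself, we can try the pattern on some canned text instead of live ps output. This is only a small sketch: sample_cmds is a hypothetical helper that prints made-up command lines, including one that looks like the grep process:

```shell
# sample_cmds is a hypothetical helper: canned text standing in for
# the CMD column of `ps -ef` (made-up data, not real ps output).
sample_cmds() {
    printf '%s\n' \
        'vim /home/kent/.vimrc' \
        'gvim /tmp/test/hello.txt' \
        'grep [v]im'
}

# The regex [v]im matches the three characters "vim".
# The text "grep [v]im" has "]" between "v" and "im", so it is skipped.
sample_cmds | grep '[v]im'
# prints:
# vim /home/kent/.vimrc
# gvim /tmp/test/hello.txt
```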
Now, we have the two desired “vim” processes in the output. However, as the header of ps’s output doesn’t match the pattern “vim”, grep filters it out as well. For example, the default ps -ef header looks like:
UID PID PPID C STIME TTY TIME CMD
Without the header, it isn’t easy to tell which column is which when we inspect a process’s details.
Further, we know that the ps command allows us to control the output flexibly. If we ask ps to show some customized columns, the header line will certainly make the output easier to understand.
So, next, we’ll address a couple of approaches to filtering ps’s output using grep-like matching while keeping the header line.
Also, we may have seen this solution: ps -ef | { head -1; grep '[v]im'; }. At the end of the tutorial, we’ll discuss what problem this approach has and why we shouldn’t use this command.
3. Piping to the sed Command
Let’s first review our requirements. We want to do a Regex-based search on the ps -ef output. Also, the header line, which is the first line of the output, must be preserved. The ps | grep approach cannot solve the problem on its own, as grep selects lines only by pattern and not by line number.
The sed command is a great command-line utility to process text. It supports the “[Address] Action” pattern. Moreover, both line numbers and Regex patterns can be sed’s Address.
Next, let’s pipe the output of ps -ef to sed:
$ ps -ef | sed -n '1p; /[v]im/p'
UID PID PPID C STIME TTY TIME CMD
kent 774919 44545 0 00:01 pts/7 00:00:03 vim /home/kent/.vimrc
kent 775664 1 0 00:04 ? 00:00:03 gvim /tmp/test/hello.txt
As the output above shows, we’ve got the desired output.
Next, let’s walk through the sed command quickly to understand how it works:
- sed -n: Disable sed’s auto-print; we’ll control the output manually
- 1p: Print the first line
- /[v]im/p: Print every line that matches the given Regex /[v]im/
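The walkthrough above can be reproduced without a live process list. The following is a minimal sketch, assuming a hypothetical helper sample_ps that prints a made-up snapshot in the ps -ef layout:

```shell
# sample_ps is a hypothetical helper: a fixed, made-up snapshot in the
# `ps -ef` layout, so the result does not depend on running processes.
sample_ps() {
    cat <<'EOF'
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 00:00 ? 00:00:01 /sbin/init
kent 774919 44545 0 00:01 pts/7 00:00:00 vim /home/kent/.vimrc
kent 775664 1 2 00:04 ? 00:00:00 gvim /tmp/test/hello.txt
kent 775798 44504 0 00:04 pts/6 00:00:00 sed -n 1p; /[v]im/p
EOF
}

# -n: no auto-print; 1p: print line 1; /[v]im/p: print regex matches.
sample_ps | sed -n '1p; /[v]im/p'
# prints the header line, the vim line, and the gvim line
```

One detail worth knowing: the two sed commands are independent, so if the pattern also matched the header (for example, /CMD/), the first line would be printed twice.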
4. Piping to the awk Command
The awk command is another powerful text processing tool. Similarly, it can also control the output by Regex and line number:
$ ps -ef | awk 'NR == 1 || /[v]im/'
UID PID PPID C STIME TTY TIME CMD
kent 774919 44545 0 00:01 pts/7 00:00:07 vim /home/kent/.vimrc
kent 775664 1 0 00:04 ? 00:00:08 gvim /tmp/test/hello.txt
As we can see, the awk command produces the expected output, too. Here, “NR == 1 || /[v]im/” is a boolean expression. awk evaluates this expression on each input line.
When an input line’s line number is “1” or its content matches the defined Regex, awk will execute the default action: print. Therefore, we get the desired output.
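If we use this filter often, the awk expression can be wrapped in a small shell function. The sketch below is one possible shape, not a standard tool: keep_header is a hypothetical name, and it assumes the pattern is passed as the first argument and handed to awk through -v (so awk processes backslash escapes in it):

```shell
# keep_header is a hypothetical helper name, not a standard command.
# It reads text on stdin, prints line 1, then every line matching
# the regex given as the first argument.
keep_header() {
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Canned input (made-up data) keeps the example reproducible.
printf '%s\n' \
    'UID PID CMD' \
    'kent 100 vim notes.txt' \
    'kent 101 bash' |
    keep_header '[v]im'
# prints:
# UID PID CMD
# kent 100 vim notes.txt
```

With live data, the call would look like ps -ef | keep_header '[v]im'. The bracket trick is still useful here, since the awk process’s own command line contains the pattern text.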
5. Don’t Pipe ps Output to { head -1; grep ‘pattern’; }
So far, we’ve seen the sed and awk solutions. We know that ps -ef | grep alone doesn’t work because grep drops the header line unless it happens to match the pattern. Also, the head command can easily print the first n lines. So, can we simply add head -1 to print the first line and leave the remaining lines to grep?
5.1. The ps -ef | { head -1; grep ‘pattern’; } Approach
Probably, we’ve seen someone use the ps -ef | { head -1; grep ‘pattern’; } approach in practice. Let’s test it with our example:
$ ps -ef | { head -1; grep '[v]im'; }
UID PID PPID C STIME TTY TIME CMD
kent 774919 44545 0 00:01 pts/7 00:00:01 vim /home/kent/.vimrc
kent 775664 1 0 00:04 ? 00:00:00 gvim /tmp/test/hello.txt
As we can see, this way works for our example. However, this solution is not reliable. We shouldn’t use this approach.
Next, let’s understand what problem this approach has.
5.2. Pipe to Group Command
First of all, let’s have a look at “{ head -1; grep '[v]im'; }”. Here, we group the two commands within {…}.
When we pipe to a Bash command group like Cmd | { Cmd1; Cmd2; }, it’s natural to assume that each command in the group receives its own full copy of Cmd’s output as Stdin.
However, piping to a command group doesn’t work in this way. If we pipe some data to a command group, all commands in the group share the same Stdin.
For example, in the command ps -ef | { head -1; grep '[v]im'; }, after ps’s output is piped to the command group, head -1 first consumes some data from Stdin. Then, the grep command consumes whatever is left. Therefore, if the earlier commands have consumed all data from Stdin, the later commands work with an empty input. An example can explain it quickly:
$ seq 10 | { wc -l; grep '2'; }
10
$ echo $?
1
As the example above shows, the first command in the command group is the wc command with the -l option to report the total number of lines. It consumes all data from Stdin to calculate how many lines we have. After wc prints “10”, Stdin is empty. Of course, the later grep '2' command cannot find any matching line. Therefore, if we check the exit code of the whole command, it’s 1 instead of 0.
Now that we understand how pipes work with Bash’s command group, some may ask further: if head -1 consumes only the first line, which is the header line of ps’s output, then the later grep command will process all the rest, which is exactly what we want. So, why shouldn’t we use ps -ef | { head -1; grep '[v]im'; }?
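The shared Stdin can also be made visible by counting how many bytes each command in the group gets to read. Here is a small sketch; shared_stdin_demo is a hypothetical function name used only for illustration:

```shell
# shared_stdin_demo is a hypothetical name used for illustration.
# Both wc calls read from the same pipe, and the first one drains it.
shared_stdin_demo() {
    seq 10 | {
        first=$(wc -c)     # reads all 21 bytes: "1\n" through "10\n"
        second=$(wc -c)    # nothing is left to read
        # $(( ... )) also strips the padding some wc versions add
        echo "first=$((first)) second=$((second))"
    }
}

shared_stdin_demo
# prints: first=21 second=0
```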
To answer this question, we need to understand how head reads input.
5.3. How Does the head Command Read Inputs
Even though head -1 prints only the first line, it doesn’t mean head reads only until the first linebreak. The head command reads its input in blocks of a predefined buffer size. If there is no linebreak in the first block (for example, if the first line is very long), head keeps reading further blocks of that size until it finds the first linebreak.
On the other hand, if the first line is very short, head -1 reads the first block anyway, which means it consumes the first line and other data after the first linebreak.
Next, let’s understand this through an example:
$ seq 10 | { head -1; grep '2'; }
1
$ echo $?
1
As we can see, head -1 prints “1”. However, grep '2' doesn’t match anything. Further, $? reports 1. It also tells us that the grep command hasn’t found any matches.
Actually, we can change grep ‘2’ to grep ‘3’, grep ‘4’, or even grep ’10’, and we’ll have the same result. This is because the prior head -1 has consumed the entire output of the seq command.
Therefore, the approach ps -ef | { head -1; grep 'pattern'; } is not reliable. We shouldn’t use it.
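For contrast, the problem comes from head’s block-wise reading, not from the command group itself. The shell’s read builtin must not consume input past the newline that ends the line it returns, so a group that takes the first line with read leaves the remaining lines for grep. The following is only an illustrative sketch (first_line_then_grep is a hypothetical name), not a replacement for the sed and awk solutions:

```shell
# first_line_then_grep is a hypothetical helper name.
# `read` stops at the first newline, so grep still sees line 2 onward.
first_line_then_grep() {
    IFS= read -r header          # consume exactly one line from Stdin
    printf '%s\n' "$header"      # print it unchanged
    grep -- "$1"                 # filter whatever is left on Stdin
}

seq 10 | first_line_then_grep '2'
# prints:
# 1
# 2
```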
5.4. The head Command’s Buffer Size
Finally, let’s find out the buffer size of head’s read operation. This buffer size (BUFSIZ) is defined in the glibc library’s stdio.h. At the time of writing, glibc defines it as 8192 bytes.
We can also easily verify this value.
First, let’s create some data greater than 8k bytes. For simplicity, we’ll still use the seq command to generate the data:
$ seq 2048 | wc -c
9133
As we can see, seq 2048 produces 9133 bytes of output. Next, let’s see how many bytes are left in Stdin after head -1 reads from seq 2048:
$ seq 2048 | { head -1 > /dev/null; wc -c;}
941
Since we don’t need head -1’s output, we redirect it to /dev/null. As we can see, after head reads from the 9133 bytes, 941 bytes are still left. Therefore, the head command’s read buffer size is 9133 - 941 = 8192 bytes.
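The numbers in this experiment can be cross-checked with plain shell arithmetic. The sketch below counts the bytes of seq 2048 by digit width and then subtracts one 8192-byte block. It assumes head takes exactly one full block in a single read; on a pipe, a read may return less than that, so the leftover reported by wc -c can differ on other systems or head implementations:

```shell
# Bytes produced by `seq 2048`: each number costs its digits plus a newline.
#   1..9       ->    9 numbers x 2 bytes
#   10..99     ->   90 numbers x 3 bytes
#   100..999   ->  900 numbers x 4 bytes
#   1000..2048 -> 1049 numbers x 5 bytes
total=$(( 9*2 + 90*3 + 900*4 + 1049*5 ))
echo "$total"               # prints 9133

# Bytes left over if exactly one 8192-byte block has been consumed.
echo $(( total - 8192 ))    # prints 941
```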
When we use ps -ef | { head -1; grep ‘pattern’; }, we cannot predict if the process entries that we’re looking for are located in the first 8192 bytes of the ps command’s output, so this approach is not stable.
6. Conclusion
In this article, we’ve explored how to filter ps’s output and preserve the header line using sed and awk.
Moreover, we’ve learned how pipes work with Bash’s command group. Also, we’ve discussed why we shouldn’t use the ps | { head -1; grep 'pattern'; } approach.