1. Overview
When we pipe several commands in the shell, the shell spawns a process for each command. Sometimes, there might be some confusion as to the lifetimes of these processes and their execution order.
In this tutorial, we’ll discuss in which order piped processes run and their lifetimes.
2. General Information About Pipes
Pipes are one of the oldest inter-process communication (IPC) mechanisms in Linux. Despite some limitations, they remain one of the most widely used forms of IPC.
One limitation is that we can only use an anonymous pipe, the kind the shell creates for the | operator, between processes that have a common ancestor. Normally, a process creates a pipe and then forks another process. The parent process and the child process use the pipe for communication.
Pipes are half-duplex: Data flows in only one direction. This is another limitation of pipes.
When we connect several commands with pipes in a shell, we create a pipeline. The shell connects the standard output of each process in the pipeline to the standard input of the next process in the pipeline. In this case, the shell is the ancestor of each process in the pipeline.
Let’s consider an example pipeline:
$ ps -ef | grep tail | grep -v grep
Here, the shell directs the standard output of the first process spawned by running ps -ef to the standard input of the second process spawned by running grep tail. Similarly, it directs the standard output of the second process spawned to the standard input of the third process spawned by running grep -v grep.
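As a minimal sketch of this wiring, assuming Bash (the BASHPID variable is Bash-specific), the left side of the following pipeline writes its own PID to standard output, and the right side reads it from standard input. The variable names left_pid and right_pid are chosen here only for illustration:

```shell
#!/bin/bash
# The left side prints its own PID to stdout. The right side reads that
# value from stdin and prints it next to its own PID.
result=$(echo "$BASHPID" | { read -r left_pid; echo "$left_pid $BASHPID"; })

# Split the two PIDs back into separate variables in the main shell.
read -r left_pid right_pid <<< "$result"
echo "shell: $$, left: $left_pid, right: $right_pid"
```

The two PIDs should differ from each other and from the PID of the shell itself, since each part of the pipeline runs in its own process.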
Sometimes, there might be a misunderstanding about the lifetimes of the processes in a pipeline. We may think that the first process runs and exits, then its buffered output is fed to the second process. But this isn’t the case. The shell starts the second process while the first process is still running. Therefore, the processes run concurrently.
In fact, the operating system may not run the commands in a pipeline starting from left to right. The execution order of the commands depends on the scheduling policy of the operating system. The order isn’t guaranteed and may change from one execution to another.
3. Examples
We’ll look at examples of the concurrency and order of execution of piped processes in this section.
First, we’ll run examples that demonstrate the concurrent operation of the processes in a pipeline. Then, we’ll run an example that shows in which order the operating system spawns piped processes.
3.1. Concurrent Operation of Piped Processes
Let’s check whether a tail command is running:
$ ps -ef | grep tail
alice 12330 20206 0 06:52 pts/5 00:00:00 grep --color=auto tail
First, we used the ps -ef command to list all processes. The -e option lists all processes. The -f option performs a full-format listing with additional columns. It also prints the command arguments.
We filtered the output of ps -ef by directing its output to the input of the grep tail command using a pipe.
There’s no tail command running, according to the output. However, the second command in the pipeline, grep tail, was listed in the output. If the operating system had spawned the process corresponding to grep tail after the process of ps -ef had ended, we wouldn’t have observed it in the output. Therefore, both processes seem to run concurrently.
As an additional example that demonstrates the concurrent operation of piped processes, we’ll use another pipeline:
$ tail -f /dev/null | grep Hello
In this example, we ran the tail -f /dev/null command to follow the new logs appended to the file /dev/null, and we searched for the word Hello in the logs using grep Hello. Of course, nothing will be appended to /dev/null as it’s a null device file.
On another terminal, let’s check whether there’s a tail command running:
$ ps -ef | grep tail | grep -v grep
alice 12419 20206 0 06:53 pts/5 00:00:00 tail -f /dev/null
There’s a tail command running as expected. This is the first command in the pipeline. Its PID is 12419.
Now, we’ll check whether there’s a grep command running:
$ ps -ef | grep grep
alice 12420 20206 0 06:53 pts/5 00:00:00 grep --color=auto Hello
alice 12466 3797 0 06:53 pts/26 00:00:00 grep --color=auto grep
There are two grep commands running. The first grep in the output is the second command in the pipeline, grep Hello. Its PID, 12420, immediately follows the PID of the tail -f /dev/null command, 12419, which suggests that it was spawned right after tail. The other grep command is the second part of the pipeline we just ran, ps -ef | grep grep.
Therefore, the ps command shows that both commands in the pipeline are running concurrently.
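We can also check the concurrency with timestamps instead of ps. The following is a rough sketch, assuming Bash and the date, sleep, and mktemp utilities. The directory and file names are arbitrary, and the 2-second delay is just a value picked for illustration:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# The left side records its start time, then stays alive for 2 seconds.
# The right side records its own start time as soon as it begins.
{ date +%s > "$workdir/left"; sleep 2; echo done; } |
    { date +%s > "$workdir/right"; cat > /dev/null; }

left_start=$(< "$workdir/left")
right_start=$(< "$workdir/right")
rm -rf "$workdir"

# If the right side had waited for the left side to exit,
# the gap would be at least 2 seconds.
gap=$(( right_start - left_start ))
echo "right side started ${gap}s after the left side"
```

On a lightly loaded system, we’d expect the reported gap to be well under 2 seconds, which is consistent with both sides starting at roughly the same time.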
3.2. Execution Order of Piped Processes
Having seen that the piped processes run concurrently, we’ll discuss in which order the operating system spawns the piped processes.
We’ll use the following script, pipe_order.sh:
#!/bin/bash
set -x
ps -ef | grep dbus | grep -v grep | awk '{print $2 $3}' | column -t | wc -l
The set -x command in this script makes the shell print each command and its arguments as it’s executed. It’s generally useful for debugging shell scripts.
Then, there’s a long chain of piped commands, ps -ef | grep dbus | grep -v grep | awk '{print $2 $3}' | column -t | wc -l. What this pipeline does isn’t important here. The important point is the number of piped commands. There are six piped commands in this pipeline. Thanks to the set -x command, we’ll be able to examine in which order these six commands are started.
Now, let’s run the script:
$ pipe_order.sh
+ ps -ef
+ grep dbus
+ grep -v grep
+ awk '{print $2 $3}'
+ column -t
+ wc -l
30
The output shows the commands in their execution order. It seems that the operating system executed the commands from left to right, in the order they appear in the pipeline. The number 30 in the last row is the number of matching processes.
Now, we’ll rerun the script a few more times until we get a different result:
$ pipe_order.sh
+ ps -ef
+ grep dbus
+ grep -v grep
+ column -t
+ awk '{print $2 $3}'
+ wc -l
30
This output differs from the previous output. In this case, the operating system ran column -t before awk '{print $2 $3}'. Therefore, the commands in a pipeline may not always be spawned in the same order as they appear in the pipeline.
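To get a feeling for how often the order changes on a given machine, we can repeat a tiny two-command pipeline many times and count which side writes first. This is only a sketch, assuming Bash. The L and R markers, the variable names, and the run count of 200 are arbitrary choices, and the resulting counts depend on the scheduler and the system load:

```shell
#!/bin/bash
runs=200
left_first=0
right_first=0

for ((i = 0; i < runs; i++)); do
    # Each side writes a single character to stderr as soon as it runs.
    # Nothing travels through the pipe itself; we only observe which
    # side gets to write first.
    order=$( { printf L >&2 | printf R >&2; } 2>&1 )
    case $order in
        LR) left_first=$((left_first + 1)) ;;
        RL) right_first=$((right_first + 1)) ;;
    esac
done

echo "left first: $left_first, right first: $right_first"
```

If the right side comes first in at least some runs, that matches what we observed with set -x. On some systems, one ordering may dominate heavily or even occur every time.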
4. Conclusion
In this article, we discussed the concurrent execution and the execution order of processes in a pipeline.
First, we learned about pipes. Despite some limitations, pipes are one of the most widely used IPC mechanisms. We discussed that processes in a pipeline run concurrently. We also saw that the operating system may not run the commands in their specified order in the pipeline. The order depends on the scheduling by the operating system.
Then, we looked at two examples that showed the concurrent operation of piped processes and their execution order. We saw that the order may not always be the same in each execution.