1. Overview

In this article, we’ll talk about the parallel command. As the name indicates, this tool helps us run several programs in parallel. With it, we can make better use of our system’s resources and speed up computations.

2. The parallel Command

There are several ways for us to run processes in parallel in Linux. Two common ones are the shell’s & operator and the parallel command. There’s some overhead when using parallel instead of &, but we get more control over the jobs and over how their output is handled. This matters most when every process writes output, since parallel keeps the output of different jobs from getting mixed together.

Moreover, parallel is more efficient when there are many more jobs than cores. Using & dispatches all the jobs at once, so several jobs compete for each core, which may hurt performance. In contrast, parallel launches one job per core and starts the next job only when a core becomes available. What’s more, parallel can dispatch jobs to other hosts on a network.
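For comparison, here’s a minimal sketch of the & approach in plain shell. The temporary directory, the file names, and the collected variable are our own illustrative choices. All three jobs start at once, and we have to keep their output apart ourselves by giving each job its own file:

```shell
# Start three background jobs at once. Each one writes to its own file
# so that the output of different jobs cannot interleave.
tmpdir=$(mktemp -d)
for n in 1 2 3; do
  echo "Job $n done" > "$tmpdir/job$n.out" &
done
wait    # block until every background job has exited

# Collect the results in a fixed order and clean up.
collected=$(cat "$tmpdir/job1.out" "$tmpdir/job2.out" "$tmpdir/job3.out")
echo "$collected"
rm -r "$tmpdir"
```

This bookkeeping (separate files, wait, and a fixed collection order) is the kind of work that parallel handles for us.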

Not every command named parallel is the same tool. On Debian-based distributions, for instance, the moreutils package provides a parallel command with different behavior. In this guide, we’ll focus on GNU parallel.
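If we’re unsure which one we have, a quick check along these lines can help. It’s only a sketch: it assumes that GNU parallel identifies itself in its --version output, and the flavor variable is our own name:

```shell
# Assumption: only GNU parallel prints "GNU parallel" for --version;
# any other parallel on PATH is treated as a different implementation.
if ! command -v parallel >/dev/null 2>&1; then
  flavor="missing"
elif parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  flavor="gnu"
else
  flavor="other"
fi
echo "parallel flavor: $flavor"
```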

One disclaimer applies to most of the commands shown in this guide: the output may differ between machines, and even between runs on the same machine.

Before we look at how to use parallel, it’s worth introducing parallel_tutorial. This manual page, which we can open with man parallel_tutorial, covers more content than this guide and helps us discover more options in case we need them.

2.1. Arguments at the End of the Command Line

We’re aiming to run a given command (potentially with some fixed arguments) together with an argument that changes for each process in parallel. There are two main syntaxes we can use with parallel. The first one is:

parallel [parallel_options] command [command_fixed_arguments] ::: command_variable_arguments

Let’s use the echo command to see how this works:

$ parallel echo ::: 1 2 3
1
2
3

We’re running three trivial jobs in parallel. Each job prints one of the three numbers in the set we provide after :::.

We aren’t restricted to a single set of arguments, since we can chain several sets of arguments by appending ::: several times:

$ parallel echo ::: 1 2 ::: A B
1 A
1 B
2 A
2 B

The parallel command runs the job once for every possible combination of the variable arguments.
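To see how quickly the combinations grow, here’s a small sketch that adds a third set of arguments. It assumes that the parallel on our PATH is GNU parallel; the guard and the combos variable are our own additions. We use --keep-order, which we cover later in this article, only to make the order of the lines predictable:

```shell
# Assumption: `parallel` on PATH is GNU parallel; skip the demo otherwise.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # Three sets with two values each give 2 x 2 x 2 = 8 jobs.
  combos=$(parallel --keep-order echo ::: 1 2 ::: A B ::: x y)
else
  combos="skipped"
fi
echo "$combos"
```

With GNU parallel available, this prints eight lines, from 1 A x to 2 B y.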

Finally, let’s show how we can provide fixed arguments that are shared by all the runs of the command:

$ parallel echo -n ::: 1 2 ::: A B
1 A1 B2 A2 B

Note that the -n argument for the echo command means that it won’t add a new line at the end. Thus, we end up with a single line in the output.

2.2. Piping Arguments

The second syntax supplies the arguments by piping the output of another command into parallel:

command_generating_variable_arguments | parallel [parallel_options] command [command_fixed_arguments]

Let’s use the seq command to generate a list of integers that we pipe to parallel:

$ seq 1 3 | parallel echo
1
2
3

We’ve generated a list of variable arguments (1 2 3), and parallel has passed one of them to each of the three echo calls.

As before, adding fixed arguments to the command is straightforward:

$ seq 1 3 | parallel echo -n
123

With -n, we’ve requested that echo not print a newline after each argument.

The main drawback of piping the arguments is that passing several variable arguments to each command call is more intricate. Depending on the format of the piped output, we can rely on parallel flags such as --colsep to specify a column separator, but this flag is outside the scope of this introductory guide.
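Without going into the flag’s details, here’s a minimal sketch of the idea. It assumes GNU parallel and input lines with two space-separated columns; the guard and the pairs variable are our own additions, and --keep-order (covered later) just keeps the lines in input order:

```shell
# Assumption: `parallel` on PATH is GNU parallel; skip the demo otherwise.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # --colsep splits every input line into columns, so {1} and {2}
  # refer to the first and second column of that line.
  pairs=$(printf '1 A\n2 B\n' | parallel --keep-order --colsep ' ' echo {2} {1})
else
  pairs="skipped"
fi
echo "$pairs"
```

With GNU parallel available, each job receives both columns of one line and prints them swapped: A 1, then B 2.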

3. Advanced Usage of parallel

We’ve covered the command structure and uses of parallel. However, there are still many missing pieces to get the best that the command has to offer.

3.1. Chaining Several Commands

In real life, we rarely call a single command; more often, we chain several command calls. We can chain commands using the ; operator. For example, let’s call the date, sleep, and echo commands three times:

$ parallel date +%r ';' sleep 2 ';' echo Job ::: 1 2 3
12:15:02 AM
Job 1
12:15:02 AM
Job 2
12:15:02 AM
Job 3

We’ve requested the time in the locale’s 12-hour format with the +%r argument, followed by a 2-second sleep and, finally, an echo call. This example shows parallel in action: all three jobs are launched at the same moment (the time reported by date is the same for all of them), and sleep happens after date.

One important consideration when chaining commands is that we need to quote the ; operator. Otherwise, the shell treats it as the end of the parallel command and runs whatever follows as a separate command, whereas we want everything between the parallel call and the ::: block to be passed to parallel.
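An alternative that avoids quoting each ; separately is to wrap the whole command in a single pair of quotes. The following sketch assumes GNU parallel; the guard, the chained variable, and the start/end messages are our own illustration:

```shell
# Assumption: `parallel` on PATH is GNU parallel; skip the demo otherwise.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # The command is one quoted string, so the shell hands the semicolon
  # to parallel untouched; {} marks where each argument goes.
  chained=$(parallel --keep-order 'echo start {}; echo end {}' ::: 1 2)
else
  chained="skipped"
fi
echo "$chained"
```

With GNU parallel available, each job prints its start and end lines together, first for argument 1 and then for argument 2.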

3.2. Location of Arguments Throughout the Command

In all the previous examples, the command_variable_arguments are simply appended to the end of each command. This means that parallel command ::: arg1 arg2 gets expanded to command arg1 and command arg2, even when command comprises several command calls, in which case the argument lands after the last one.

However, we can request that the position of the arguments be different in the command, or even that these arguments appear multiple times:

$ parallel echo sequencing {} ';' seq 1 {} ::: 1 2 3
sequencing 1
1
sequencing 2
1
2
sequencing 3
1
2
3

We’re calling echo and seq three times, passing the same argument to both commands each time.

Moreover, we can specify the argument location when several of them are provided at the end of parallel:

$ parallel echo sequencing {1} to {2}';' seq {1} {2} ::: 1 2 ::: 3 4
sequencing 1 to 3
1
2
3
sequencing 1 to 4
1
2
3
4
sequencing 2 to 3
2
3
sequencing 2 to 4
2
3
4

In this case, we test all the combinations of both sets of arguments and we provide each one in specific locations in our command.

3.3. Checking the Run Beforehand

Lastly, we may want to check whether the commands we’re building are right. The --dry-run flag prints the commands that parallel would run without actually running them.

Let’s try this flag with the previous example:

$ parallel --dry-run echo sequencing {1} to {2}';' seq {1} {2} ::: 1 2 ::: 3 4
echo sequencing 1 to 3; seq 1 3
echo sequencing 1 to 4; seq 1 4
echo sequencing 2 to 3; seq 2 3
echo sequencing 2 to 4; seq 2 4

We see the four commands that parallel will run instead of the output of these commands, which was what we saw in the previous scenario.

4. Output and Execution Control

One advantage of parallel over launching parallel jobs with & is that we get more control over how the output of the jobs is shown and how the jobs are distributed.

4.1. Completion Order Versus Keeping Input Order

By default, parallel will display the output of each command progressively on screen as soon as each job is completed:

$ parallel sleep {}';' echo Job {} done ::: 5 4 3 1 2
Job 1 done
Job 3 done
Job 2 done
Job 4 done
Job 5 done

We’re requesting a sleep of different times and a print of a text with echo. Each line is printed as soon as the job is completed.

However, this can get confusing if we have many jobs. The --keep-order flag displays the command outputs in the order of the arguments. The following output will always be the same, regardless of the load each processor is handling or other system conditions:

$ parallel --keep-order sleep {}';' echo Job {} done ::: 5 4 3 1 2
Job 5 done
Job 4 done
Job 3 done
Job 1 done
Job 2 done

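As opposed to the previous case, --keep-order holds back the output of a job until every job before it in the input has printed. Since the first job here is also the slowest, on a multi-core machine all five lines typically appear together once it finishes.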

4.2. Specifying and Understanding Job Slots

Another convenient control option is to specify the number of jobs launched in parallel. We can force a specific number of jobs with the --jobs flag followed by the number of jobs.

For example, let’s request a single job:

$ parallel --jobs 1 sleep {}';' echo Job {} done ::: 5 4 3 1 2 
Job 5 done
Job 4 done
Job 3 done
Job 1 done
Job 2 done

Using --jobs 1 is the same as running the commands sequentially, one after the other. Thus, only once the command with argument 5 has completed does parallel run the command with argument 4.

On the opposite end of the spectrum, let’s request as many jobs as the arguments we’ve provided:

$ parallel --jobs 5 sleep {}';' echo Job {} done ::: 5 4 3 1 2 
Job 1 done
Job 2 done
Job 3 done
Job 4 done
Job 5 done

Here, the output looks different. We’re launching five jobs at the same time. The output shows that the job that finishes the earliest is that whose argument is 1 since it’s sleeping the smallest amount of time. The next command is the one with argument 2, then 3, and so on.

Finally, let’s see how the job slots work when we request two job slots but provide five arguments:

$ parallel --jobs 2 sleep {}';' echo Job {} done ::: 5 4 3 1 2 
Job 4 done
Job 5 done
Job 1 done
Job 3 done
Job 2 done

Here we’ve requested two job slots; let’s call them slots A and B. To start with, slot A is assigned the command with argument 5, and slot B the command with argument 4. Slot B finishes its job first, displays its output, and starts the command with argument 3. Once the job with argument 5 finishes, slot A begins the command with argument 1. Slot A now finishes before slot B and grabs the last remaining command, the one with argument 2.

Let’s map this out with a table:

Job Slot      First Job     Second Job    Third Job
Job Slot A    Argument 5    Argument 1    Argument 2
Job Slot B    Argument 4    Argument 3

We can see that the output order matches the order in which the jobs finish across the two slots.
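We can also ask parallel to report which slot ran each job. The sketch below assumes GNU parallel, whose {%} replacement string expands to the job slot number, and a sleep command that accepts fractional seconds (as GNU sleep does), so the waits are scaled down to tenths of a second. The guard and the slots variable are our own additions:

```shell
# Assumption: `parallel` on PATH is GNU parallel; skip the demo otherwise.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # {%} is the job slot number (1 or 2 here) and {} is the argument.
  # "sleep 0.{}" turns argument 5 into a 0.5-second sleep, and so on.
  slots=$(parallel --jobs 2 'sleep 0.{}; echo "slot {%} ran argument {}"' ::: 5 4 3 1 2)
else
  slots="skipped"
fi
echo "$slots"
```

Each of the five lines names slot 1 or slot 2. Because the sleeps are so short, the exact assignment and order may vary between runs.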

By default, when we don’t pass --jobs, parallel chooses the number of job slots itself, based on the CPU cores it detects, so that it uses all the available resources.

5. Conclusion

In this article, we’ve talked about the parallel command. We started by discussing the structure of the command call and where to place the arguments depending on how we feed them.

Then, we discussed more advanced uses, such as specifying the location of these arguments.

Finally, we talked about the control of the output and the execution of the jobs.