1. Overview
In this tutorial, we’ll use wget, a simple command-line download tool, to download multiple files in parallel.
The commands used in this article were tested in bash, but should work in other POSIX-compliant shells as well.
2. Downloading Files with wget
Downloading files with wget is fairly straightforward:
wget https://my.website.com/archive.zip
Unfortunately, this downloads only one file at a time.
To fetch multiple files with a single command, we have to resort to shell scripting:
#!/bin/bash
# Read each URL from files.txt and download it in turn
while read -r file; do
    wget "${file}"
done < files.txt
Here, files.txt contains the URLs of all the files we want to download, one per line:
https://my.website.com/archive-1.zip
https://my.website.com/archive-2.zip
https://my.website.com/archive-3.zip
The problem with this approach, however, is that the files are downloaded sequentially. We can speed things up considerably by downloading them in parallel.
3. Parallelizing Downloads with wget
There are different ways in which we can make wget download files in parallel.
3.1. The Bash Approach
A simple and somewhat naive approach is to send each wget process to the background using the & operator:
#!/bin/bash
# Start one background wget process per URL
while read -r file; do
    wget "${file}" &
done < files.txt
Each call to wget is sent to the background and runs asynchronously as a separate child process.
Although we now download the files in parallel, this approach is not without its drawbacks. For example, there is no feedback on completed or failed downloads. Also, we have no control over how many wget processes run at once.
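If we want to stay with plain bash, we can at least cap the number of simultaneous downloads by waiting for each batch of background jobs to finish before starting the next one. The following is only a rough sketch of that idea, reusing files.txt from above and assuming a batch size of two:
#!/bin/bash
# Rough sketch: download in batches of two, waiting for each batch to finish
count=0
while read -r file; do
    wget -q "${file}" &
    count=$((count + 1))
    if [ "${count}" -ge 2 ]; then
        wait    # block until both downloads in the current batch have finished
        count=0
    fi
done < files.txt
wait            # pick up any remaining background downloads
The wait builtin keeps the script from racing ahead of its downloads, but the bookkeeping quickly gets verbose, which is why the approaches below are usually preferable.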
3.2. Let wget Fork Itself
We can do a little better and let wget fork itself to the background by passing -b as a parameter:
#!/bin/bash
# Let wget itself move each download to the background
while read -r file; do
    wget -b "${file}"
done < files.txt
Just as with the & operator, each call is sent to the background and runs asynchronously. The difference is that the -b parameter additionally creates a log file for each download. We can grep these log files to check that no errors occurred.
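By default, wget -b writes its output to wget-log, wget-log.1, and so on in the current directory. Once the downloads have finished, a quick check along these lines (a sketch, assuming the default log file names) reveals any reported errors:
# List any log entries that mention an error, e.g. "ERROR 404: Not Found."
grep -i "error" wget-log*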
3.3. Using xargs
The cleanest and most flexible solution to our problem is to use xargs. The xargs command takes a list of arguments and passes them to a utility of our choice, with the option of running multiple processes in parallel.
Above all, it lets us control the maximum number of processes running at any given time.
For example, we can call wget for each line in files.txt with a maximum of two processes in parallel:
#!/bin/bash
# Run at most two wget processes at a time, passing one URL per invocation
xargs -n 1 -P 2 wget -q < files.txt
We also tell wget to be quiet (-q). Without that, every wget process would write its progress output to our terminal, cluttering it in no time. Instead, we can rely on xargs's exit code: it is 0 if no error occurred and non-zero otherwise (for instance, GNU xargs exits with 123 if any wget invocation fails).
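In a script, we can branch on that exit code directly. Here is a minimal sketch, again reusing files.txt from above:
#!/bin/bash
# Download everything in parallel and report the overall result
if xargs -n 1 -P 2 wget -q < files.txt; then
    echo "All downloads completed successfully."
else
    echo "At least one download failed." >&2
fi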
4. Conclusion
As we have seen, there are different ways in which we can download multiple files in parallel using wget.
The xargs command provides the cleanest solution to this problem; it’s especially useful in scripts because it gives us control over the degree of parallelism along with a meaningful exit code.