1. Overview

wget is one of the most common tools for downloading content from the Internet. Using wget, we can mirror websites and download files and media in a non-interactive, automated way. It follows HTTP redirects automatically and, in recursive mode, follows hyperlinks as well.

However, there are situations where we don't want wget to follow redirects. In this tutorial, we’ll see how wget responds to a server’s redirect by default and how to change this behavior.

2. Using wget To Download Content

wget is quite simple to use. Let’s say we want to look at the license for the Linux kernel. We can fetch the GPL-2.0 license file from the kernel's GitHub repository:

$ wget https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
--2022-02-03 20:39:21--  https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 13.234.210.38
Connecting to github.com (github.com)|13.234.210.38|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0 [following]
--2022-02-03 20:39:22--  https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18729 (18K) [text/plain]
Saving to: ‘GPL-2.0’

GPL-2.0                          100%[==============================================================>]  18.29K  --.-KB/s    in 0.003s  

2022-02-03 20:39:23 (5.63 MB/s) - ‘GPL-2.0’ saved [18729/18729]

This downloads the content into a file named GPL-2.0 in our current directory. Of course, we can change the name of the downloaded file using the -O flag, if we want to.
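As a minimal sketch of the -O flag in a script, we can wrap the call in a small function. The function name download_as and the file name used in the comment are our own choices for illustration, not part of wget:

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): download URL and save it as DEST.
download_as() {
  dest=$1
  url=$2
  # -O tells wget which file to write the downloaded content to.
  wget -O "$dest" "$url"
}

# Example call (needs network access), shown as a comment:
#   download_as gpl-2.0.txt https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
```

With the example call, the content should end up in gpl-2.0.txt instead of GPL-2.0.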

If we look closely at the output above, there is a line that says “HTTP request sent, awaiting response... 302 Found”. The 302 status code means that the server answered our request with a redirect to another URL instead of the content itself.

The server provides the redirect target in the Location HTTP response header. In the example above, wget prints the value of this header on the line right after the 302 response and then requests that URL.
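If we need the redirect target in a script, we can pull it out of wget's log. The following is a minimal sketch that assumes the log format shown above. The function name extract_redirect_targets is our own, and the sample lines are copied from the earlier output:

```shell
#!/bin/sh
# Hypothetical helper: print every redirect target found in a wget log on stdin.
extract_redirect_targets() {
  # wget logs a redirect as "Location: <URL> [following]"; keep only the URL.
  sed -n 's/^Location: \([^ ]*\).*/\1/p'
}

# Sample log lines copied from the output above.
sample_log='HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0 [following]
HTTP request sent, awaiting response... 200 OK'

printf '%s\n' "$sample_log" | extract_redirect_targets
# prints: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0
```

Since wget writes its log to standard error, a live run would need 2>&1 before the pipe to feed the log into the helper.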

3. Preventing Redirects When Using wget

By default, wget follows redirects from a given URL. But there might be situations where we want to limit the number of redirects or prevent them completely. For example, when mirroring a website, we might want to keep wget from following redirects that lead to external websites.

We can do this using the --max-redirect flag of wget. Its default value is 20, so wget follows up to 20 redirections for a URL. If we set it to 0, wget stops following redirects altogether.

Let’s try the previous example again with --max-redirect set to 0:

$ wget --max-redirect=0 https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
--2022-02-03 20:49:28--  https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 13.234.210.38
Connecting to github.com (github.com)|13.234.210.38|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0 [following]
0 redirections exceeded.

Here, we can see that wget reports “0 redirections exceeded.” Although the Location line is still tagged [following], wget stops at this point and doesn’t request the URL in the Location header. Likewise, if we want to allow a specific number of redirections, we can set --max-redirect to that number.
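To use a specific limit in a script, we can wrap the call and check the log for the message shown above. This is a minimal sketch rather than a definitive implementation. The function name fetch_with_redirect_limit and its messages are our own, and the check assumes wget keeps printing “redirections exceeded” when it hits the limit:

```shell
#!/bin/sh
# Hypothetical wrapper: fetch URL, following at most LIMIT redirects.
fetch_with_redirect_limit() {
  limit=$1
  url=$2
  # Reject anything that is not a non-negative integer before calling wget.
  case $limit in
    ''|*[!0-9]*)
      echo "limit must be a non-negative integer" >&2
      return 2
      ;;
  esac
  # wget writes its log to stderr; capture it so we can inspect it.
  status=0
  log=$(wget --max-redirect="$limit" "$url" 2>&1) || status=$?
  printf '%s\n' "$log"
  # Assumption: the log contains this phrase when the limit is hit.
  if printf '%s\n' "$log" | grep -q 'redirections exceeded'; then
    echo "redirect limit of $limit reached for $url" >&2
  fi
  return "$status"
}

# Example call (needs network access), shown as a comment:
#   fetch_with_redirect_limit 1 https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
```

Judging by the first output in this tutorial, the example URL needs a single redirect, so a limit of 1 should be enough as long as GitHub keeps serving it that way.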

4. Conclusion

In this short article, we first saw that wget follows HTTP redirections by default, and then we looked at how the --max-redirect flag lets us limit or prevent that.