1. Overview
The wget command, a Linux utility, is a crucial tool for downloading files from the web. Its extensive features and options make it versatile for fetching content from URLs (Uniform Resource Locators).
In this guide, we’ll delve into the various functionalities of the wget command, covering basic usage and more advanced features.
2. Common wget Command Options
To start, let's examine the basic syntax of the wget command:
wget [options] [URL]
Let’s break down the components of the wget command:
- [options]: These represent the various command-line flags available with wget.
- [URL]: This represents the URL from which to download the file.
When used without options, the command simply downloads the file at the specified URL and saves it under its original name. In that case, the syntax looks like this:
wget [URL]
Now, let’s explore some common options associated with the wget command:
- -P, --directory-prefix=PREFIX: Specifies the directory where the downloaded file is saved.
- -O, --output-document=FILE: Specifies the name of the downloaded file.
- -r, --recursive: Enables recursive downloading, which is useful for downloading entire websites.
- -np, --no-parent: Restricts downloading to the specified directory, preventing retrieval of files from parent directories.
- -c, --continue: Resumes a partially downloaded file.
- -q, --quiet: Suppresses output, making wget operate in quiet mode.
These options allow us to customize the behavior of wget according to our requirements.
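Several of these flags compose naturally. As a hedged sketch (the helper name and the example URL are placeholders, not part of wget itself), a small wrapper might combine quiet mode, resume support, and a target directory:

```shell
# quiet_fetch: hypothetical wrapper combining common wget flags.
quiet_fetch() {
    local dir="$1" url="$2"
    # -q suppresses progress output, -c resumes partial files,
    # -P saves the download into the given directory.
    wget -q -c -P "$dir" "$url"
}
# Example call (commented out; requires network access):
# quiet_fetch ~/Downloads https://www.example.com/file.txt
```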
3. Common wget Command Examples
Let's dive into some practical examples of using the wget command.
3.1. Download a Single File
The most basic use of the wget command is to download a single file from a URL.
For example, let's download a file named requirements.txt from a GitHub repository:
$ wget https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
--2024-05-04 10:54:50-- https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘requirements.txt’
requirements.txt [ <=> ] 144.66K 131KB/s in 1.1s
2024-05-04 10:54:52 (131 KB/s) - ‘requirements.txt’ saved [148135]
Note that the last path component of the URL tells wget which file to download; in this case, that's requirements.txt. The downloaded file keeps this name and is saved in the current working directory. Incidentally, the Length: unspecified [text/html] line in the output reveals that a GitHub blob URL serves the HTML page that displays the file, rather than the raw file contents.
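Since the blob URL above returns an HTML page, a common workaround is to rewrite it to point at GitHub's raw-file host. The sketch below shows this with bash parameter expansion, reusing the repository path from the example (whether that repository still exists is an assumption):

```shell
# Rewrite a GitHub "blob" page URL into the corresponding raw-file URL.
blob_url="https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt"
raw_url="${blob_url/github.com/raw.githubusercontent.com}"  # swap the host
raw_url="${raw_url/blob\//}"                                # drop the blob/ path segment
echo "$raw_url"
# wget "$raw_url"   # would fetch the actual file contents (needs network)
```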
3.2. Save Downloaded File to a Specific Directory
Proceeding from the previous example, we can specify the directory where the downloaded file should be saved using the -P option.
For instance, let's repeat the previous download and save the file to the Downloads directory:
$ wget -P ~/Downloads https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
--2024-05-04 11:11:29-- https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
.........
2024-05-04 11:11:30 (293 KB/s) - ‘/home/kali/Downloads/requirements.txt’ saved [148132]
In the command above, -P directs wget to save the file it's downloading to the Downloads directory. The -P option takes the path of the target directory, telling wget where to save the file.
As the output shows, the downloaded file is saved to /home/kali/Downloads/requirements.txt.
3.3. Rename Downloaded File
The -O option renames the downloaded file to the name we specify. Let's examine its use:
$ wget -O newbaeldung.txt https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
..........
2024-05-04 12:09:01 (283 KB/s) - ‘newbaeldung.txt’ saved [148125]
The output shows that the file was downloaded and saved as newbaeldung.txt, with a size of 148125 bytes.
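To choose both the directory and the file name in one command, we can pass a full path to -O. This is a sketch wrapped in a hypothetical helper; note that when -O is given, wget writes to exactly the path specified, so -P isn't needed (and is ignored):

```shell
# save_as: hypothetical helper that downloads a URL to an exact path.
save_as() {
    local dest="$1" url="$2"
    # -O takes a full path, covering both directory and file name.
    wget -q -O "$dest" "$url"
}
# Example call (commented out; requires network access):
# save_as ~/Downloads/newbaeldung.txt https://github.com/Abwonder/diabetesprediction/blob/main/requirements.txt
```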
3.4. Download an Entire Website
Combining the --recursive and --no-parent options enables wget to recursively download an entire website for offline viewing. Let's try this combination on the www.skrill.com website:
$ wget --recursive --no-parent https://www.skrill.com/en/
.........
2024-05-04 12:42:50 (60.7 KB/s) - ‘www.skrill.com/en/index.html’ saved [29495/29495]
Loading robots.txt; please ignore errors.
.........
2024-05-04 12:43:23 (35.6 MB/s) - ‘www.skrill.com/robots.txt’ saved [930/930]
--2024-05-04 12:43:23-- https://www.skrill.com/en/business/
.........
2024-05-04 12:44:13 (324 KB/s) - ‘www.skrill.com/en/business/index.html’ saved [23662/23662]
--2024-05-04 12:44:13-- https://www.skrill.com/en/support/
............
Saving to: ‘www.skrill.com/en/support/index.html’
The output shows that these options cause wget to follow the links it finds and download every reachable page on the website.
In addition, wget keeps the original file names and directory structure of the website by default.
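For a copy that's actually browsable offline, wget offers companion flags: --convert-links rewrites links to point at the local copies, and --page-requisites also fetches the CSS, images, and scripts each page needs. The sketch below wraps this in a hypothetical helper:

```shell
# mirror_site: hypothetical wrapper for an offline-browsable site copy.
# --convert-links rewrites links for local browsing;
# --page-requisites also fetches the CSS, images, and scripts per page.
mirror_site() {
    local url="$1"
    wget --recursive --no-parent --convert-links --page-requisites "$url"
}
# Example call (commented out; downloads a whole site):
# mirror_site https://www.skrill.com/en/
```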
3.5. Resume a Partially Downloaded File
The -c option allows wget to resume a download that was interrupted. For example, let's interrupt a running wget download using Ctrl+C:
$ wget https://github.com/Abwonder/LinearRegression4
............
Saving to: ‘LinearRegression4’
LinearRegression4 [ <=> ] 36.01K 145KB/s ^C
We interrupted the download above with Ctrl+C; now we can use the -c option to resume it. Let's take a look at how this option works:
$ wget -c https://github.com/Abwonder/LinearRegression4
............
Saving to: ‘LinearRegression4’
LinearRegression4
The first output shows the download being interrupted with the Ctrl+C key combination, while the second shows it being resumed.
In addition, with -c, wget checks whether the target file already exists in the current directory and, if it's incomplete, continues the download from where it stopped.
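The -c flag pairs well with wget's retry options on flaky connections. As a hedged sketch (the helper name is hypothetical, and the URL is a placeholder), the wrapper below retries a download up to five times, resuming from the partial file each time:

```shell
# resume_fetch: hypothetical helper for unreliable links.
resume_fetch() {
    local url="$1"
    # -c resumes a partial file; --tries caps the retry attempts;
    # --waitretry waits up to the given seconds between retries.
    wget -c --tries=5 --waitretry=2 "$url"
}
# Example call (commented out; requires network access):
# resume_fetch https://github.com/Abwonder/LinearRegression4
```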
4. Advanced wget Command Examples
Advanced wget options extend the command's capabilities even further, so knowing how to use them is an added advantage.
Let’s explore more advanced usage scenarios of the wget command.
4.1. Limit Download Speed
The wget command can throttle its download speed via the rate-limiting option --limit-rate.
For instance, to limit download speed to 100 KB/s:
$ wget --limit-rate=100k www.skrill.com
By adjusting the value passed to --limit-rate, we change the maximum download rate. In the command above, the rate is capped at 100 KB/s.
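The --limit-rate option also accepts unit suffixes: k for kilobytes and m for megabytes per second. A small hypothetical wrapper makes the cap a parameter:

```shell
# throttled_fetch: hypothetical wrapper exposing the rate cap.
throttled_fetch() {
    local rate="$1" url="$2"
    wget --limit-rate="$rate" "$url"
}
# Example calls (commented out; require network access):
# throttled_fetch 100k https://www.skrill.com   # cap at 100 KB/s
# throttled_fetch 1m   https://www.skrill.com   # cap at 1 MB/s
```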
4.2. Download Files from a List
The wget command can also read URLs from a file and download each one in turn. For this example, let's combine the URLs we referenced previously into a list named urls.txt using the nano text editor:
www.skrill.com/en/index.html
https://www.skrill.com/en/
https://github.com/Abwonder/LinearRegression4
www.skrill.com
www.example.com
Next, we can proceed to apply the -i option to the wget command:
$ wget -i urls.txt
In the command above, the -i option instructs wget to iterate over the URLs in the list and download them consecutively, one after the other, into the current working directory on our host system.
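Since the list file is just plain text with one URL per line, we don't strictly need an editor to build it. The sketch below recreates urls.txt from the URLs used earlier and shows the -i invocation (commented out, since it needs network access):

```shell
# Build the URL list non-interactively; one URL per line.
cat > urls.txt <<'EOF'
www.skrill.com/en/index.html
https://www.skrill.com/en/
https://github.com/Abwonder/LinearRegression4
www.skrill.com
www.example.com
EOF
wc -l < urls.txt        # the list holds five URLs
# wget -i urls.txt      # downloads each URL into the current directory
```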
5. Conclusion
In this article, we extensively discussed the usage of the wget command, including examples and outputs for better understanding. The wget command is a versatile tool for downloading files from the web on Linux.
Understanding its various options and capabilities allows for efficient file retrieval and management, and by mastering wget, we can streamline the process of downloading files and automate tasks effectively.