How to Download a File in Linux Without Using curl or wget

1. Overview

In this tutorial, we’ll learn how to download a file from a URL without using curl or wget.

We’ll start by going over the methods involved, then we’ll create a script to automate the process.

2. Downloading a File Using the Command Line

The first step in downloading a file from the command line is making a web request over HTTP. In this section, we’ll make a plain HTTP request using GNU Bash, then an HTTPS request using OpenSSL.

2.1. HTTP Request

To make an HTTP request, we can use /dev/tcp with bash:

$ exec {NFD}<>"/dev/tcp/www.example.com/80"

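To create a TCP connection, we use the special path /dev/tcp/${host}/${port_number}. This isn’t a real file on disk: Bash itself interprets the path in a redirection and opens a socket to the given host and port. The {NFD}<> syntax opens it for both reading and writing, lets Bash pick a free file descriptor, and stores its number in the variable NFD. We’ll use this file descriptor for sending and receiving data.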

The port number used for an HTTP request is port 80 and the host is the base of our URL, www.example.com, so the file we open is /dev/tcp/www.example.com/80.

Now, let’s create our HTTP request:

$ HTTP_REQUEST="$({
    echo -e -n 'GET /test123 HTTP/1.1\r\n'
    echo -e -n 'Host: www.example.com\r\n'
    echo -e -n 'Connection: close\r\n\r\n'
})"

In this example, we build our HTTP request and assign it to the variable $HTTP_REQUEST. The first line is the request line, which contains the method (GET), the path we want (/test123), and the HTTP version. The Host header names the server (www.example.com). Finally, the Connection: close header asks the server to close the connection once it has sent the response, and the empty line that follows marks the end of the headers.
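As a variation on the request above, the following sketch builds the same bytes with printf, whose escape handling tends to be more consistent across shells than echo -e. The build_request helper is a hypothetical name introduced only for this illustration, and piping through od -c is just a way to make the CRLF line endings visible.

```shell
# build_request HOST PATH: print a minimal HTTP/1.1 GET request.
# build_request is a hypothetical helper, not part of the original steps.
build_request() {
    printf 'GET %s HTTP/1.1\r\n' "$2"      # request line: method, path, version
    printf 'Host: %s\r\n' "$1"             # Host header
    printf 'Connection: close\r\n\r\n'     # last header plus the empty line
}

# Dump the request byte by byte so \r and \n show up explicitly.
build_request 'www.example.com' '/test123' | od -c
```

Writing the request straight to the file descriptor, for example build_request 'www.example.com' '/test123' >&"${NFD}", would avoid storing it in a variable at all.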

Then to make our HTTP request, we just have to redirect this string to our file descriptor, $NFD:

$ echo "${HTTP_REQUEST}" >&"${NFD}"

Now that we have sent our HTTP request let’s print the response by reading from the file descriptor:

$ while IFS= read -r -u "${NFD}" lz ; do
    echo "${lz}"
done

To read a line from our file descriptor, we use read with the -u flag followed by the descriptor we want to read from. The -r flag and the empty IFS keep backslashes and leading or trailing whitespace intact. The while loop keeps reading until there are no more lines and prints each one.
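The reading loop can be tried without a network connection. In the sketch below, a temporary file stands in for the socket and the fixed descriptor 3 stands in for $NFD; both are assumptions made for the demo. It also shows one detail worth knowing: read returns a non-zero status on a final line that lacks a trailing newline, so the extra [ -n "$line" ] check keeps that line from being dropped.

```shell
# Hypothetical sample standing in for a server response.
# Note that the last line ("hello") has no trailing newline.
sample="$(mktemp)"
printf 'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello' > "$sample"

exec 3< "$sample"        # open the sample for reading on descriptor 3
count=0
last=''
# IFS= and -r keep each line exactly as received;
# the "|| [ -n ... ]" part still processes an unterminated final line.
while IFS= read -r line <&3 || [ -n "$line" ]; do
    count=$((count + 1))
    last="$line"
done
exec 3<&-                # close descriptor 3
rm -f "$sample"

echo "lines=${count} last=${last}"
```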

After we are done reading from our file descriptor we should close it:

$ exec {NFD}>&-

To close the file descriptor, we use exec with the descriptor we want to close and redirect it to &-.

2.2. HTTPS Request

An issue with /dev/tcp is its inability to handle SSL/TLS. Therefore we will have to find an alternative method for making an HTTP over TLS/SSL request (HTTPS). Luckily, the openssl s_client command is capable of making HTTPS requests.

Let’s begin by creating an HTTPS connection:

$ openssl s_client -quiet -connect www.example.com:443

We first run the openssl s_client command with the -quiet flag to avoid printing the certificate or session information. We use the -connect flag to specify we want to connect to the domain www.example.com and since we are using HTTPS we will be using port 443.

Now, let’s test out our HTTPS connection using the request we created in our previous example.

$ echo "${HTTP_REQUEST}" | openssl s_client -quiet -connect www.example.com:443

To send $HTTP_REQUEST over our HTTPS connection, we just have to pipe it to the openssl s_client command. The output of this command will be the HTTP response.
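The pipe above can be wrapped in a small function. This is only a sketch: https_get is a hypothetical name, it assumes the openssl binary is installed and the host is reachable, and it passes -servername to send the host name explicitly during the TLS handshake, which some servers that host several sites on one address rely on. Diagnostic messages from openssl go to standard error and are discarded here.

```shell
# https_get HOST PATH: send a GET request over TLS and print the raw response.
# https_get is a hypothetical helper; it needs openssl and network access.
https_get() {
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$2" "$1" |
        openssl s_client -quiet -servername "$1" -connect "$1:443" 2>/dev/null
}
```

A call such as https_get 'www.example.com' '/test123' would then print the response headers followed by the body.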

2.3. Parsing Response

Now that we know how to send an HTTP/HTTPS request we just need to parse the response into a file. To begin let’s assign the output of our HTTP request command to a variable:

$ response="$(while IFS= read -r -u "${NFD}" lz ; do echo "${lz}" ; done)"

The beginning of the response should be the HTTP headers and afterwards should be the body, which contains the contents of the file we want to save.

Now that we have our response as a variable, we can extract the relevant data to our file:

$ echo "${response#*$'\r\n\r\n'}" > output.html

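In this example, we use parameter expansion to remove everything up to and including the first \r\n\r\n, which is the empty line that separates the headers from the body, and write the rest to the file output.html. Because Bash variables can’t hold NUL bytes and command substitution strips trailing newlines, this approach suits text files such as HTML rather than binary files.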
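The same parameter expansions can also pull out the header block and the status code, which is handy for checking whether the request succeeded before saving the body. The sketch below runs on a hard-coded sample response (an assumption for illustration) and builds the separator with printf so it doesn’t rely on the $'...' quoting.

```shell
# Hypothetical sample response; a real one would come from the server.
response="$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello world')"

# CRLF CRLF separates the headers from the body. The trailing "_" protects
# the final newline from being stripped by $(...), then gets removed.
sep="$(printf '\r\n\r\n_')"
sep="${sep%_}"
cr="$(printf '\r')"

headers="${response%%"${sep}"*}"     # everything before the first empty line
body="${response#*"${sep}"}"         # everything after it
status_line="${headers%%"${cr}"*}"   # first header line, without the \r
status_code="${status_line#* }"      # drop the "HTTP/1.1 " prefix
status_code="${status_code%% *}"     # keep only the numeric code

echo "status=${status_code}"
echo "body=${body}"
```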

3. Creating Our Script

Finally, let’s put the previous steps together into a simple script:

#!/bin/bash

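raw_download() {
    # Split the URL into protocol, host, and path
    wPROTO="${1%://*}"
    af="${1#*://}"
    wBASE="${af%%/*}"
    # Use an empty path when the URL has none, so we request "/"
    if [[ "${af}" == */* ]] ; then wSUB="${af#*/}" ; else wSUB='' ; fi

    HTTP_REQUEST="$({
        echo -en 'GET /'"${wSUB}"' HTTP/1.1\r\n'
        echo -en 'Host: '"${wBASE}"'\r\n'
        echo -en 'Connection: close\r\n\r\n'
    })"

    if [[ "${wPROTO,,}" = 'https' ]] ; then
        # HTTPS: let openssl handle TLS on port 443
        echo "${HTTP_REQUEST}" | openssl s_client -quiet -connect "${wBASE}:443"
    else
        # HTTP: open a TCP connection on port 80 through /dev/tcp
        exec {NFD}<>"/dev/tcp/${wBASE}/80"
        echo "${HTTP_REQUEST}" >&"${NFD}"
        while IFS= read -r -u "${NFD}" lz; do
            echo "${lz}"
        done
        # Close the same descriptor we opened
        exec {NFD}>&-
    fi
}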

main() {
    raw="$(raw_download "${1}" 2>errorlog.txt)"
    echo "${raw#*$'\r\n\r\n'}" > "${2}"
}

main "${@}"

In our script, we first declare the raw_download function, which sends the request and prints the raw response. It chooses between /dev/tcp and openssl based on the protocol prefix of the URL (“http://” or “https://”).
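The URL splitting relies only on parameter expansion, so it can be tested on its own. The sketch below is one way to do it, with a fallback to / when the URL has no path; parse_url and the url_* variables are hypothetical names chosen for this example.

```shell
# parse_url URL: split a URL into scheme, host, and path.
# parse_url and the url_* variables are hypothetical names for this sketch.
parse_url() {
    url_scheme="${1%%://*}"       # text before "://"
    rest="${1#*://}"              # text after "://"
    url_host="${rest%%/*}"        # up to the first "/"
    case "$rest" in
        */*) url_path="/${rest#*/}" ;;   # everything from the first "/"
        *)   url_path='/' ;;             # no path given: request the root
    esac
}

parse_url 'https://www.example.com/a/b.html'
echo "${url_scheme} ${url_host} ${url_path}"   # https www.example.com /a/b.html

parse_url 'http://www.example.com'
echo "${url_scheme} ${url_host} ${url_path}"   # http www.example.com /
```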

Next, we declare the main function, which calls raw_download and writes the body of the response to a file. Finally, after saving the script as download_file, we can execute it:

$ chmod +x download_file
$ ./download_file 'https://www.baeldung.com/java-weekly-495' 'java_weekly_495.html'

We first make our script executable using chmod with the +x flag, then we run our script using ./download_file. The first argument is the URL of the file we want to download and the second argument is the location of the file we want to output to.

4. Conclusion

In this tutorial, we learned how to download a file from a URL without wget or curl. We then created a script that will automate this process.
