1. Overview
We may wish to send HTTP requests without using a web browser or other interactive app. For this, Linux provides us with two commands: curl and wget.
Both commands are quite helpful, as they provide a mechanism for non-interactive download and upload of data. We can use them for web crawling, automating scripts, testing APIs, and more.
In this tutorial, we will be looking at the differences between these two utilities.
2. Protocols
2.1. Using the HTTP Protocol
Both curl and wget support HTTP, HTTPS, and FTP protocols. So if we want to get a page from a website, say baeldung.com, then we can run them with the web address as the parameter:
wget https://www.baeldung.com/
--2019-10-02 22:00:34-- https://www.baeldung.com/
Resolving www.baeldung.com (www.baeldung.com)... 2606:4700:30::6812:3e4e, 2606:4700:30::6812:3f4e, 104.18.63.78, ...
Connecting to www.baeldung.com (www.baeldung.com)|2606:4700:30::6812:3e4e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html [ <=> ] 122.29K --.-KB/s in 0.08s
2019-10-02 22:00:35 (1.47 MB/s) - ‘index.html’ saved [125223]
The main difference between them is that, by default, curl writes the output to the console, while wget saves it to a file.
We can save the data in a file with curl by using the -o parameter:
curl https://www.baeldung.com/ -o baeldung.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 122k 0 122k 0 0 99k 0 --:--:-- 0:00:01 --:--:-- 99k
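Alternatively, curl's -O (capital O) option saves the response under the file name taken from the URL, so we don't have to pick one ourselves. Note that this only works when the URL actually ends in a file name; the path below is purely illustrative:
curl -O https://example.com/hello.pdf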
2.2. Download and Upload Using FTP
We can also use curl and wget to download files using the FTP protocol:
wget --user=abhi --password='myPassword' ftp://abc.com/hello.pdf
curl -u abhi:myPassword 'ftp://abc.com/hello.pdf' -o hello.pdf
We can also upload files to an FTP server with curl. For this, we can use the -T parameter:
curl -T "img.png" ftp://ftp.example.com/upload/
We should note that when uploading to a directory, we must provide the trailing /, otherwise curl will think that the path represents a file.
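Conversely, if we want the uploaded file to be stored under a different name on the server, we can append the target file name to the URL instead of the trailing slash. The host and file names below are just placeholders:
curl -T "img.png" ftp://ftp.example.com/upload/renamed.png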
2.3. Differences
A key difference between the two is that curl supports a plethora of additional protocols, including DICT, FILE, FTPS, GOPHER, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, and TFTP.
We can treat curl as a general-purpose tool for transferring data to or from a server.
On the other hand, wget is basically a network downloader.
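For example, assuming our curl build includes SSH support, we could fetch a file over SFTP with the same -u and -o options we used for FTP. The host, credentials, and path here are placeholders reused from the earlier example:
curl -u abhi:myPassword sftp://abc.com/hello.pdf -o hello.pdf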
3. Recursive Download
When we wish to make a local copy of a website, wget is the tool to use. curl does not offer recursive download, since the feature cannot be implemented consistently across all of the protocols it supports.
We can download a website with wget in a single command:
wget --recursive https://www.baeldung.com
This will download the homepage and any resources linked from it. As we can see, www.baeldung.com links to various other resources like:
- Start here
- REST with Spring course
- Learn Spring Security course
- Learn Spring course
wget will follow each of these resources and download them individually:
--2019-10-02 22:09:17-- https://www.baeldung.com/start-here
...
Saving to: ‘www.baeldung.com/start-here’
www.baeldung.com/start-here [ <=> ] 134.85K 321KB/s in 0.4s
2019-10-02 22:09:18 (321 KB/s) - ‘www.baeldung.com/start-here’ saved [138087]
--2019-10-02 22:09:18-- https://www.baeldung.com/rest-with-spring-course
...
Saving to: ‘www.baeldung.com/rest-with-spring-course’
www.baeldung.com/rest-with-spring-cou [ <=> ] 244.77K 395KB/s in 0.6s
2019-10-02 22:09:19 (395 KB/s) - ‘www.baeldung.com/rest-with-spring-course’ saved [250646]
... more output omitted
3.1. Recursive Download with HTTP
Recursive download is one of the most powerful features of wget: it can follow links in HTML, XHTML, and CSS pages to create local versions of remote websites, fully recreating the directory structure of the original site.
Recursive downloading in wget is breadth-first. In other words, it first downloads the requested document, then the documents linked from that document, then the documents linked by those documents, and so on. The default maximum depth is set to five, but it can be overridden using the -l parameter:
wget -l 1 --recursive --no-parent http://example.com
In the case of HTTP or HTTPS URLs, wget scans and parses the HTML or CSS. Then, it retrieves the files the document refers to through markup such as href or src attributes.
By default, wget respects the Robot Exclusion Standard and skips paths disallowed by a site's robots.txt. To switch this off, we can use the -e parameter:
wget -e robots=off http://example.com
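In practice, we often combine several of these options. A typical invocation for an offline copy might look like the following sketch; the exact option set and depth depend on the site we're mirroring:
wget --recursive --level=2 --page-requisites --convert-links --no-parent https://www.baeldung.com/
Here, --page-requisites also fetches the images and stylesheets each page needs, and --convert-links rewrites the links so the local copy can be browsed offline.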
3.2. Recursive Download with FTP
Unlike HTTP recursion, FTP recursion is performed depth-first. This means that wget will retrieve the contents of the first directory up to the specified depth level, and then move on to the next directory in the directory tree.
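For instance, reusing the credentials from the earlier FTP example, a recursive FTP download limited to two levels could look like this; the server and the directory name are placeholders:
wget --recursive -l 2 --user=abhi --password='myPassword' ftp://abc.com/docs/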
4. Conclusion
In this article, we saw how both curl and wget can download files from internet servers.
wget is a simpler solution and only supports a small number of protocols. It is very good for downloading files and can download directory structures recursively.
We also saw how curl supports a much larger range of protocols, making it a more general-purpose tool.