Redirecting the Output of wget to /dev/null in cron

1. Overview

GNU Wget is a standard tool for downloading data from a web server. We can also use it to detect site downtime by running it periodically in a cron job. In that case, we aren’t interested in the downloaded data, so we may want to discard it altogether.

In this tutorial, we’ll learn how to direct the output of the wget command to /dev/null in cron.

2. Understanding the Scenario

In this section, we’ll simulate the scenario that makes it necessary for us to direct the output of wget to /dev/null.

2.1. Repeated Execution of wget Command

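Let’s start by executing the wget command five times. Note that xargs appends each number from seq as an extra argument, so every run also attempts a second URL such as http://1/, which fails: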

$ seq 5 | xargs -n1 wget https://www.google.com
--2023-04-09 12:07:14--  https://www.google.com/
Resolving www.google.com (www.google.com)... 142.250.193.196, 2404:6800:4002:81c::2004
Connecting to www.google.com (www.google.com)|142.250.193.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html.6'

index.html.6                                           [ <=>                                                                                                             ]  16.14K  --.-KB/s    in 0.004s

2023-04-09 12:07:15 (4.07 MB/s) - 'index.html.6' saved [16526]

--2023-04-09 12:07:15--  http://1/
Resolving 1 (1)... 0.0.0.1
Connecting to 1 (1)|0.0.0.1|:80... failed: Connection refused.
FINISHED --2023-04-09 12:07:15--
Total wall clock time: 1.7s
Downloaded: 1 files, 16K in 0.004s (4.07 MB/s)
# output trimmed to a single execution

Now, we’ll find that a new file is generated for each execution of wget, each with a filename prefix of index.html:

$ ls -1 index.html*
index.html
index.html.1
index.html.2
index.html.3
index.html.4

Next, let’s see if we can redirect the output to /dev/null:

$ wget https://www.google.com &>/dev/null
# no output on stdout

We made progress, as there is no output on stdout. However, let’s also verify if there are any files generated:

$ ls -1 index.html*
index.html

Unfortunately, the redirection didn’t prevent the creation of an output document. That’s because shell redirection only affects stdout and stderr, whereas wget, by default, writes the downloaded document to a file in the current directory and sends only its log messages to stderr.
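To make the distinction concrete, here is a minimal sketch that needs no network access. The fake_wget function is a hypothetical stand-in, not part of wget: it imitates the default behavior by writing a "document" to a file it opens itself and a log line to stderr. The script uses >/dev/null 2>&1, the portable spelling of &>, so that it also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical stand-in for wget's default behavior (no network needed):
# the "document" goes to a file the command opens itself, the log to stderr.
fake_wget() {
    echo "<html></html>" > "$1/index.html"
    echo "Saving to: 'index.html'" >&2
}

workdir=$(mktemp -d)

# Silence both streams; nothing is printed by the command itself...
fake_wget "$workdir" >/dev/null 2>&1

# ...but the file was still created, because redirection never touches it.
ls "$workdir"

# Clean up the temporary directory.
rm -r "$workdir"
```

Running the script prints only index.html, which is the file listing rather than anything written by the command.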

2.2. Running wget Using cron

First, let’s set up a cron job using the -e option of the crontab command:

$ crontab -e

This opens an editor for us where we can add the schedule for our cron job.

Now, let’s verify the schedule of our cron job by using the -l option of the crontab command:

$ crontab -l | grep -v -E '^#'
* * * * * wget https://www.google.com

The output confirms that our cron job will run every minute. Moreover, we must note that we used the -v option of the grep command to exclude all the comments starting with the # character.

Next, let’s see the impact of the cron job after an hour by inspecting the number of files generated on our filesystem:

$ ls index* | wc -l
60

As expected, the cron job created a new file every minute. Furthermore, we can anticipate that cron jobs using the wget command can put a lot of pressure on our filesystem in the longer run.

Finally, let’s also see the impact of the cron job on the incoming mail, because cron sends an email to the job’s owner whenever a job produces output:

$ mail
"/var/mail/root": 60 messages 60 new
# output trimmed

Such emails are unnecessary noise for the user. So, we must solve this issue by directing the output to /dev/null.

3. Using --output-document and --output-file Options

We can use the --output-document and --output-file options of the wget command to direct the output document and diagnostic information, respectively.

Let’s see this in action in a standalone run of the wget command by using /dev/null as the target for all redirections:

$ wget --output-document /dev/null --output-file /dev/null https://www.google.com
$ echo $?
0

As expected, we don’t see any output on stdout. Furthermore, the exit status of 0 confirms that the command succeeded.

Now, let’s also verify the presence of the output document:

$ [ -f index.html ] ; echo $?
1

Great! We’ve got it right this time, as the filesystem has no additional output document from the wget command.

Next, let’s use this learning to revise our cron job:

$ crontab -l | grep -v -E '^#'
* * * * * wget --output-document /dev/null --output-file /dev/null https://www.google.com

Finally, let’s wait for a minute to ensure that the cron job is not producing the output document anymore:

$ sleep 60; [ -f index.html ] ; echo $?
1

As expected, the exit status is non-zero, indicating that the file named index.html is absent.
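Once both the document and the log go to /dev/null, the exit status is the only signal left for detecting downtime. The following is a minimal sketch of how it could be interpreted. The helper names describe_wget_status and probe are hypothetical, and the mapping covers only a few of the codes listed in the "Exit Status" section of the wget manual.

```shell
#!/bin/sh
# Hypothetical helper: translate a wget exit status into a short
# description. Only a few documented codes are mapped explicitly.
describe_wget_status() {
    case $1 in
        0) echo "ok" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        8) echo "server issued an error response" ;;
        *) echo "other error (exit status $1)" ;;
    esac
}

# Hypothetical helper: fetch the URL, discard the document and the log,
# and report only what the exit status says.
probe() {
    wget --output-document /dev/null --output-file /dev/null "$1"
    describe_wget_status "$?"
}
```

For instance, probe https://www.google.com would print ok when the site responds normally. Since probe itself writes to stdout, a cron job using it would still need its own redirection or logging.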

4. Using --spider and --quiet Options

Interestingly, wget supports the --spider option to mimic the behavior of a web spider. Using this behavior, we can limit wget to only visiting the URL without downloading it.

Let’s analyze the behavior of wget with the --spider option:

$ wget --spider https://www.google.com
Spider mode enabled. Check if remote file exists.
--2023-04-09 14:11:21--  https://www.google.com/
Resolving www.google.com (www.google.com)... 142.250.192.196, 2404:6800:4002:817::2004
Connecting to www.google.com (www.google.com)|142.250.192.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

Although there is some noise in the output, we can verify that wget didn’t download the file:

$ [ -f index.html ] ; echo $?
1

Next, let’s go ahead and use the --quiet option to suppress the diagnostic logs:

$ wget --quiet --spider https://www.google.com
$ echo $?
0

Great! We can use the --quiet and --spider options together to silence the output of the wget command in a cron job.

Finally, let’s revise the command in our cron job and verify the outcome:

$ crontab -e
# revise the wget command
crontab: installing new crontab

$ crontab -l | grep -v -E '^#'
* * * * * wget --quiet --spider https://www.google.com
$ sleep 60; [ -f index.html ] ; echo $?
1

As expected, there is no output document after waiting for a minute, so this approach works too.
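Because cron mails the owner only when a job produces output, a fully silent check can be turned into an alert by printing something only on failure. Here is a minimal sketch of that idea. The function name report_if_down and the message format are assumptions for illustration, not part of wget or cron.

```shell
#!/bin/sh
# Hypothetical wrapper: stay silent while the site answers and print a
# single line when the check fails, so cron has something to mail.
report_if_down() {
    if ! wget --quiet --spider "$1"; then
        echo "DOWN: $1"
        return 1
    fi
}
```

A script that defines this function and calls report_if_down https://www.google.com could replace the plain wget command in the crontab entry, so that mail arrives only when the site is unreachable.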

5. Using Output Redirections

In this section, we’ll explore different scenarios involving output redirection while using the wget command.

5.1. Using &>/dev/null

We can use the --output-document option along with - to direct the output document to stdout. Additionally, we can use the &> output redirection to direct both stdout and stderr to /dev/null:

$ wget --output-document - https://www.google.com &>/dev/null
$ echo $?
0
$ [ -f index.html ] ; echo $?
1

Great! It looks like this output redirection strategy suppresses the output when using wget outside a cron job.

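Now, let’s verify that this works within a cron schedule. Since cron runs commands with /bin/sh by default, and &> is a bash extension that shells such as dash don’t treat as a redirection of both streams, we use the equivalent POSIX form >/dev/null 2>&1 in the crontab entry:

$ crontab -l | grep -v -E '^#'
* * * * * wget --output-document - https://www.google.com >/dev/null 2>&1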
$ sleep 60; [ -f index.html ] ; echo $?
1

The verification passed, so we have another strategy to solve our use case.
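When both streams are redirected with the POSIX spelling instead of the bash-specific &>, the order of the two redirections matters. The sketch below illustrates this without any network access. The noisy function is a hypothetical stand-in for a command that writes to both stdout and stderr.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
    echo "document"          # goes to stdout
    echo "log line" >&2      # goes to stderr
}

# Capture whatever still escapes after each redirection.
# Correct order: stdout to /dev/null first, then stderr onto stdout.
portable=$( { noisy >/dev/null 2>&1; } 2>&1 )
# Reversed order: stderr is duplicated onto the old stdout before
# stdout is pointed at /dev/null, so the log line still gets through.
reversed=$( { noisy 2>&1 >/dev/null; } 2>&1 )

echo "portable: [$portable]"
echo "reversed: [$reversed]"
```

The script prints portable: [] followed by reversed: [log line], which shows that only the first form silences the command completely.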

5.2. Using 1>/dev/null

Alternatively, we can use the --output-file and --output-document options together with - to direct the entire output to stdout. Then, we can use the 1>/dev/null output redirection to direct stdout to /dev/null:

$ wget --output-document - --output-file - https://www.google.com 1>/dev/null
# same verification as earlier

Moving on, let’s revise the crontab entry:

$ crontab -e
# revise wget command
$ crontab -l | grep -v -E '^#'
* * * * * wget --output-file - --output-document - https://www.google.com 1>/dev/null

Finally, let’s verify its functionality when running through the cron job:

$ sleep 60; [ -f index.html ] ; echo $?
1

Perfect! Our approach worked as expected.

6. Conclusion

In this article, we learned the significance of disabling the output of the wget command in cron. Furthermore, we explored different options of wget, such as --output-document, --output-file, and --spider, along with output redirection techniques to direct the output to /dev/null.
