1. Introduction

In this tutorial, we’ll focus on filtering the output of the df disk space usage command based on how full each filesystem is. This can be useful when we only want to display filesystems that are filled beyond a certain level or that still have a certain percentage of free space.

We’ll first look at the awk command, which is considered one of the best tools for numeric output comparisons in Linux. Then, we’ll discuss the grep command, which is a popular tool for text filtering.

2. Initial df Output

Before applying the space usage filters, let’s first examine the unfiltered output of the df command:

$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
tmpfs            1627020      2412   1624608   1% /run
/dev/sda1      114792976  63567940  45347668  59% /
/dev/sdb1      960302096 282964952 628482720  32% /home
tmpfs            8135088      3556   8131532   1% /dev/shm
...

Here, we can see the list of all filesystems and their space usage percentage in the fifth column.

3. Using the awk Command

Now, let’s apply a filter to display only the filesystems with a usage of 50% or more. For that, we’ll use awk.

The awk command is a powerful tool for text processing, based on the AWK programming language. It provides a handy way to filter Linux command output.

First, let’s apply a version of an awk solution to our problem:

$ df | awk '0+$5 >= 50'
/dev/sda1      114792976  63567940  45347668  59% /

Let’s look at this command in more detail:

  • df | pipes the output of the df command
  • awk '0+$5 >= 50' applies an awk numeric comparison, where 0+$5 converts the fifth whitespace-separated field (i.e., the space usage percentage) of each row to a number by reading its leading digits and ignoring the trailing % sign; the result is then compared to 50. The header row’s Use% field has no leading digits, so it converts to 0 and is filtered out

In the end, we’re left with the lines where the usage is at least 50%. In our case, it’s the /dev/sda1 root filesystem, which is at 59%.
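To make the numeric conversion visible, here’s a minimal sketch that prints the fifth field next to the value 0+$5 produces. The sample_df helper is hypothetical: it only replays a few rows from the df output shown earlier, so the result doesn’t depend on the machine we run it on:

```shell
# Hypothetical helper: replays a few rows of the df output shown earlier.
sample_df() {
  cat <<'EOF'
Filesystem     1K-blocks      Used Available Use% Mounted on
tmpfs            1627020      2412   1624608   1% /run
/dev/sda1      114792976  63567940  45347668  59% /
/dev/sdb1      960302096 282964952 628482720  32% /home
EOF
}

# Print the raw fifth field, then the number awk derives from it.
sample_df | awk '{ print $5, "->", 0+$5 }'
```

The header’s Use% field has no leading digits, so awk treats it as 0, while 59% becomes 59.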

If, for instance, we need to compare to a different number, we can replace 50 in our original command with the number we need.

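Furthermore, we can change the comparison operator >= as well. For example, using < instead displays the filesystems that have more space available (less space used). Since the header row’s Use% field converts to 0, the header also passes a < comparison unless we exclude it, for instance with the condition NR > 1.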
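As a sketch of both adjustments, we could wrap the filter in a small function that takes the threshold as an argument. The usage_below name and the sample rows are assumptions made for illustration; the awk -v option passes the shell value into the awk program, and NR > 1 skips the header row:

```shell
# Hypothetical helper: reads df-style output on stdin and prints the rows
# whose usage is strictly below the percentage given as the first argument.
usage_below() {
  awk -v limit="$1" 'NR > 1 && 0+$5 < limit'
}

# Sample rows taken from the df output shown earlier; threshold of 40%.
usage_below 40 <<'EOF'
Filesystem     1K-blocks      Used Available Use% Mounted on
tmpfs            1627020      2412   1624608   1% /run
/dev/sda1      114792976  63567940  45347668  59% /
/dev/sdb1      960302096 282964952 628482720  32% /home
EOF
```

With live data, the equivalent call would be df | usage_below 40.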

4. Using the grep Command

Another option is to filter the df output using the grep command. This may be a little trickier than the awk command because grep isn’t specialized in numeric comparisons.

To display the lines with disk usage of 50% or higher, we’ll use grep with the -E switch:

$ df | grep -E "([5-9][0-9]|100)%"
/dev/sda1      114792976  63567944  45347664  59% /

Let’s see what this command does:

  • df | pipes the output of the df command
  • grep -E uses extended regular expression patterns to allow us to use the | operator as a logical OR
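  • [5-9][0-9] matches any two-digit number whose first digit is 5-9 and whose second digit is 0-9, i.e., 50-99
  • |100 adds 100 to the search pattern as an alternative
  • % after the parentheses requires the number to be immediately followed by a percent sign, so the pattern doesn’t match digits in the block-count columns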

We can see that the command returns the same filesystem as our previous example with awk.

If, for instance, we need to display disk usage of 20% or higher, we can replace 5 with 2 in our command (because 20 starts with 2):

$ df | grep -E "([2-9][0-9]|100)%"
/dev/sda1      114792976  63567928  45347680  59% /
/dev/sdb1      960302096 282965252 628482420  32% /home

Now, we get two filesystems in the output. Each of them is at least 20% full.

There is a limitation to this method, though. The simple pattern above only works for thresholds that are multiples of 10 (e.g., 10, 20, 30, etc.). For any other threshold, the regex pattern becomes more complex, and we need to build a separate pattern for each number.
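To illustrate the extra complexity, here’s one possible pattern for a threshold of 25%. It’s only a sketch: the usage_at_least_25 name and the sample rows are hypothetical, and a different threshold would need its own set of alternatives:

```shell
# Hypothetical helper: keeps df-style rows whose usage is 25% or higher.
# 2[5-9] covers 25-29, [3-9][0-9] covers 30-99, and 100 is listed explicitly.
usage_at_least_25() {
  grep -E '(2[5-9]|[3-9][0-9]|100)%'
}

# Sample rows based on the df output shown earlier, plus a made-up 24% row.
usage_at_least_25 <<'EOF'
tmpfs            1627020      2412   1624608   1% /run
/dev/sda1      114792976  63567940  45347668  59% /
/dev/sdb1      960302096 282964952 628482720  32% /home
/dev/sdc1        1000000    240000    760000  24% /data
EOF
```

Here, the range had to be split into three alternatives instead of two, which is why awk is usually the more convenient choice for arbitrary thresholds.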

5. Conclusion

In this article, we learned two ways to filter the df command output based on disk space usage.

We looked at the awk and grep commands. We also discussed the limitations of the grep regex pattern.