1. Introduction
In this tutorial, we’ll focus on filtering the output of the df disk space usage command based on how full each filesystem is. This is useful when we only need to display filesystems that have reached a certain usage percentage or that still have a certain percentage of free space.
We’ll first look at the awk command, which is considered one of the best tools for numeric output comparisons in Linux. Then, we’ll discuss the grep command, which is a popular tool for text filtering.
2. Initial df Output
Before applying the space usage filters, let’s first examine the unfiltered df output that we’ll be working with:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
tmpfs 8135088 3556 8131532 1% /dev/shm
...
Here, we can see the list of all filesystems and their space usage percentage in the fifth column.
3. Using the awk Command
Now, let’s apply a filter to display only the filesystems with usage of 50% or higher. For that, we’ll use awk.
The awk command is a powerful tool for text processing, based on the AWK programming language. It provides a handy way to filter Linux command output.
First, let’s look at an awk one-liner that solves our problem:
$ df | awk '0+$5 >= 50'
/dev/sda1 114792976 63567940 45347668 59% /
Let’s look at this command in more detail:
- df | pipes the output of the df command
- awk '0+$5 >= 50' applies an awk numeric comparison, where 0+$5 converts the fifth whitespace-separated field of each row (i.e., the usage percentage) to a number, which is then compared to 50
In the end, we’re left with the lines where the usage is 50% or higher. In our case, it’s the /dev/sda1 root filesystem, which is at 59%. The header line is dropped as well, because its fifth field, Use%, converts to 0.
If, for instance, we need to compare to a different number, we can replace 50 in our original command with the number we need.
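As a minimal sketch of that substitution, we can pass the threshold to awk as a variable instead of editing the program text. The variable name min and the sample_df helper are our own assumptions, not part of df or awk. The helper only replays the df output from section 2 so that the result is reproducible; on a real system, we’d pipe df itself:

```shell
# Hypothetical helper: replays the sample df output from section 2.
sample_df() {
cat <<'EOF'
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
tmpfs 8135088 3556 8131532 1% /dev/shm
EOF
}

# -v min=30 sets the awk variable min before the program runs.
result=$(sample_df | awk -v min=30 '0+$5 >= min')
printf '%s\n' "$result"
```

With a threshold of 30, both /dev/sda1 (59%) and /dev/sdb1 (32%) are listed.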
Furthermore, we can change the comparison operator >= as well. For example, switching to < displays the filesystems whose usage is below the threshold, i.e., those with more space available.
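One detail to watch when flipping the operator: the header’s fifth field, Use%, converts to 0, so a < comparison lets the header line through. The sketch below, which replays the sample output through a hypothetical sample_df helper, skips the header explicitly with awk’s built-in record counter NR:

```shell
# Hypothetical helper: replays the sample df output from section 2.
sample_df() {
cat <<'EOF'
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
tmpfs 8135088 3556 8131532 1% /dev/shm
EOF
}

# NR > 1 skips the header; 0+$5 < 50 keeps filesystems below 50% usage.
result=$(sample_df | awk 'NR > 1 && 0+$5 < 50')
printf '%s\n' "$result"
```

If we’d rather keep the header for readability, the condition NR == 1 || 0+$5 < 50 prints it unconditionally.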
4. Using the grep Command
Another option is to filter the df output using the grep command. This may be a little trickier than the awk command because grep isn’t specialized in numeric comparisons.
To display the lines with disk usage of 50% or higher, we’ll use the -E switch:
$ df | grep -E "([5-9][0-9]|100)%"
/dev/sda1 114792976 63567944 45347664 59% /
Let’s see what this command does:
- df | pipes the output of the df command
- grep -E uses extended regular expression patterns to allow us to use the | operator as a logical OR
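- [5-9][0-9] matches any two-digit number whose first digit is 5-9 and whose second digit is 0-9, i.e., 50-99
- |100 includes 100 in the search pattern
- % requires the number to be directly followed by a percent sign, as in the Use% column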
We can see that the result matches our previous example with awk: only /dev/sda1 is listed.
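Note that grep searches the whole line, not just the fifth column, so the pattern would also match a mount point whose path happens to contain text such as 50%. As a hedged sketch, we can require whitespace around the percentage to rule that out. The /dev/sdc1 row below is made up purely for illustration:

```shell
# Two df-style rows; the second one is hypothetical and only 5% full,
# but its mount point contains the text "50%".
rows='/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdc1 1000 50 950 5% /mnt/50%off'

# Original pattern: also matches "50%" inside the mount point.
loose=$(printf '%s\n' "$rows" | grep -E '([5-9][0-9]|100)%')

# Stricter pattern: the percentage must be a whitespace-delimited field.
strict=$(printf '%s\n' "$rows" | grep -E '[[:space:]]([5-9][0-9]|100)%[[:space:]]')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern returns both rows, while the strict one returns only /dev/sda1. The awk approach doesn’t have this problem, since it compares the fifth field only.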
If, for instance, we need to display disk usage of 20% or higher, we can replace 5 with 2 in our command (because 20 starts with 2):
$ df | grep -E "([2-9][0-9]|100)%"
/dev/sda1 114792976 63567928 45347680 59% /
/dev/sdb1 960302096 282965252 628482420 32% /home
Now, we get two filesystems in the output. Each of them is at least 20% full.
There is a limitation to this method though. The simple pattern only works for thresholds that are multiples of 10 (e.g., 10, 20, 30, etc.). For any other threshold, the regex pattern becomes more complex, and we need to craft a separate pattern for each number.
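To illustrate how the pattern grows, here’s a sketch for a threshold of 35%, a value we picked arbitrarily. We need one alternative for 35-39, one for 40-99, and one for 100, whereas the awk version only changes its number. The sample_df helper is hypothetical and just replays the output from section 2:

```shell
# Hypothetical helper: replays the sample df output from section 2.
sample_df() {
cat <<'EOF'
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
tmpfs 8135088 3556 8131532 1% /dev/shm
EOF
}

# 3[5-9] covers 35-39, [4-9][0-9] covers 40-99, and 100 is listed on its own.
with_grep=$(sample_df | grep -E '(3[5-9]|[4-9][0-9]|100)%')

# The equivalent awk filter, for comparison.
with_awk=$(sample_df | awk '0+$5 >= 35')

printf '%s\n' "$with_grep"
```

Both filters keep only /dev/sda1 here, because /dev/sdb1 sits at 32%, just below the threshold.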
5. Conclusion
In this article, we learned how to filter the df command output based on disk space usage criteria.
We looked at the awk and grep commands. We also discussed the limitations of the grep regex pattern.