1. Overview
When we work on the Linux command line, deleting files is a routine operation.
Let’s imagine a typical scenario: we have a directory containing many files with similar names, and we want to delete some of them according to different requirements.
In this tutorial, we’ll learn how to delete multiple files in one shot from the Linux command line.
2. Introduction to the Problem
First of all, we’ll create an example list of files.
Let’s say we have a directory called logs:
$ ls -1 logs
app.log.2020-06-15
app.log.2020-06-21
app.log.2020-06-22
app.log.2020-06-23
app.log.2020-06-24
app.log.2020-06-25
app.log.2020-07-15
app.log.2020-07-25
app.log.2020-07-26
app.log.2020-07-27
app.log.2020-07-28
app.log.2020-08-25
app.log.2020-08-26
app.log.2020-08-27
app.log.2020-08-28
app.log.2020-08-29
app.log.2020-08-30
app.log.2020-08-31
As the output above shows, we have many log files in the logs directory. Each filename ends with a date.
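To follow along without touching real logs, the listing above can be recreated in a scratch directory. The snippet below is only a sketch: the mktemp-based location and the variable names workdir and dates are our own choices for illustration.

```shell
# Recreate the sample logs/ directory in a throwaway location.
workdir=$(mktemp -d)    # assumption: any empty scratch directory will do
cd "$workdir"
mkdir logs

dates="2020-06-15 2020-06-21 2020-06-22 2020-06-23 2020-06-24 2020-06-25
2020-07-15 2020-07-25 2020-07-26 2020-07-27 2020-07-28
2020-08-25 2020-08-26 2020-08-27 2020-08-28 2020-08-29 2020-08-30 2020-08-31"

# Word splitting on $dates is intentional: one empty file per date.
for d in $dates; do
    touch "logs/app.log.$d"
done

ls -1 logs
```

Running the commands of the following sections from inside this directory keeps every deletion confined to the copies.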
We’re going to discuss four different ways to delete multiple files efficiently:
- Use Bash’s brace expansion
- Use Bash’s glob
- Use the find command
- Use the awk command
3. Using Bash’s Brace Expansion
If we know the exact filenames we want to delete, and they follow the same pattern, we can use brace expansion to save typing and keep the command compact and readable.
Let’s say we want to remove the log files for 2020-08-25, 2020-08-27, 2020-08-30, and 2020-08-31 from the logs directory.
Of course, we can type the four filenames after the rm command:
$ rm logs/app.log.2020-08-25 logs/app.log.2020-08-27 logs/app.log.2020-08-30 logs/app.log.2020-08-31
Usually, our shell can auto-complete the filenames, which saves a lot of typing. However, picking files from a long list of very similar names is error-prone, and the resulting long command is hard to read or check.
In this case, brace expansion not only saves typing but also makes the command compact and readable:
$ rm logs/app.log.2020-08-{25,27,30,31}
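Because the shell expands the braces before rm ever runs, we can preview the resulting arguments by putting echo in front of the command. This is a small sketch that assumes Bash (brace expansion isn’t part of POSIX sh), and it deletes nothing:

```shell
# Preview what the shell would pass to rm; echo only prints the expanded words.
echo rm logs/app.log.2020-08-{25,27,30,31}

# A sequence expression expands to a contiguous range, here the 25th through the 31st.
echo rm logs/app.log.2020-08-{25..31}
```

Unlike a glob, brace expansion generates every name whether or not the file exists, so rm reports an error for any generated name that is missing.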
However, brace expansion won’t help if we don’t know the exact filenames we want to remove, for example: “Delete all log files for July 2020”.
In that case, we can use file globbing or a regular expression to match our target files.
4. Using Bash’s File Globbing
A glob is sometimes called a wildcard. In our everyday work, we use globs pretty often. For example, “*.java” means all Java source files.
If we want to delete all log files of the “app” application, we can execute:
$ rm logs/app.log.*
Or we can remove all log files from July 2020:
$ rm logs/app.log.2020-07-*
Using globbing, we can conveniently match multiple files with a name pattern.
Before we use a glob with the rm command, it’s a good idea to test it with the ls command and check that the matched files are exactly the ones we want to delete:
$ ls -1 logs/app.log.2020-07-*
logs/app.log.2020-07-15
logs/app.log.2020-07-25
logs/app.log.2020-07-26
logs/app.log.2020-07-27
logs/app.log.2020-07-28
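One caveat worth knowing: when a glob matches nothing, Bash by default passes the pattern to the command unchanged, so rm complains about a file whose name literally contains the asterisk. A loop with an existence check is one way to sidestep that. The following sketch uses only POSIX sh features, and the scratch directory with its two files is an assumption made for the demonstration:

```shell
# Scratch setup (assumption: a throwaway directory standing in for the real logs/).
cd "$(mktemp -d)" && mkdir logs
touch logs/app.log.2020-07-15 logs/app.log.2020-08-25

# Delete the July 2020 logs one by one; do nothing if the glob matches no files.
for f in logs/app.log.2020-07-*; do
    [ -e "$f" ] || continue   # an unmatched pattern stays literal and does not exist
    rm -- "$f"                # "--" ends option parsing in case a name starts with "-"
done

ls -1 logs
```

Running the loop a second time is harmless, because the existence check skips the unexpanded pattern.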
5. Using the find Command
File globbing is a convenient way to match multiple files. However, regular expressions (regex) are more powerful in terms of pattern matching.
5.1. Find the Files to Delete
Let’s say we have a new requirement: among the June and July log files, we should delete every file whose day of the month contains the digit “1”, such as “10”, “01”, or “21”.
The find command has the -regex option to keep only the paths that match a given regex. The regex is matched against the whole path rather than just the filename, which is why our pattern starts with “.*”. Therefore, we can turn to the find command to find the files we want to delete:
$ find logs/ -regex '.*0[67]-\(1.\|.1\)$'
logs/app.log.2020-06-15
logs/app.log.2020-07-15
logs/app.log.2020-06-21
It’s worth mentioning that our logs directory has no sub-directories. Otherwise, we may need to add two options to the find command: “-maxdepth 1 -type f”. They tell find to match only regular files directly under the logs directory.
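To see what the two options change, here is a sketch with a nested directory. The archive sub-directory and the three files are hypothetical, and the regex syntax assumes GNU find, as in the command above:

```shell
# Scratch setup (assumption: throwaway directory with one nested sub-directory).
cd "$(mktemp -d)" && mkdir -p logs/archive
touch logs/app.log.2020-06-15 logs/app.log.2020-06-22 logs/archive/app.log.2020-07-15

# Without the extra options, find also descends into logs/archive.
find logs/ -regex '.*0[67]-\(1.\|.1\)$'

# With them, only regular files directly under logs/ are considered.
find logs/ -maxdepth 1 -type f -regex '.*0[67]-\(1.\|.1\)$'
```

The first command reports two paths, including the one under logs/archive, while the second reports only logs/app.log.2020-06-15.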
5.2. Delete Found Files
Once we’ve found the files to delete, performing the actual deletion is not a challenge. We have several ways to delete the files we’ve found.
Let’s first take a look at using the -exec action of the find command to execute the rm command on found files:
$ find logs/ -regex '.*0[67]-\(1.\|.1\)$' -exec rm "{}" \;
Also, we can delete the files found by the find command via the xargs command:
$ find logs/ -regex '.*0[67]-\(1.\|.1\)$' | xargs -I{} rm "{}"
The find command also supports the -delete action to remove the matching files:
$ find logs/ -regex '.*0[67]-\(1.\|.1\)$' -delete
No matter which deletion approach we choose, it’s always good practice to check the find result before performing the deletion.
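Piping plain find output into xargs relies on filenames that contain no newlines, and xargs also gives quotes and backslashes a special meaning. Our log names are safe, but for arbitrary names a NUL-delimited pipeline is more robust. The sketch below assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and the oddly named file is made up for the demonstration:

```shell
# Scratch setup (assumption: throwaway directory; one name contains a quote and a space).
cd "$(mktemp -d)" && mkdir logs
touch "logs/it's app.log.2020-06-15" logs/app.log.2020-06-22

# -print0 and -0 delimit names with NUL bytes, so unusual characters pass through intact.
find logs/ -type f -regex '.*0[67]-\(1.\|.1\)$' -print0 | xargs -0 rm --

# Alternative without a pipe: find batches the names into as few rm calls as possible.
find logs/ -type f -regex '.*0[67]-\(1.\|.1\)$' -exec rm -- {} +

ls -1 logs
```

Either command alone would do the job. In this run, the second one finds nothing left to delete and therefore never invokes rm.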
6. Using the awk Command
So far, we’ve seen the power of brace expansion, file globbing, and regex. Using these approaches, we can solve most batch file deletion problems.
However, when the deletion requirements are not limited to pattern matching, the above three techniques may not solve the problems efficiently.
In this section, we’ll see a couple of new problems:
- Problem 1: delete log files in a date range
- Problem 2: delete log files older than n days
The powerful awk command will help us to solve these problems.
6.1. Deleting Log Files in a Date Range
Let’s say we want to delete the log files between 2020-06-23 and 2020-08-29.
Finding log files in a date range isn’t easy with the pattern matching techniques we’ve learned so far. But it’s straightforward to get those files using the awk command:
$ awk -F'.' -v from='2020-06-23' -v to='2020-08-29' '$3>=from && $3<=to' <(ls -1 logs/*)
logs/app.log.2020-06-23
logs/app.log.2020-06-24
logs/app.log.2020-06-25
logs/app.log.2020-07-25
logs/app.log.2020-07-26
logs/app.log.2020-07-27
logs/app.log.2020-07-28
logs/app.log.2020-08-25
logs/app.log.2020-08-26
logs/app.log.2020-08-27
logs/app.log.2020-08-28
logs/app.log.2020-08-29
As the output above shows, the awk command has found the log files between 2020-06-23 and 2020-08-29.
Now, let’s take a closer look at the awk command and understand how it works:
- <(ls -1 logs/*): Here, we feed the awk command using process substitution. The output of the command ls -1 logs/* becomes the input of the awk command
- -F'.' -v from='2020-06-23' -v to='2020-08-29': We use the period (“.”) as the field separator so that we can extract the date field ($3) easily. Also, we declare two variables to store the boundaries of the given date range
- '$3>=from && $3<=to': This is a pattern without an action, so awk prints every line for which it is true, that is, every filename whose date falls inside the range. The dates compare correctly as plain strings because they use the zero-padded YYYY-MM-DD format
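To make the field numbering in the breakdown above concrete, the following sketch splits a single sample path on the period and prints each field next to its index. It touches no files:

```shell
# Split one sample path on "." and print each field with its index.
echo 'logs/app.log.2020-06-23' |
    awk -F'.' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
```

This prints “$1 = logs/app”, “$2 = log”, and “$3 = 2020-06-23”. Note that a directory path containing a period of its own would shift these field numbers, so the $3 in our command is tied to the relative logs/ path.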
After we have the files to delete, the next step will be executing the actual deletion operation on the files.
We’ve learned to use the find | xargs combination to remove files found by the find command.
Similarly, we can also pass the output of the awk command to xargs and rm:
$ awk -v from='2020-06-23' -v to='2020-08-29' -F'.' '$3>=from && $3<=to' <(ls -1 logs/*) | xargs -I{} rm "{}"
$ ls -1 logs
app.log.2020-06-22
app.log.2020-08-30
app.log.2020-08-31
As the result shows, the log files between the two given dates have been deleted.
6.2. Deleting Log Files Older Than a Given Number of Days
Let’s say we want to delete the log files older than 35 days.
Before we show the awk command, let’s have a look at the date command to calculate the date 35 days ago:
# the current date:
$ date +%F
2020-08-31
# 35 days ago
$ date +%F -d '35 days ago'
2020-07-27
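The output above naturally depends on the day the command runs. To reproduce the same cutoff on any day, we can anchor the calculation to a fixed reference date. This is a sketch that assumes GNU date, whose -d option accepts a calendar date combined with a relative item; the variable name cutoff is our own:

```shell
# Compute the cutoff relative to a fixed reference date instead of "now".
# Assumption: GNU date (the -d option is not portable to every system).
cutoff=$(date +%F -d '2020-08-31 35 days ago')
echo "$cutoff"
```

With 2020-08-31 as the reference date, this yields 2020-07-27, matching the example above.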
Once we have the date 35 days ago, this problem turns into “Delete log files dated 2020-07-27 or earlier”. Let’s see how the awk command finds those files in one shot:
$ awk -F'.' -v dt="$(date +%F -d '35 days ago')" '$3 <= dt' <(ls -1 logs/*)
logs/app.log.2020-06-22
logs/app.log.2020-06-23
logs/app.log.2020-06-24
logs/app.log.2020-06-25
logs/app.log.2020-07-25
logs/app.log.2020-07-26
logs/app.log.2020-07-27
We use command substitution to assign the output of the date command to the variable dt. The remaining work becomes easy: we just list the log files whose dates are earlier than or equal to the value of dt.
Next, let’s remove the files we found.
We’ve learned that we can pipe the output of the awk command to “xargs rm” to delete the files. Alternatively, we can ask the awk command to build the rm commands and pipe the output to sh for execution. The entire command looks like: awk '…codes to build rm cmds…' input | sh.
Let’s solve this problem using this technique. First, let’s build the rm commands:
$ awk -F'.' -v dt="$(date +%F -d '35 days ago')" -v rm_cmd='rm "%s"\n' '$3 <= dt{printf rm_cmd,$0} ' <(ls -1 logs/*)
rm "logs/app.log.2020-06-22"
rm "logs/app.log.2020-06-23"
rm "logs/app.log.2020-06-24"
rm "logs/app.log.2020-06-25"
rm "logs/app.log.2020-07-25"
rm "logs/app.log.2020-07-26"
rm "logs/app.log.2020-07-27"
If the generated commands look good, we pipe them to sh to do the actual deletion:
$ awk -F'.' -v dt="$(date +%F -d '35 days ago')" -v rm_cmd='rm "%s"\n' '$3 <= dt{printf rm_cmd,$0}' <(ls -1 logs/*) | sh
$ ls -1 logs
app.log.2020-07-28
app.log.2020-08-25
app.log.2020-08-26
app.log.2020-08-27
app.log.2020-08-28
app.log.2020-08-29
app.log.2020-08-30
app.log.2020-08-31
Thus, all log files that are at least 35 days old have been deleted.
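Since the commands are generated twice in the workflow above (once to review, once to execute), one possible refinement is to generate them a single time and run exactly the text we reviewed. The sketch below makes a few assumptions of its own: a scratch directory with two files, a fixed cutoff instead of the date call, a hypothetical variable named cmds, and printf piped into awk in place of process substitution so that it also works in plain sh:

```shell
# Scratch setup (assumption: throwaway directory with one file on each side of the cutoff).
cd "$(mktemp -d)" && mkdir logs
touch logs/app.log.2020-07-27 logs/app.log.2020-07-28

# Build the rm commands once and keep them in a variable.
cmds=$(printf '%s\n' logs/* |
    awk -F'.' -v dt='2020-07-27' '$3 <= dt { printf "rm \"%s\"\n", $0 }')

# Step 1: review exactly what would run.
printf '%s\n' "$cmds"

# Step 2: execute the reviewed text rather than a freshly generated list.
printf '%s\n' "$cmds" | sh

ls -1 logs
```

Keep in mind that sh re-parses the generated text, so this technique is only appropriate for well-behaved names like our log files. A filename containing a double quote, a dollar sign, or a backtick would be interpreted by the shell.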
7. Conclusion
In this article, we discussed different approaches to batch-deleting files.
To find the files to delete, we can match filenames using Bash’s brace expansion and file globbing. Further, if we want to match a more complicated pattern, regex is the way to go.
Finally, we looked at the powerful awk utility. It can select files not only by regex pattern matching but also based on some calculation logic.
Above all, we should always examine the found files before we actually delete them.