1. Overview
On the Linux command line, we can use the find command to get a list of files or directories. Usually, we then want to perform some operation on what we’ve found, for instance, packing the found files with tar.
In this tutorial, we’re going to take a look at how to delete the files or directories we’ve found.
2. Introduction to the Problem
There are several ways to delete the files and directories found by the find command. It’s not a hard problem. Perhaps we already have some solutions in our minds.
However, some solutions can be dangerous if we don’t use them correctly. Further, some solutions may perform poorly.
In the remainder of this tutorial, we’ll take a closer look at a common pitfall of using the find command and explain why it’s dangerous.
Moreover, we’ll discuss the performance as well.
First, let’s create a directory structure as an example:
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   │       └── whatever.txt
│   └── ktApp2
│       └── .git
│           └── whatever.txt
└── python
    ├── pyApp1
    │   └── .git
    │       └── whatever.txt
    └── pyApp2
        └── .git
            └── whatever.txt
10 directories, 4 files
As the above tree output shows, we’ve created a test directory with some subdirectories and files.
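Since we’ll restore the test directory several times in this tutorial, a small helper can rebuild it on demand. The following is a minimal sketch: the function name make_test_tree and the scratch directory created with mktemp are our own assumptions, not part of the original setup:

```shell
# make_test_tree is a hypothetical helper: it (re)creates the sample
# layout under the directory given as its first argument.
make_test_tree() {
    for app in kotlin/ktApp1 kotlin/ktApp2 python/pyApp1 python/pyApp2; do
        mkdir -p "$1/test/$app/.git" || return 1
        touch "$1/test/$app/.git/whatever.txt" || return 1
    done
}

# Build the tree in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d) || exit 1
make_test_tree "$workdir"

# Count what was created: the .git directories and the regular files.
git_dirs=$(find "$workdir/test" -type d -name '.git' | wc -l)
files=$(find "$workdir/test" -type f | wc -l)
echo "$git_dirs .git directories, $files files"

rm -rf "$workdir"   # clean up the scratch directory
```

To rebuild the tree in the current directory instead, we could call make_test_tree with “.” as its argument.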
We’ll try two deletions on the test directory:
- File deletion: Remove all whatever.txt files
- Directory deletion: Delete all .git directories and the files under them
Let’s have a look at the find commands to find our target directories and files.
First, let’s find all whatever.txt files:
$ find test -name 'whatever.txt'
test/python/pyApp2/.git/whatever.txt
test/python/pyApp1/.git/whatever.txt
test/kotlin/ktApp2/.git/whatever.txt
test/kotlin/ktApp1/.git/whatever.txt
Similarly, we can also find the .git directories:
$ find test -type d -name '.git'
test/python/pyApp2/.git
test/python/pyApp1/.git
test/kotlin/ktApp2/.git
test/kotlin/ktApp1/.git
In this tutorial, we’ll introduce three approaches to delete our target files and directories:
- Using the find command’s -delete action
- Using find -exec
- Using find | xargs rm
So far, we’ve seen how to locate the files or directories we want to delete using the find command. Also, we know we can connect Linux commands with pipes and let different commands solve our problems cooperatively.
Many of us may think that the most straightforward approach would be piping the find result to the rm command. Perhaps surprisingly, it’s not in the list above.
Therefore, before we look at the real solutions to the problem, let’s understand why we can’t pipe find’s result to rm.
3. Why “find … | rm” Won’t Work
We need to understand what the pipe does before we answer this question. First of all, let’s see an example:
$ ls -1 / | grep '^m'
media/
mnt/
In the simple example above, we pipe the ls command’s result to grep and find out the root directories whose names begin with “m”.
Simply put, the pipe connects the standard output (Stdout) of the ls command to the standard input (Stdin) of the grep command.
This command works because grep can read its input from Stdin. We can keep piping Stdout to further commands that support reading from Stdin, for example:
$ ls -1 / | grep '^m' | sed 's/^m/OK_m/'
OK_media/
OK_mnt/
We can see this kind of “command chain” pretty often in the real world.
However, not all Linux commands read from Stdin. Typical examples are file-handling commands such as cp, mv, and rm. These commands ignore Stdin.
For instance, when we execute the command “rm file”, rm takes the file to delete from its command-line argument. It won’t read Stdin at all:
$ echo "file" | rm
rm: missing operand
Therefore, the idea “find … | rm” won’t work, either.
However, sometimes we would like to somehow turn one command’s Stdout into another command’s argument. That’s where xargs comes in handy. We’ll see it in action in later sections.
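As a quick preview, here’s a minimal sketch of xargs turning Stdin into an argument for rm. The scratch directory and the file name are assumptions made for this illustration only:

```shell
# Work in a scratch directory so no real files are at risk.
workdir=$(mktemp -d) || exit 1
touch "$workdir/file"

# rm ignores Stdin, so xargs reads the path from Stdin and appends it
# to the rm command line, which is equivalent to: rm "$workdir/file"
printf '%s\n' "$workdir/file" | xargs rm

# Report whether the file is gone.
if [ -e "$workdir/file" ]; then
    result="still there"
else
    result="removed"
fi
echo "$result"

rmdir "$workdir"   # the scratch directory is empty again
```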
Now, let’s explore the solutions to our “find and delete” problem.
4. Using the find Command and the -delete Action
The find command provides a -delete action to remove files. Next, let’s delete the target files and directories using this action.
4.1. Deleting the Target Files and Directories
We can remove all whatever.txt files by adding the -delete option to the find command:
$ find test -name 'whatever.txt' -delete
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git
10 directories, 0 files
Good, it works. All whatever.txt files have been deleted.
Next, let’s restore the test directory and try to remove the .git directories recursively:
$ find test -type d -name '.git' -delete
find: cannot delete ‘test/python/pyApp2/.git’: Directory not empty
find: cannot delete ‘test/python/pyApp1/.git’: Directory not empty
find: cannot delete ‘test/kotlin/ktApp2/.git’: Directory not empty
find: cannot delete ‘test/kotlin/ktApp1/.git’: Directory not empty
Oops, this time, we got error messages. This is because the -delete action cannot delete a non-empty directory recursively. That is, it can only delete files and empty directories.
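If we still want to stay with -delete, one possible workaround is to match the contents of the .git directories as well as the directories themselves. The following is only a sketch that assumes a find implementation with the -delete action, such as GNU or BSD find; the scratch directory is our own addition:

```shell
# Rebuild the sample tree in a scratch directory.
workdir=$(mktemp -d) || exit 1
for app in kotlin/ktApp1 kotlin/ktApp2 python/pyApp1 python/pyApp2; do
    mkdir -p "$workdir/test/$app/.git"
    touch "$workdir/test/$app/.git/whatever.txt"
done

# Match everything below a .git directory, plus the .git directories
# themselves. With -delete, find handles a directory's contents before
# the directory, so each .git is already empty when it gets removed.
find "$workdir/test" \( -path '*/.git/*' -o -type d -name '.git' \) -delete

remaining_git=$(find "$workdir/test" -name '.git' | wc -l)
remaining_dirs=$(find "$workdir/test" -type d | wc -l)
echo "$remaining_git .git directories left, $remaining_dirs directories in total"

rm -rf "$workdir"   # clean up the scratch directory
```

Note that the tests are grouped in escaped parentheses and placed before -delete, so the action only runs for entries that matched.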
4.2. The Dangerous Pitfall of the -delete Usage
Next, let’s do an interesting test. We know that the order of options of a Linux command doesn’t usually matter.
For example, the following two ls commands are identical, even though the options are in a different order:
ls -F -a -l --color
ls -l -a --color -F
Now, let’s re-order the options in our last find command by moving the -delete option to the first position and see what will happen:
$ find test -delete -type d -name '.git'
$ ls test
ls: cannot access 'test': No such file or directory
This time, there’s no error message. It means the command has been executed successfully.
However, when we check the result, we’ve found that the test directory has been deleted completely! Let’s understand why it has happened.
Let’s revisit our find command. We may think of -delete, -type d, and -name ‘.git’ as three options. However, find treats them as three expressions and evaluates them from left to right for every file and directory it visits.
Each expression is evaluated to a boolean value, and the following expressions are only evaluated if the result is true. The -delete action returns true if the removal succeeded.
If the -delete action comes first, find evaluates it for every entry before the -type and -name tests can filter anything out. As a result, it deletes the given directory and everything in it, which is the test directory in our example.
But wait — we’ve just learned the -delete action won’t remove non-empty directories. Why was everything under test deleted?
This is because the -delete action implies the -depth option.
The -depth option asks the find command to search each directory’s contents before the directory itself. Therefore, if we put -delete as the first option, it’ll start deletion from each directory tree’s very bottom. First, it removes all files under a directory, then the empty directory itself, until everything has been removed.
When we use the find command, we should keep in mind that the -delete action must come after the tests that select our targets, never at the first position. If it comes first, it can delete files unexpectedly.
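To see the effect of expression order without deleting anything, we can swap -delete for the harmless -print action. This is a minimal sketch run in a scratch directory of our own choosing:

```shell
# Rebuild the sample tree in a scratch directory.
workdir=$(mktemp -d) || exit 1
for app in kotlin/ktApp1 kotlin/ktApp2 python/pyApp1 python/pyApp2; do
    mkdir -p "$workdir/test/$app/.git"
    touch "$workdir/test/$app/.git/whatever.txt"
done

# Action first: -print runs for every entry, before the -name test.
print_first=$(find "$workdir/test" -print -name '.git' | wc -l)

# Action last: -print runs only for entries that passed the -name test.
print_last=$(find "$workdir/test" -name '.git' -print | wc -l)

echo "action first: $print_first entries, action last: $print_last entries"

rm -rf "$workdir"   # clean up the scratch directory
```

With the action first, all 15 entries of the tree are printed. With the action last, only the four .git directories are. Running a command with -print in place of -delete is also a cheap way to preview what a deletion would touch.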
5. Using find -exec
When we use the find command with the -exec action, we can execute external commands on its result. Now, let’s execute the rm command to delete our target files and directories in this approach:
$ find test -name 'whatever.txt' -exec rm {} \;
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git
10 directories, 0 files
Good, all whatever.txt files have been deleted. When we use -exec with an external command, find replaces the ‘{}’ placeholder with the path of each file it has found.
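To watch the placeholder substitution without deleting anything, we can put echo in front of rm as a dry run. This is a small sketch in a scratch directory, which is our own assumption:

```shell
# Rebuild the sample tree in a scratch directory.
workdir=$(mktemp -d) || exit 1
for app in kotlin/ktApp1 kotlin/ktApp2 python/pyApp1 python/pyApp2; do
    mkdir -p "$workdir/test/$app/.git"
    touch "$workdir/test/$app/.git/whatever.txt"
done

# echo stands in for the real command, so each invocation only prints
# the rm command line that find would have executed.
preview=$(find "$workdir/test" -name 'whatever.txt' -exec echo rm {} \;)
printf '%s\n' "$preview"

# One line per found file, and no file has actually been removed.
lines=$(printf '%s\n' "$preview" | wc -l)
files_left=$(find "$workdir/test" -type f | wc -l)

rm -rf "$workdir"   # clean up the scratch directory
```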
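Similarly, we can remove all .git directories if we add the -r option to the rm command. We also add the -depth option so that find processes each directory’s contents before the directory itself and doesn’t try to descend into a .git directory that rm has already removed. Let’s restore the test directory and give it a try: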
$ find test -depth -type d -name '.git' -exec rm -r '{}' \;
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   └── ktApp2
└── python
    ├── pyApp1
    └── pyApp2
6 directories, 0 files
As the output shows, all .git directories have been successfully deleted.
6. Using the find | xargs rm Combination
Now, we’ve learned that we can execute the rm command using find’s -exec action. Alternatively, we can also pipe the result of the find command to xargs and let xargs call the rm command to delete those files.
Next, let’s see how to remove all whatever.txt files using this approach:
$ find test -name 'whatever.txt' | xargs rm
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git
10 directories, 0 files
Similarly, we can also remove all .git directories in the same way. Let’s restore the test directory and test it:
$ find test -type d -name '.git' | xargs rm -r
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   └── ktApp2
└── python
    ├── pyApp1
    └── pyApp2
6 directories, 0 files
As the output above shows, all .git directories have been deleted.
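One caveat is worth keeping in mind: by default, xargs splits its input on blanks and newlines, so a file name containing a space would be passed to rm as two separate arguments. A common remedy, sketched below, is to separate the paths with NUL bytes using find’s -print0 and xargs’s -0 options. We assume here that both tools support these options, as the GNU and BSD versions do. The file names are made up for the illustration:

```shell
# Create two sample files, one with a space in its name.
workdir=$(mktemp -d) || exit 1
touch "$workdir/my notes.txt" "$workdir/plain.txt"

# -print0 ends each path with a NUL byte instead of a newline, and
# xargs -0 splits its input on NUL only, so the space in
# "my notes.txt" is not treated as a separator.
find "$workdir" -name '*.txt' -print0 | xargs -0 rm

left=$(find "$workdir" -name '*.txt' | wc -l)
echo "$left .txt files left"

rm -rf "$workdir"   # clean up the scratch directory
```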
We may ask: If find -exec rm can solve the problem, why do we need to introduce an extra xargs process to do the same?
To learn the answer, let’s discuss their performance.
7. Benchmarking the Performance of find -exec and find | xargs
First, let’s explain how the find -exec rm approach works. When we terminate the -exec action with “\;”, as we did above, find starts a new rm process for each file it has found. That is, we’ll execute the rm command one million times if the find command finds a million files.
On the other hand, if we execute find | xargs rm, xargs will bundle the found files into batches and run the command as few times as possible.
Therefore, if our find command returns a large number of files or directories, find | xargs COMMAND will be much faster than the find -exec COMMAND \; approach.
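We can make the difference in process counts visible by letting echo stand in for rm: echo prints one line per invocation, so counting the output lines counts how often the command was started. The sketch below also includes the “-exec COMMAND {} +” form, which asks find itself to batch the arguments. That form isn’t covered elsewhere in this tutorial, so we show it only as a related option. The file count of 200 and the scratch directory are arbitrary choices:

```shell
# Create 200 sample files in a scratch directory.
workdir=$(mktemp -d) || exit 1
i=1
while [ "$i" -le 200 ]; do
    touch "$workdir/$i.txt"
    i=$((i + 1))
done

# One echo process per found file.
per_file=$(find "$workdir" -name '*.txt' -exec echo {} \; | wc -l)

# xargs packs many paths into each echo invocation.
via_xargs=$(find "$workdir" -name '*.txt' | xargs echo | wc -l)

# With "+", find does the batching itself.
via_plus=$(find "$workdir" -name '*.txt' -exec echo {} + | wc -l)

echo "per-file -exec: $per_file runs, xargs: $via_xargs runs, batched -exec: $via_plus runs"

rm -rf "$workdir"   # clean up the scratch directory
```

The exact number of batched runs depends on the system’s command-line length limits, but it should be far below the per-file count.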
Next, let’s run the same test with each approach and compare their execution times.
We’ll delete 3000 files using each command and measure their execution time using the time command.
First, let’s test with the find -exec rm approach:
$ touch {1..3000}.txt
$ ls -l *.txt | wc -l
3000
$ time find . -name '*.txt' -exec rm '{}' \;
real 0m6.072s
user 0m3.130s
sys 0m2.932s
On this machine, it took about six seconds to delete all files.
Next, it’s xargs’s turn. Let’s see if it can do the same test faster:
$ touch {1..3000}.txt
$ ls -l *.txt | wc -l
3000
$ time find . -name '*.txt' | xargs rm
real 0m0.053s
user 0m0.029s
sys 0m0.029s
This time, it took only about 0.05 seconds to remove the files. Compared to the find -exec rm approach, using xargs in this test is more than 100 times faster! Wow!
Therefore, if find returns a large number of files, we should consider piping the result to the xargs command.
8. Conclusion
In this article, we’ve learned three different ways to delete files or directories found by the find command. Also, we’ve understood why piping find’s output to rm won’t work.
Moreover, we’ve discussed a dangerous pitfall of find’s -delete action through an example.
Finally, we’ve also analyzed the performance of two approaches: find -exec rm and find | xargs rm.