1. Overview
When we work on the Linux command line, we often need to manipulate files, and deleting files is one of the most common operations.
We may face different file deletion requirements: for instance, deleting files older than a given time, recursively deleting files with a specific extension, deleting multiple files at once, and so on.
This time, we’ll discuss how to delete files listed in another file.
2. Introduction to the Problem
2.1. The Example of the Problem
An example may explain the problem more easily.
First of all, we’ve prepared a directory delTest and some files under it:
$ tree /tmp/delTest/
/tmp/delTest/
├── jpg_files
│   ├── olympic tokyo 001.jpg
│   ├── olympic tokyo 002.jpg
│   └── olympic tokyo 003.jpg
├── pdf_files
│   ├── bar.pdf
│   └── foo.pdf
├── toDelete.txt
└── txt_files
    ├── bar.txt
    └── foo.txt

3 directories, 8 files
We used the tree command to output the directory structure more clearly. Note that the filenames under the jpg_files directory contain spaces.
Let’s say we’ve defined the files we want to remove in the file toDelete.txt:
$ cat toDelete.txt
/tmp/delTest/pdf_files/foo.pdf
/tmp/delTest/txt_files
/tmp/delTest/jpg_files/olympic tokyo 001.jpg
Our goal is to delete the files and directories that are listed in the toDelete.txt file.
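Since we’ll need to restore this directory before each test, it helps to script the setup. The following is a minimal sketch: the function name make_fixture and the use of a scratch directory created with mktemp (instead of /tmp/delTest) are assumptions made for illustration.

```shell
# Hypothetical helper: (re)create the example layout under a base directory
make_fixture() {
    base="$1"
    mkdir -p "$base/jpg_files" "$base/pdf_files" "$base/txt_files"
    touch "$base/jpg_files/olympic tokyo 001.jpg" \
          "$base/jpg_files/olympic tokyo 002.jpg" \
          "$base/jpg_files/olympic tokyo 003.jpg" \
          "$base/pdf_files/bar.pdf" "$base/pdf_files/foo.pdf" \
          "$base/txt_files/bar.txt" "$base/txt_files/foo.txt"
    # One path per line, mirroring the example list
    printf '%s\n' \
        "$base/pdf_files/foo.pdf" \
        "$base/txt_files" \
        "$base/jpg_files/olympic tokyo 001.jpg" > "$base/toDelete.txt"
}

# Build the fixture in a scratch directory and count the regular files
demo_dir="$(mktemp -d)"
make_fixture "$demo_dir"
find "$demo_dir" -type f | wc -l
```

Calling make_fixture again on the same directory recreates whatever a previous test deleted.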
2.2. The Ideas to Solve the Problem
Since the toDelete.txt file has defined files and directories we want to delete, we have two ideas to solve the problem:
- Read each line in the toDelete.txt file and execute the rm command on each to delete the file.
- Read the content of the toDelete.txt file and convert the lines into a bunch of rm commands. Finally, pipe the result to the sh command to execute them.
We’ll address four approaches to cover both ideas: pure Bash and xargs for the first idea, and sed and awk for the second.
Now, let’s see them in action.
3. Using Pure Bash
Today, Bash is the default shell on most modern Linux distributions. So, if we solve a problem with pure Bash, our solution doesn’t rely on any extra dependencies.
3.1. Handling Directories
As the example shows, the problem looks pretty simple. However, the problem can have a few variants. Now, let’s take a closer look at them.
The files listed in the toDelete.txt file may contain directories. For example, /tmp/delTest/txt_files is a directory.
Depending on the requirement, we may want to skip deleting directories or delete the directories recursively as well.
The rm command provides a -r option. This option allows us to remove a directory recursively. Moreover, if we execute rm -r with a regular file, it deletes the file too.
Therefore, if the requirement is deleting files and directories in the toDelete.txt file, we can execute rm -r with the lines in the file. Otherwise, we call rm without the -r option to skip deleting directories.
In this tutorial, let’s assume that the requirement is removing directories and files. That is, we’ll use rm with the -r option as our file deletion command.
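The difference between rm and rm -r can be checked safely in a scratch directory. This is only an illustrative sketch; the directory and file names are made up.

```shell
work="$(mktemp -d)"
mkdir "$work/some_dir"
touch "$work/some_dir/a.txt" "$work/plain.txt"

# Without -r, rm refuses to remove a directory and returns a non-zero status
if rm "$work/some_dir" 2>/dev/null; then
    plain_rm="removed"
else
    plain_rm="refused"
fi
echo "rm on a directory: $plain_rm"

# With -r, both the directory and the regular file are removed
rm -r "$work/some_dir" "$work/plain.txt"
ls -A "$work" | wc -l
```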
3.2. Quoting the Filename
Most Linux filesystems accept whitespace in filenames. For example, the file /tmp/delTest/jpg_files/olympic tokyo 001.jpg contains multiple spaces.
When we manipulate files by their names, it’s good practice to quote the names. Otherwise, the shell splits a name at its spaces, and we may see unexpected results.
However, it’s worth mentioning that quoting filenames will disable glob expansion, no matter whether we use single-quotes or double-quotes.
Let’s also assume that supporting glob isn’t a requirement of this problem. This is because it could be dangerous in the real world if there are some mistakes in the toDelete.txt file. For example, /path/*/* would delete quite a lot of files if we run rm -r on it.
Therefore, we’ll always quote the filenames in all the solutions.
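To make the effect of quoting concrete, here’s a small sketch. The helper count_args is hypothetical and simply reports how many arguments it received; the example assumes the default IFS.

```shell
file="olympic tokyo 001.jpg"

# Hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

unquoted=$(count_args $file)     # word splitting yields three arguments
quoted=$(count_args "$file")     # quoting keeps the name as one argument
echo "unquoted: $unquoted, quoted: $quoted"

# Quoting also keeps a glob pattern literal instead of expanding it
pattern='/path/*/*'
literal=$(printf '%s' "$pattern")
echo "$literal"
```

With the unquoted form, rm would receive three separate names, none of which is the file we meant.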
Now, let’s write a simple shell script to read the toDelete.txt file and remove the files.
3.3. The Simple Bash Script
First, let’s look at the script:
$ cat delFromFile.sh
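#!/bin/bash
# Path of the list file: one file or directory per line
TO_BE_DEL="$1"
# An empty IFS keeps leading and trailing whitespace; -r keeps backslashes literal
while IFS= read -r file ; do
    rm -r "$file"
done < "$TO_BE_DEL"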
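The script isn’t difficult to understand. It contains only a while loop. Let’s walk through it quickly.
First, we set the IFS variable to empty so that read doesn’t trim leading or trailing whitespace from a line. Together with the -r option, which keeps backslashes literal, this makes each line in toDelete.txt arrive in the file variable unchanged.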
In the while loop, we execute the rm -r command on each directory or file defined in the toDelete.txt file.
Next, let’s test the script and see if it works as we expected:
$ ./delFromFile.sh /tmp/delTest/toDelete.txt
$ tree /tmp/delTest
/tmp/delTest
├── jpg_files
│   ├── olympic tokyo 002.jpg
│   └── olympic tokyo 003.jpg
├── pdf_files
│   └── bar.pdf
└── toDelete.txt

2 directories, 4 files
After we’ve executed the script, we call the tree command again to verify the result. We see that we’ve successfully deleted the directories and files listed in toDelete.txt.
Therefore, we’ve solved the problem.
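The loop can be hardened a little for real-world lists. The sketch below is one possible variant, not the script from this section: it skips empty lines, still processes a last line that lacks a trailing newline, and passes -- so that rm stops parsing options before the name. The function name del_from_file is hypothetical.

```shell
# Hypothetical variant of the deletion loop
del_from_file() {
    list="$1"
    # '|| [ -n "$file" ]' picks up a final line without a trailing newline
    while IFS= read -r file || [ -n "$file" ]; do
        # Skip empty lines so that rm never receives an empty argument
        [ -z "$file" ] && continue
        rm -r -- "$file"
    done < "$list"
}

# Try it in a scratch directory
loop_dir="$(mktemp -d)"
mkdir "$loop_dir/dir one"
touch "$loop_dir/dir one/a.txt" "$loop_dir/last.txt" "$loop_dir/keep.txt"
# The list has an empty line in the middle and no trailing newline
printf '%s\n\n%s' "$loop_dir/dir one" "$loop_dir/last.txt" > "$loop_dir/list.txt"
del_from_file "$loop_dir/list.txt"
ls -A "$loop_dir"
```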
4. Using the xargs Command
The xargs command reads input from stdin and converts it into arguments to feed other commands.
Now, let’s see the one-liner xargs command to delete the files listed in the toDelete.txt file:
xargs -I{} rm -r "{}" < /tmp/delTest/toDelete.txt
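Since the xargs command reads from stdin by default, we redirect the input file toDelete.txt to stdin. Also, the -I{} option defines a placeholder and makes xargs treat each input line as a single argument, so a filename containing spaces stays in one piece. The shell removes the quotes around {} before xargs runs, so they’re harmless here, but it’s the -I option that protects the spaces.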
Next, let’s restore the files under the /tmp/delTest directory and test this command:
$ xargs -I{} rm -r "{}" </tmp/delTest/toDelete.txt
$ tree /tmp/delTest
/tmp/delTest
├── jpg_files
│   ├── olympic tokyo 002.jpg
│   └── olympic tokyo 003.jpg
├── pdf_files
│   └── bar.pdf
└── toDelete.txt

2 directories, 4 files
The command works as we expected.
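One caveat: even with -I, xargs still interprets quote characters and backslashes found in its input, so a filename containing a single quote can make the one-liner fail. Assuming GNU xargs (the -d and -r options aren’t part of POSIX), a sketch that treats every line strictly as one literal argument could look like this. The scratch directory and filenames are made up for the demonstration.

```shell
xdir="$(mktemp -d)"
touch "$xdir/it's here.txt" "$xdir/keep.txt"
printf '%s\n' "$xdir/it's here.txt" > "$xdir/list.txt"

# -d '\n' (GNU xargs): split on newlines only, no quote or backslash processing
# -r      (GNU xargs): don't run rm at all if the list is empty
# --      : tell rm that everything after it is a filename, not an option
xargs -d '\n' -r rm -r -- < "$xdir/list.txt"
ls -A "$xdir"
```

Unlike the -I form, this variant passes many names to a single rm invocation, which is also faster for long lists.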
5. Using the sed Command
So far, we’ve seen how to solve the problem using pure Bash and xargs. Both approaches take each line from the toDelete.txt file and feed it to the rm -r command.
We can also follow the second idea and convert the content of the file into multiple rm commands.
The sed command can easily handle this kind of task with a compact one-liner:
$ sed 's/.*/rm -r "\0"/' /tmp/delTest/toDelete.txt
rm -r "/tmp/delTest/pdf_files/foo.pdf"
rm -r "/tmp/delTest/txt_files"
rm -r "/tmp/delTest/jpg_files/olympic tokyo 001.jpg"
As the above output shows, the sed one-liner has built the rm commands we need through substitutions. Further, the filename is properly quoted, too.
Although the sed command itself won’t execute the actual deletion, we have the chance to check the rm commands that it produces. Thus, it may help us to detect mistakes.
If the commands look good, we can just pipe the result of the sed one-liner to sh to delete the files.
Next, let’s restore the files and test our sed one-liner:
$ sed 's/.*/rm -r "\0"/' /tmp/delTest/toDelete.txt | sh
$ tree /tmp/delTest
/tmp/delTest
├── jpg_files
│   ├── olympic tokyo 002.jpg
│   └── olympic tokyo 003.jpg
├── pdf_files
│   └── bar.pdf
└── toDelete.txt

2 directories, 4 files
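Wrapping each line in double quotes works for our example, but sh still expands $ and backticks inside double quotes, and an embedded double quote would end the quoting early. A more defensive sketch (one possible approach, not the one-liner above) wraps each line in single quotes and escapes any single quote inside the name. It also uses &, the POSIX way to refer to the whole match, whereas \0 is a GNU sed extension. The function name gen_cmds and the sample filenames are hypothetical.

```shell
sdir="$(mktemp -d)"
touch "$sdir/cost \$HOME.txt" "$sdir/it's.txt" "$sdir/keep.txt"
printf '%s\n' "$sdir/cost \$HOME.txt" "$sdir/it's.txt" > "$sdir/list.txt"

# 1st expression: turn each ' into '\'' (close quote, escaped quote, reopen)
# 2nd expression: wrap the whole line as: rm -r -- '<line>'
gen_cmds() {
    sed -e "s/'/'\\\\''/g" -e "s/.*/rm -r -- '&'/" "$1"
}

gen_cmds "$sdir/list.txt"        # review the generated commands first
gen_cmds "$sdir/list.txt" | sh   # then execute them
ls -A "$sdir"
```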
6. Using the awk Command
Building rm commands from the toDelete.txt file is a piece of cake for awk. As with sed, we could use awk’s substitution functions to do the job.
However, here, we show an alternative way:
$ awk -v q='"' '$0 = "rm -r " q $0 q' /tmp/delTest/toDelete.txt
rm -r "/tmp/delTest/pdf_files/foo.pdf"
rm -r "/tmp/delTest/txt_files"
rm -r "/tmp/delTest/jpg_files/olympic tokyo 001.jpg"
Let’s quickly understand how the short awk command works.
To avoid escaping quote characters and make the code easier to read, we’ve declared an awk variable q storing the double-quote character.
We know that after we concatenate “rm -r ” and the quotes with the original input line, the result can’t be an empty string.
awk evaluates a non-empty string as true (the same holds for a non-zero number). Moreover, a pattern that evaluates to true triggers the default action: printing the current line, which the assignment has already replaced.
Therefore, awk prints the generated rm commands.
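To see this pattern-only behavior in isolation, here’s a small sketch. The three-line sample input ("0", an empty line, and "abc") is made up for illustration.

```shell
# A bare pattern prints the line only when it evaluates to true:
# "0" (numeric zero) and the empty line are false, "abc" is true
plain=$(printf '0\n\nabc\n' | awk '$0')
echo "$plain"

# The value of the assignment is the new, non-empty string, so every
# line is printed, including the "0" line and the empty line
built=$(printf '0\n\nabc\n' | awk -v q='"' '$0 = "rm -r " q $0 q')
echo "$built"
```

The second command also shows that an empty line in the list turns into rm -r "", so it’s worth checking the generated commands before piping them to sh.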
As with the sed solution, if we want to do the actual deletion, we just need to pipe the awk output to sh:
$ awk -v q='"' '$0="rm -r " q $0 q' /tmp/delTest/toDelete.txt | sh
$ tree /tmp/delTest
/tmp/delTest
├── jpg_files
│   ├── olympic tokyo 002.jpg
│   └── olympic tokyo 003.jpg
├── pdf_files
│   └── bar.pdf
└── toDelete.txt

2 directories, 4 files
7. Conclusion
In this article, we’ve addressed two ideas to solve the problem: “delete files listed in a file”.
The pure Bash and xargs solutions read each line from the file and feed it to the rm command to do the job.
Alternatively, we can use text processing utilities, such as sed and awk, to generate multiple rm commands and pipe them to sh to solve the problem.