1. Overview
In this short tutorial, we’re going to examine some available combinations of commands to find a set of files and compress them into a tarball.
For this purpose, we’ll be looking at:
- find, which allows looking up files with different filters and options
- tar, which packs the given input files into an archive and can optionally compress it
Without further ado, let’s take a look at our options.
2. Solutions
To keep things simple, all our examples revolve around one basic use case: adding every file in the current directory to a tarball.
2.1. Using Command Substitution
First, let’s consider the simplest possible solution, which uses command substitution to execute our find and supply its output to tar:
$ tar -czf archive.tar.gz `find . -type f`
Although this is a pretty straightforward approach, this command doesn’t handle files with spaces in the name. Suppose our find command returns a file named “hello world.txt” in its output. Since find doesn’t quote filenames or escape whitespace in its output, the shell splits the result on whitespace and tar sees two file arguments: ./hello and world.txt. Neither path exists, so tar reports an error and leaves the file out of our archive.
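To see the problem in practice, here is a minimal sketch that reproduces it in a throwaway directory. It assumes GNU tar and mktemp are available; the names workdir, src, and errors.log are made up for the illustration, and $(...) is used as the modern spelling of the backticks above.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src
touch src/plain.txt "src/hello world.txt"

# Unquoted command substitution: the shell splits find's output on
# whitespace, so tar receives "src/hello" and "world.txt" as two
# separate paths, neither of which exists.
if tar -czf archive.tar.gz $(find src -type f) 2>errors.log; then
  status=0
else
  status=$?
fi

# tar complains about the missing paths and exits with a non-zero status.
echo "tar exit status: $status"
cat errors.log

# Only the file without a space made it into the archive.
listing=$(tar -tzf archive.tar.gz)
echo "$listing"
```

Searching src while writing the archive one level above it also keeps the archive itself out of find’s results.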
2.2. Using xargs
Another easy option would be using xargs to convert the find output to a tar argument:
$ find . -type f | xargs tar -czf archive.tar.gz
Unfortunately, this has the same shortcoming as the previous one, since xargs splits its input on blanks and newlines by default. However, this time we can easily fix it by making the newline the only delimiter with the -d option, which is a GNU xargs extension:
$ find . -type f | xargs -d "\n" tar -czf archive.tar.gz
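As a quick check, the following sketch runs the newline-delimited variant against a file name containing a space. It assumes GNU xargs (for -d) and GNU tar; workdir and src are hypothetical names used only here.

```shell
# Throwaway directory with one ordinary name and one name with a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src
touch src/plain.txt "src/hello world.txt"

# With -d '\n', xargs treats only newlines as separators,
# so "src/hello world.txt" reaches tar as a single argument.
find src -type f | xargs -d '\n' tar -czf archive.tar.gz

# List the archive members in a stable order.
listing=$(tar -tzf archive.tar.gz | sort)
printf '%s\n' "$listing"
```

Both files should appear in the listing, including the one with the space in its name.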
2.3. Using find -print0
Although our last example works, file names may themselves contain newlines, so it’s usually better practice to use the null character as the separator, since it can never appear in a path. We can achieve this with find -print0, which terminates each file name with a null character instead of a newline. We’ll also pass --null to tar so that it reads the list in the same format:
$ find . -type f -print0 | tar -czf archive.tar.gz --null -T -
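The sketch below illustrates why the null separator matters by archiving a file whose name contains a newline and then restoring it. It assumes GNU find and GNU tar; workdir, src, out, and newline_name are hypothetical names for this example.

```shell
# Throwaway directory: src holds the input, out receives the extraction.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src out
touch "src/hello world.txt"

# A name containing a newline would break any newline-delimited list.
newline_name=$(printf 'two\nlines.txt')
touch "src/$newline_name"

# Null-terminated names flow from find straight into tar.
find src -type f -print0 | tar -czf archive.tar.gz --null -T -

# Extract elsewhere and check that both names survived intact.
tar -xzf archive.tar.gz -C out
if [ -f "out/src/$newline_name" ] && [ -f "out/src/hello world.txt" ]; then
  result="both files restored"
else
  result="something is missing"
fi
echo "$result"
```

Note that --null is placed before -T, as in the command above, so that it applies to the list tar is about to read.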
One thing to keep in mind, though, is that the implicit “and” between find expressions binds more tightly than -o, so -print0 attaches only to the last condition. If we combine several conditions with -o, we need to wrap them in parentheses:
$ find . \( -type f -o -name '*.c' \) -print0 | tar -czf archive.tar.gz --null -T -
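Note that here we’re using tar -T - to read the list of files from stdin instead of passing them through xargs. This also sidesteps a limitation of xargs: when the list is too long for a single command line, xargs runs tar several times, and each run with -c overwrites the archive created by the previous one.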
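The effect of the parentheses is easiest to see by comparing what find prints with and without them. This is a small sketch under the assumption that find supports -print0 (GNU and BSD find do); workdir and the file names are invented, and lib.c is deliberately a directory so that the two conditions match different entries.

```shell
# Throwaway directory with two regular files and a directory named like a C file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt main.c
mkdir lib.c

# Without parentheses this parses as: -type f -o ( -name '*.c' -a -print0 ).
# Regular files already satisfy the left side of -o, so the right side,
# including -print0, is never evaluated for them. Only lib.c is printed.
ungrouped=$(find . -type f -o -name '*.c' -print0 | tr '\0' '\n' | sort)

# With parentheses, -print0 applies to everything the group matches.
grouped=$(find . \( -type f -o -name '*.c' \) -print0 | tr '\0' '\n' | sort)

echo "ungrouped:"; printf '%s\n' "$ungrouped"
echo "grouped:";   printf '%s\n' "$grouped"
```

The tr call converts the null separators to newlines only so the result is readable; it is not part of the archiving pipeline.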
2.4. Using a File
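The next approach we’re going to consider involves storing the find output in an intermediate file and then passing that file to tar as its input. This solution comes in handy if we want to keep track of the files we’re compressing for logs or further processing. Keep in mind that the shell creates archiveFileList in the current directory before find starts, so the list file also appears in its own list; writing it to another directory avoids that: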
$ find . -type f > archiveFileList && tar -czf archive.tar.gz -T archiveFileList
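One possible refinement, shown here only as a sketch, is to keep the list outside the searched tree and make it null-delimited so it tolerates unusual file names. It assumes GNU tar and mktemp; workdir, src, and list_file are hypothetical names that are not part of the original command.

```shell
# Throwaway directory with the files to archive under src.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src
touch src/plain.txt "src/hello world.txt"

# Store the list outside the searched tree so find does not list it too.
list_file=$(mktemp)
find src -type f -print0 > "$list_file"

# --null tells tar that the names in the list are null-terminated.
tar -czf archive.tar.gz --null -T "$list_file"

# The list is still available afterwards, for example for logging.
logged=$(tr '\0' '\n' < "$list_file" | sort)
archived=$(tar -tzf archive.tar.gz | sort)
printf '%s\n' "$logged"
```

A null-delimited list is harder to read by eye, so converting it with tr, as done for the logged variable, is a reasonable compromise when the list doubles as a log.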
2.5. Using find -exec
Lastly, let’s consider a slightly different scenario in which we want to create multiple archives, one for each file found. To do that, we can use find -exec terminated with \;, which runs the given command once per matched file. Furthermore, we can refer to the current file name with a pair of curly braces:
$ find . -type f -exec tar -czf {}.tar.gz {} \;
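The following sketch runs the per-file variant in a throwaway directory and lists the archives it produces. It assumes a find that substitutes {} inside a larger argument, as GNU find does; workdir and src are hypothetical names, and the ! -name '*.tar.gz' test is an addition of this example, meant to keep find from archiving an archive it has just created.

```shell
# Throwaway directory with two input files, one containing a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src
touch src/a.txt "src/b c.txt"

# One tar invocation per file; -exec passes each path as a single
# argument, so the space in "b c.txt" needs no special handling.
find src -type f ! -name '*.tar.gz' -exec tar -czf {}.tar.gz {} \;

# Each input file now has a matching archive next to it.
created=$(find src -type f -name '*.tar.gz' | sort)
printf '%s\n' "$created"
```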
3. Conclusion
In this post, we went through different options to find some files and put them into a tar archive.
Although command substitution and xargs are generally simpler, find -print0 is the more robust solution because it copes with any character a file name can contain. Finally, an intermediate file and find -exec are best reserved for specific use cases, such as keeping a record of the archived files or creating one archive per file.