1. Overview
The tar command is used to archive files in Linux and Unix-based systems. It creates plain .tar archives and, together with a compression program, compressed archives such as .tar.gz, .tar.bz2, and .tar.xz. The command uses gzip when creating .tar.gz files, and bzip2 when creating .tar.bz2 files.
In this tutorial, we’ll focus on the different methods we can use to exclude one or more directories when creating a .tar.gz archive file.
2. Setup
First, we’ll create a directory named parent_directory with the mkdir command:
$ mkdir parent_directory
We’ll use this directory to host the files and directories that we’ll be using in this tutorial.
Next, we’ll navigate into the directory:
$ cd parent_directory
Then we can use the touch command to create three empty files:
$ touch file1.txt file2.txt file3.txt
Finally, we’ll create three directories:
$ mkdir folder1 folder2 folder3
After running the commands above, we should have this directory structure:
$ ls
file1.txt file3.txt folder2
file2.txt folder1 folder3
3. Using the --exclude Option
The tar --exclude option has this basic syntax:
$ tar --exclude="pattern" [options] [archive_name] [path]
We’ll use the --exclude option to skip a file or directory when creating a .tar.gz archive:
$ tar --exclude='file1.txt' -zcvf backup.tar.gz .
./
./folder3/
./file3.txt
./folder1/
./file2.txt
./folder2/
tar: .: file changed as we read it
Let’s break down this command to understand it:
- -z: compresses the files and directories using gzip
- -c: creates a new archive file
- -v: verbosely lists the files and directories processed
- -f: allows us to specify a filename for the archive created
- --exclude: skips file1.txt when creating the archive
We used the dot (.) at the end of the command as the path to the current working directory, which contains the files we want to archive.
When excluding directories, we shouldn’t add a trailing slash (/) to the directory name, since a pattern such as folder1/ may fail to match the directory entry.
We got the message “file changed as we read it” because we created the backup.tar.gz file in the same directory that contains the items to be archived.
Since we used the -v option, we can see from the output above that file1.txt was skipped. Alternatively, we can use this command to list the contents of the backup.tar.gz file without extracting it:
$ tar -tf backup.tar.gz
./
./folder3/
./file3.txt
./folder1/
./file2.txt
./folder2/
We use the -t option to list out the contents of the archive file.
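To avoid the “file changed as we read it” message, one option is to write the archive to a location outside the directory being archived. Below is a minimal sketch, assuming GNU tar and a scratch directory created with mktemp; the work and src names are illustrative and not part of the setup above:

```shell
#!/bin/sh
set -eu

# Illustrative scratch area; src mirrors the layout of parent_directory
work=$(mktemp -d)
mkdir -p "$work/src/folder1" "$work/src/folder2" "$work/src/folder3"
touch "$work/src/file1.txt" "$work/src/file2.txt" "$work/src/file3.txt"

# -C makes tar change into src before reading ".", while the archive
# itself is written to $work, outside the directory being read
tar --exclude='file1.txt' -zcf "$work/backup.tar.gz" -C "$work/src" .

# Capture the archive listing, then remove the scratch area
listing=$(tar -tzf "$work/backup.tar.gz")
echo "$listing"
rm -rf "$work"
```

Since the directory being read isn't modified while tar runs, the warning shouldn't appear in this layout.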
3.1. Exclude Multiple Files and Directories
We can exclude more than one file or directory by chaining multiple --exclude options:
$ tar --exclude='file1.txt' --exclude='folder1' -zcvf backup.tar.gz .
./
./folder3/
./file3.txt
./file2.txt
./folder2/
tar: .: file changed as we read it
Alternatively, we can also pass in files and directories to be excluded in this format:
$ tar --exclude={"file1.txt","file2.txt"} -zcvf backup.tar.gz .
file3.txt
folder3
folder2
folder1
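This form relies on brace expansion, which is a feature of shells such as Bash and Zsh rather than of tar: the shell turns the braces into separate --exclude options before tar runs. There must be no spaces inside the braces, and shells without brace expansion, such as a strict POSIX sh, pass the text to tar unchanged, so this variation might not work consistently on other systems.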
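Where brace expansion isn’t available, we can build the same list of options in a loop. The following is a minimal sketch in POSIX sh, assuming GNU tar; the scratch directory and the work and src names are illustrative:

```shell
#!/bin/sh
set -eu

# Illustrative scratch area with a few files to archive
work=$(mktemp -d)
mkdir -p "$work/src/folder1"
touch "$work/src/file1.txt" "$work/src/file2.txt" "$work/src/file3.txt"

# Collect one --exclude option per name in the positional parameters
set --
for name in file1.txt file2.txt; do
    set -- "$@" "--exclude=$name"
done

# "$@" expands to: --exclude=file1.txt --exclude=file2.txt
tar "$@" -zcf "$work/backup.tar.gz" -C "$work/src" .

listing=$(tar -tzf "$work/backup.tar.gz")
echo "$listing"
rm -rf "$work"
```

This is equivalent to chaining the --exclude options by hand, and it doesn’t depend on any shell-specific expansion.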
3.2. Exclude Files With a Specific Extension
We can also pass patterns to exclude specific file extensions:
$ tar --exclude='*.txt' -zcvf backup.tar.gz .
./
./folder3/
./folder1/
./folder2/
tar: .: file changed as we read it
The command above skips all files with a .txt extension.
As before, the “file changed as we read it” message appears because the backup.tar.gz file is created in the same directory as the items to be archived.
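The *.txt pattern isn’t limited to the top-level directory. By default, GNU tar treats exclude patterns as unanchored and lets wildcards match the / character, so matching files inside subdirectories are skipped too. Here is a small sketch, assuming GNU tar; the nested files notes.txt and keep.log are hypothetical additions to the setup:

```shell
#!/bin/sh
set -eu

# Illustrative scratch area with .txt files at two levels
work=$(mktemp -d)
mkdir -p "$work/src/folder1"
touch "$work/src/file1.txt" \
      "$work/src/folder1/notes.txt" \
      "$work/src/folder1/keep.log"

# The quoted pattern reaches tar unexpanded and is matched
# against every member name, at any depth
tar --exclude='*.txt' -zcf "$work/backup.tar.gz" -C "$work/src" .

listing=$(tar -tzf "$work/backup.tar.gz")
echo "$listing"
rm -rf "$work"
```

Quoting the pattern matters here: without the quotes, the shell could expand *.txt against the current directory before tar ever sees it.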
The tar command has built-in options that allow us to ignore specific files and directories that are sometimes auto-generated. Let’s look at some of these options and the functions they perform:
- --exclude-backups: excludes backup and lock files
- --exclude-caches: excludes the contents of directories that contain a CACHEDIR.TAG file, except for the tag file itself
- --exclude-vcs: excludes version control system directories and files, such as .git and .svn
- --exclude-vcs-ignores: reads exclude patterns from the ignore files of version control systems. For example, files, directories, and file extensions that match the patterns in a .gitignore file will be skipped
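As a quick illustration of two of these options, the sketch below combines --exclude-vcs and --exclude-backups. It assumes GNU tar; the .git directory and the file1.txt~ backup file are hypothetical stand-ins for what a real project might contain:

```shell
#!/bin/sh
set -eu

# Illustrative scratch area with a fake repository directory
# and an editor-style backup file
work=$(mktemp -d)
mkdir -p "$work/src/.git"
touch "$work/src/.git/config" \
      "$work/src/file1.txt" \
      "$work/src/file1.txt~"

# Skip version control metadata and backup files in one pass
tar --exclude-vcs --exclude-backups \
    -zcf "$work/backup.tar.gz" -C "$work/src" .

listing=$(tar -tzf "$work/backup.tar.gz")
echo "$listing"
rm -rf "$work"
```

These options are GNU tar extensions, so other tar implementations might not support them or might name them differently.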
4. Using an Exclude File
Alternatively, we can provide the tar command with a file containing the list of files or directories to exclude when creating or extracting archive files. This file is called an exclude file.
Let’s see how to use an exclude file to ignore specific files and directories while archiving.
First, we’ll create an exclude_file.txt file:
$ touch exclude_file.txt
Then we can add a list of file or directory names to be excluded, each separated by a newline:
file1.txt
folder3
file2.txt
We’ll use this exclude_file.txt to exclude the items listed in it:
$ tar -zcvf backup.tar.gz -X exclude_file.txt .
./
./folder1/
./folder2/
./exclude_file.txt
./file3.txt
tar: .: file changed as we read it
We’re using the -X option to pass an exclude file. It’s the short form of the --exclude-from option:
$ tar -zcvf backup.tar.gz --exclude-from="exclude_file.txt" .
The tar command also allows us to put file extension patterns in the exclude file.
Let’s change the content in the exclude_file.txt to match this:
*.txt
After updating the file, we’ll try to compress the files again:
$ tar -zcvf backup.tar.gz --exclude-from="exclude_file.txt" .
./
./folder3/
./folder2/
./folder1/
We can see that the command above skips all files with the .txt extension, including exclude_file.txt itself.
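If we don’t want the exclude file to end up in the archive, or to be affected by its own patterns, we can keep it outside the directory being archived. The sketch below, assuming GNU tar, does that and then reuses the same list while extracting; the scratch directory and the names work, src, out, and full.tar.gz are illustrative:

```shell
#!/bin/sh
set -eu

# Illustrative scratch area; src mirrors part of parent_directory
work=$(mktemp -d)
mkdir -p "$work/src/folder1" "$work/src/folder3" "$work/out"
touch "$work/src/file1.txt" "$work/src/file2.txt" "$work/src/file3.txt"

# One pattern per line, stored outside the tree being archived
printf '%s\n' 'file1.txt' 'folder3' > "$work/exclude_file.txt"

# Creating: entries matching the listed patterns are left out
tar -zcf "$work/backup.tar.gz" -X "$work/exclude_file.txt" -C "$work/src" .
created=$(tar -tzf "$work/backup.tar.gz")

# Extracting: the same list is applied to a complete archive
tar -zcf "$work/full.tar.gz" -C "$work/src" .
tar -xzf "$work/full.tar.gz" -X "$work/exclude_file.txt" -C "$work/out"
extracted=$(ls "$work/out")

echo "$created"
echo "$extracted"
rm -rf "$work"
```

An absolute path to the exclude file keeps the command independent of the directory that -C switches to.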
5. Conclusion
In this article, we explored some methods to exclude specific files or directories when creating a .tar.gz archive.
Using the --exclude option can be useful in instances when we need to quickly create an archive within a small directory tree. Alternatively, using an exclude file is convenient when there’s a relatively large directory tree with thousands of files and directories.