1. Introduction

In Linux environments, tar archives are widely used for bundling files. As system administrators, we often combine many files into a single TAR archive and transfer it to other servers for consumption. Removing files from an existing archive lets us tailor its contents to our needs without rebuilding it from scratch.

In this tutorial, we’ll look into different ways of removing a file or directory from TAR archives on Linux systems.

2. What Is a tar Archive?

A tar archive is a collection of files and directories combined into a single file, which streamlines administrative tasks and data management.

The name tar stands for tape archive, since the format originated with magnetic tape backups. However, its use has since expanded to other storage devices and even to transferring data over networks. On Linux, we create and manage these archives with the tar command.

By default, tar doesn’t compress the data, but we can combine it with compression tools such as gzip, bzip2, lzma, or xz. A TAR archive, especially a compressed one, is often called a tarball.

For example, let’s create five new files using the touch command and use the ls command to list all files in the current directory:

$ touch file{1..5}.txt
$ ls
file1.txt  file2.txt  file3.txt  file4.txt  file5.txt

Next, we’ll create the files_dump.tar archive, which bundles file1.txt through file5.txt. To do so, we use -c to create an archive, -a to choose the compression program automatically from the archive’s extension (a plain .tar means no compression), and -f to specify the output file:

$ tar -caf files_dump.tar file1.txt file2.txt file3.txt file4.txt file5.txt
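To see what -a actually does, here’s a minimal sketch, assuming GNU tar and gzip are installed, that runs the same command with two different suffixes. The mktemp scratch directory is only there so the demo doesn’t touch real files:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt

# Same command, different suffix: with -a, GNU tar picks the compressor
# from the archive name (.tar means none, .tar.gz means gzip)
tar -caf files_dump.tar file*.txt
tar -caf files_dump.tar.gz file*.txt

# gzip -t succeeds only on valid gzip data
gzip -t files_dump.tar.gz && echo "files_dump.tar.gz is gzip-compressed"
```

The plain .tar file, by contrast, fails the gzip -t check because it was written uncompressed.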

Alternatively, we can use shell glob patterns to bundle multiple files and directories:

$ tar -caf files_dump.tar file[1-5].txt
$ tar -caf files_dump.tar file*.txt

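Here, the shell expands both patterns before tar runs: inside square brackets, the - (dash) symbol denotes a range of characters such as digits or letters, while the * (asterisk) matches any sequence of characters, so file*.txt matches every name in the current directory that starts with file and ends with .txt.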
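These patterns are shell globs rather than regular expressions, so it’s the shell, not tar, that turns them into a list of filenames. A quick way to check this, assuming Bash and a throwaway directory, is to echo the patterns before archiving:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt

# echo prints exactly what the shell would pass to tar after expansion
echo file[1-5].txt
echo file[2-4].txt

# tar only ever sees the expanded names, so both archives hold the same members
tar -cf range.tar file[1-5].txt
tar -cf star.tar file*.txt
tar -tf range.tar
```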

Furthermore, we can use the --list option to list all files and directories in the archive:

$ tar --file files_dump.tar --list
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt

Thus, we can verify that the files are the ones we archived earlier.

3. Nested Directory Archive

Additionally, tar can bundle several files and directories in a hierarchical structure. Such a nested directory archive preserves the directory layout, so extracting it restores the original hierarchy.

Now, let’s create a file_dir directory containing five more files and use the tree command to view everything under the current directory:

$ mkdir file_dir
$ touch file_dir/test{1..5}.txt
$ tree
.
├── file1.txt
├── file2.txt
├── file3.txt
├── file4.txt
├── file5.txt
└── file_dir
    ├── test1.txt
    ├── test2.txt
    ├── test3.txt
    ├── test4.txt
    └── test5.txt

Continuing on, let’s use the tar command to create an archive of all files and directories in the current path:

$ tar -caf files_dump.tar *
$ tar --file files_dump.tar --list
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
file_dir/
file_dir/test1.txt
file_dir/test2.txt
file_dir/test3.txt
file_dir/test4.txt
file_dir/test5.txt

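Here, a plain * is enough because tar adds directories recursively, so file_dir and its contents are included. If files_dump.tar from the previous section is still in the directory, * matches it as well; GNU tar detects that this file is the archive being written and skips it with a warning. After that, we verified the contents of the new archive using the --list option.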
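As a sketch of an alternative that sidesteps the archive matching itself, assuming GNU tar, we can name the files and the directory explicitly. The same snippet also shows that passing a directory name to --list prints that entry together with everything under it, which is GNU tar’s default matching behavior:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
mkdir file_dir
touch file{1..5}.txt file_dir/test{1..5}.txt

# Explicit names instead of *, so a leftover files_dump.tar is never an input
tar -caf files_dump.tar file*.txt file_dir

# A directory name as a member argument matches file_dir/ and its contents
tar --file files_dump.tar --list file_dir
```

The member order in the listing may follow the filesystem’s directory order, so it isn’t necessarily sorted.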

4. Remove a File From a TAR Archive

In addition, GNU tar can remove a file from an existing archive with the --delete option. Here, we also use -v for verbose output and -f to specify the archive file:

$ tar -vf files_dump.tar --delete file2.txt

Following the removal of file2.txt from files_dump.tar, let’s list the archive’s contents to confirm file2.txt is no longer among them:

$ tar --file files_dump.tar --list
file1.txt
file3.txt
file4.txt
file5.txt
file_dir/
file_dir/test1.txt
file_dir/test2.txt
file_dir/test3.txt
file_dir/test4.txt
file_dir/test5.txt
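Besides reading the listing by eye, we can check for a member in a script. Here’s a minimal sketch, assuming GNU tar, which exits with a non-zero status when a member we ask it to list isn’t found in the archive:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt
tar -cf files_dump.tar file*.txt
tar -f files_dump.tar --delete file2.txt

# Listing a specific member fails if that member is missing
if tar -tf files_dump.tar file2.txt >/dev/null 2>&1; then
  echo "file2.txt is still in the archive"
else
  echo "file2.txt was removed"
fi
```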

Next, let’s extract files_dump.tar into a new, empty directory with the -C option to cross-verify it further:

$ mkdir extracted
$ tar -xf files_dump.tar -C extracted
$ tree extracted
extracted
├── file1.txt
├── file3.txt
├── file4.txt
├── file5.txt
└── file_dir
    ├── test1.txt
    ├── test2.txt
    ├── test3.txt
    ├── test4.txt
    └── test5.txt

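Nonetheless, we should exercise caution, because --delete rewrites the archive in place and the removal can’t be undone, so it’s wise to keep a backup copy first. Moreover, --delete only works on uncompressed archives: GNU tar refuses to update a compressed tarball, so we have to decompress it first.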
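For a compressed tarball, one workaround is to decompress, delete, and recompress. Here’s a minimal sketch for a gzip archive, assuming GNU tar and gzip; the .bak backup name is just an illustrative choice:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt
tar -czf files_dump.tar.gz file*.txt

# Keep a backup, since --delete rewrites the archive in place
cp files_dump.tar.gz files_dump.tar.gz.bak

# --delete can't operate on a compressed stream, so decompress first
gunzip files_dump.tar.gz            # leaves files_dump.tar
tar --delete -f files_dump.tar file2.txt
gzip files_dump.tar                 # leaves files_dump.tar.gz again

tar -tzf files_dump.tar.gz
```

The same pattern applies to bzip2 or xz archives with their respective tools.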

5. Remove Multiple Files From a TAR Archive

Similarly, we can remove multiple files at once, including files inside nested directories.

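To demonstrate, we’ll list some files explicitly and again use a glob with the - (dash) symbol to select a range of files. Since the pattern is unquoted, the shell expands file_dir/test[2-4].txt against the files on disk before tar sees it: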

$ tar -vf files_dump.tar --delete file1.txt file3.txt file_dir/test1.txt file_dir/test[2-4].txt

Let’s take a brief look at the updated archive using the --list option:

$ tar --file files_dump.tar --list
file4.txt
file5.txt
file_dir/
file_dir/test5.txt

Thus, we see our operation was successful.
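The unquoted glob above only works because the matching files still exist on disk. If they don’t, Bash passes the pattern through literally, and by default GNU tar treats member names as plain strings. Here’s a sketch, assuming GNU tar, that quotes the pattern and adds --wildcards so tar matches it against the archive’s member names instead:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
mkdir file_dir
touch file{1..5}.txt file_dir/test{1..5}.txt
tar -cf files_dump.tar file*.txt file_dir

# Remove the originals, so the shell has nothing to expand the glob against
rm -r file*.txt file_dir

# Quoted: tar receives the pattern itself; --wildcards makes GNU tar
# treat the member name as a pattern
tar --wildcards -f files_dump.tar --delete 'file_dir/test[2-4].txt'

tar -tf files_dump.tar
```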

6. Remove a Directory From a TAR Archive

Additionally, --delete can remove entire directories from an archive, which gives us greater flexibility in archive management.

In this instance, we’ll delete the file_dir directory from the files_dump.tar archive:

$ tar -vf files_dump.tar --delete file_dir

Now, let’s take a moment to review the updated archive with the --list option:

$ tar --file files_dump.tar --list
file4.txt
file5.txt

Then, after moving files_dump.tar into an otherwise empty directory, let’s extract it there:

$ tar -xvf files_dump.tar
file4.txt
file5.txt
$ tree
.
├── file4.txt
├── file5.txt
└── files_dump.tar

0 directories, 3 files

Here, we used the --delete option to remove the entire directory, along with its contents, from the TAR archive.
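To double-check that nothing under the directory survived, we can filter the listing with grep. Here’s a minimal sketch with a small rebuilt archive, assuming GNU tar; the file set is chosen only for the demo:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
mkdir file_dir
touch file4.txt file5.txt file_dir/test5.txt
tar -cf files_dump.tar file4.txt file5.txt file_dir

# Deleting the directory name removes the directory entry and every member below it
tar -f files_dump.tar --delete file_dir

# Count remaining members whose path starts with file_dir/ (prints 0 here)
tar -tf files_dump.tar | grep -c '^file_dir/'
```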

7. Extract Selected Files or Directories

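Moving on, we can also extract only selected members from a tar archive. For this example, let’s assume files_dump.tar is again the full archive from Section 3, placed in an otherwise empty directory: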

$ tar --file files_dump.tar --extract file1.txt file_dir/test1.txt

Here, we use the --extract option followed by member names to get only the required files and directories from the archive.

Next, we’ll take a quick look at the files and directories in the current path using the tree command:

$ tree
.
├── file1.txt
├── file_dir
│   └── test1.txt
└── files_dump.tar

1 directory, 3 files

This way, we’ve selectively extracted file1.txt and file_dir/test1.txt while preserving their original directory hierarchy.
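If the current directory already has files with the same names, extracting there overwrites them. Here’s a minimal sketch, assuming GNU tar, that extracts the same two members into a separate directory (restored is an arbitrary name) with the --directory option, the long form of -C:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
cd "$(mktemp -d)" || exit 1
mkdir file_dir
touch file{1..5}.txt file_dir/test{1..5}.txt
tar -cf files_dump.tar file*.txt file_dir

# Extract two members into a separate, empty directory
mkdir restored
tar --file files_dump.tar --extract --directory restored file1.txt file_dir/test1.txt

# Show the regular files that landed under restored/
(cd restored && find . -type f | sort)
```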

8. Conclusion

In summary, removing files and directories from a tar archive is a fairly straightforward process that helps system administrators manage their data efficiently.

In this article, we’ve explored how to do this with the tar command’s --delete option, for single files, multiple files, and entire directories. Additionally, we’ve emphasized the importance of cross-verifying the archive after removal to make sure it contains exactly what we expect. Finally, we discussed how to extract specific files or directories from an archive without unpacking everything.