1. Overview
When we want to transfer multiple files between various Linux systems, it’s often a good idea to package them into a single archive.
If we think about archives and general-purpose file packaging, tape archives (tar) are the de facto standard on most Linux systems.
In this quick tutorial, we’re going to learn how to use the tar utility to manipulate such archives.
2. Syntax
Let’s first take a look at the basic syntax:
$ tar {operation} [options...] [file]...
We need to specify exactly one operation. The default options and input depend on the operation type, so we’ll cover those separately.
In Linux, everything is a file. This means we can also archive directory structures of files.
For historical reasons, tar accepts operations in various formats:
- UNIX short format preceded by a dash
- GNU long format preceded by two dashes
- Old style format that does not use dashes
Most of our examples use the UNIX-style format. However, we’re also going to need some GNU options.
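To make the three styles concrete, here is a small sketch that creates and lists the same archive in each of them. The temporary directory and the file names are placeholders chosen for illustration, and the long options assume GNU tar:

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > file1

# UNIX short style: each option is a single letter preceded by a dash.
tar -c -f short.tar file1

# GNU long style: two dashes, with the argument attached using '='.
tar --create --file=long.tar file1

# Old style: no dash; the letters are bundled in the first argument
# and their arguments follow in the same order.
tar cf old.tar file1

# Listing works the same way in all three styles.
short_list=$(tar -t -f short.tar)
long_list=$(tar --list --file=long.tar)
old_list=$(tar tf old.tar)
echo "$short_list $long_list $old_list"
```

Each of the three archives should contain the single member file1.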
3. Basic Operations
Most of the time, we just need the three basic operations:
- Create an archive
- Extract an archive
- List an archive
Let’s take a closer look at each.
3.1. Creating Archives
Let’s create an archive composed of two files using the -c flag:
$ tar -c -f archive.tar file1 file2
$ ls
archive.tar file1 file2
In this sample, we also defined the output archive file with the -f option.
If we don’t specify an output file, the default behavior is to use what is defined in the TAPE environment variable or the compiled defaults.
In most cases, tar falls back to the standard output stream. Because tar refuses to write an archive to a terminal, this generates an error unless we redirect the output to a file:
$ tar -c file1 file2 > archive.tar
$ ls
archive.tar file1 file2
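Because the archive can travel over standard output, we can also pipe it straight into another tar process. The sketch below names the streams explicitly with "-f -" instead of relying on the compiled defaults; the file names are placeholders:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > file1
echo "hello world" > file2

# '-f -' means standard output when creating and standard input when
# listing, so no archive file is ever written to disk.
members=$(tar -c -f - file1 file2 | tar -t -f -)
echo "$members"
```

Spelling out "-f -" keeps the command independent of the TAPE variable and of whatever default a particular build uses.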
Let’s now assume that we have our two files in a folder called test:
$ ls -alh test
-rw-r--r-- 1 user users 6 Mar 20 23:19 file1
-rw-r--r-- 1 user users 12 Mar 20 23:19 file2
$ tar -c -f folder.tar test
$ ls -allh
-rw-r--r-- 1 user users 10K Mar 20 23:19 folder.tar
drwxr-xr-x 2 user users 4.0K Mar 20 23:19 test
By default, tar archives folders recursively. As a result, the archive will also contain the parent directory and the two files.
3.2. Extracting Archives
Now that we’ve seen how to create an archive, let’s take a look at how we can extract one.
We can instruct tar to extract an archive by using the -x flag:
$ tar -x -f archive.tar
$ ls -allh
-rw-r--r-- 1 user users 10K Mar 20 22:51 archive.tar
-rw-r--r-- 1 user users 6 Mar 18 20:54 file1
-rw-r--r-- 1 user users 12 Mar 18 20:54 file2
The default behavior is to overwrite existing filesystem files with the same name.
Notice that once again, we used the -f option to specify the archive to extract.
Likewise, we can omit this by redirecting our archive to the standard input stream of tar:
$ tar -x < archive.tar
This gives us the same result.
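Since extraction overwrites existing files by default, it can be worth protecting local changes. As a minimal sketch, the -k (--keep-old-files) option asks tar to leave existing files alone; the file names and contents here are made up for the demonstration:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
echo "archived version" > file1
tar -c -f archive.tar file1

# Simulate a local edit made after the archive was created.
echo "local edit" > file1

# -k tells tar not to replace files that already exist. GNU tar reports
# each such file as an error, so we tolerate a non-zero exit status here.
tar -x -k -f archive.tar 2>/dev/null || true
kept=$(cat file1)

# Without -k, the default behavior replaces the local edit.
tar -x -f archive.tar
overwritten=$(cat file1)

echo "with -k: $kept / without -k: $overwritten"
```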
We can also extract just specific members of the archive:
$ tar -x -f archive.tar file2
$ ls -allh
-rw-r--r-- 1 user users 10K Mar 20 22:51 archive.tar
-rw-r--r-- 1 user users 12 Mar 18 20:54 file2
There’s one potential problem here. Remember when we archived a folder in the previous section?
In this case, we need to specify the full path of each member, exactly as it’s stored in the archive:
$ tar -x -f folder.tar test/file2
$ ls -allh
-rw-r--r-- 1 user users 10K Mar 20 23:30 folder.tar
drwxr-xr-x 2 user users 4.0K Mar 20 23:31 test
$ ls -allh test
-rw-r--r-- 1 user users 12 Mar 20 23:19 file2
The outcome is also a bit surprising. Notice that tar also recreated the test folder and placed file2 inside it.
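If we want the file without its parent folder, one possible approach is the --strip-components option, which GNU tar (and bsdtar) use to drop leading path elements during extraction. This is a sketch with the same folder layout as above, rebuilt in a temporary directory:

```shell
# Work in a throwaway directory and rebuild the folder archive.
workdir=$(mktemp -d)
cd "$workdir"
mkdir test
echo "hello" > test/file1
echo "hello world" > test/file2
tar -c -f folder.tar test
rm -r test

# The member is still named exactly as 'tar -t' prints it, but
# --strip-components=1 removes the leading 'test/' when writing to disk.
tar -x -f folder.tar --strip-components=1 test/file2
ls
```

After this, file2 sits next to folder.tar and no test folder is created.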
3.3. Listing Archives
Sometimes we don’t want to extract an archive, but just take a look at its contents. This is what the listing option is used for.
We can use the -t option, again combined with -f, to print the contents of an archive:
$ tar -t -f archive.tar
file1
file2
Likewise, we can also use redirection:
$ tar -t < archive.tar
file1
file2
Let’s now take a look at our folder archive from the previous examples:
$ tar -t -f folder.tar
test/
test/file1
test/file2
Similarly, we can also specify which files or folders to list:
$ tar -t -f folder.tar test/file1
test/file1
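Listing a specific member is also a convenient way to test for its presence in a script, because tar exits with a non-zero status when the requested name isn’t found. The helper name has_member below is hypothetical, introduced only for this sketch:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > file1
tar -c -f archive.tar file1

# has_member ARCHIVE NAME: succeeds only if NAME is stored in ARCHIVE.
# (Hypothetical helper; it relies on tar's exit status and hides output.)
has_member() {
    tar -t -f "$1" "$2" > /dev/null 2>&1
}

if has_member archive.tar file1; then found_file1=yes; else found_file1=no; fi
if has_member archive.tar file9; then found_file9=yes; else found_file9=no; fi
echo "file1: $found_file1, file9: $found_file9"
```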
4. Advanced Operations
Let’s now take a look at more advanced examples.
4.1. Updating Archives
There are situations where we want to add new files to an existing archive.
Luckily, we can use tar with the -u option. Remember our first archive that had two files?
Let’s now add a third one:
$ tar -t -f archive.tar
file1
file2
$ tar -u -f archive.tar file3
$ tar -t -f archive.tar
file1
file2
file3
We can also update existing archive members with newer versions. If tar detects a newer modification date for our file, it will add it to the archive.
Let’s modify the contents of file3 and add it again:
$ tar -u -f archive.tar file3
$ tar -t -f archive.tar
file1
file2
file3
file3
What happened here? We now have two instances of file3 in our archive.
Updating does not replace the existing file in the archive. Instead, the newly modified file is appended to the end of the archive.
How does updating work then? When we extract the archive, members are extracted in the order in which they were archived.
This means that the first instance of file3 is extracted first and then overwritten by the second (updated) version of file3.
Now let’s try to add the same file again without modifying it:
$ tar -u -f archive.tar file3
$ tar -t -f archive.tar
file1
file2
file3
file3
Nothing is added because tar detects that our file has not changed based on the modification date.
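The following sketch reproduces the duplicate-member situation and shows which copy wins on extraction. The file contents and timestamps are invented for the demonstration, and the last step relies on the GNU-specific --occurrence option together with -O (write the member to standard output), so it’s an assumption that GNU tar is in use:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# First version, with an explicit modification time.
echo "first version" > file3
touch -t 202003221946 file3
tar -c -f archive.tar file3

# Second version with a newer modification time, added via update.
echo "second version" > file3
touch -t 202003222218 file3
tar -u -f archive.tar file3

# Both copies are stored in the archive.
copies=$(tar -t -f archive.tar | grep -c '^file3$')

# A plain extraction ends up with the last copy on disk.
rm file3
tar -x -f archive.tar
latest=$(cat file3)

# GNU tar only: print the first stored copy without touching the disk.
oldest=$(tar -x -O --occurrence=1 -f archive.tar file3)

echo "$copies copies; latest='$latest'; oldest='$oldest'"
```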
4.2. Appending to an Archive
We can also append to an existing archive using the -r option.
At this point, we might be asking what’s the difference between update and append.
Unlike the update, when we append, tar doesn’t care about modification dates.
Now let’s try to append instead of updating the same file3 used in our previous example:
$ tar -r -f archive.tar file3
$ tar -t -f archive.tar
file1
file2
file3
file3
file3
Notice that we now have three copies of the same file in the archive. Let’s explain a bit about how we ended up in this situation.
The first instance is the one we added in the beginning when we first updated the archive.
The second is the one we modified and then used to update the archive again.
The third instance is identical to the second, and we just appended it to our archive. Remember that this was not possible with an update.
4.3. Concatenating Archives
Concatenating comes in handy when we want to combine existing archives into one.
Let’s assume we have archive1.tar and archive2.tar that contain our test files:
$ tar -t -f archive1.tar
file1
file2
$ tar -t -f archive2.tar
file3
We can merge everything into the first archive using the -A flag:
$ tar -A -f archive1.tar archive2.tar
$ tar -t -f archive1.tar
file1
file2
file3
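It may be tempting to join archives with cat instead. The sketch below, assuming GNU tar, illustrates why that isn’t equivalent: every archive ends with an end-of-archive marker made of zero-filled blocks, so a reader stops after the first archive unless we pass -i (--ignore-zeros). The archive and file names are placeholders:

```shell
# Work in a throwaway directory with two small archives.
workdir=$(mktemp -d)
cd "$workdir"
echo "one" > file1
echo "three" > file3
tar -c -f archive1.tar file1
tar -c -f archive2.tar file3

# Naive byte-level concatenation keeps the first end-of-archive marker.
cat archive1.tar archive2.tar > joined.tar

# A normal listing stops at that marker and sees only the first archive.
plain=$(tar -t -f joined.tar | wc -l)

# -i skips zero-filled blocks, so the second archive becomes visible.
ignoring=$(tar -t -i -f joined.tar | wc -l)

# -A rewrites the end of the first archive, so no extra option is needed.
tar -A -f archive1.tar archive2.tar
merged=$(tar -t -f archive1.tar | wc -l)

echo "cat: $plain, cat with -i: $ignoring, -A: $merged"
```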
4.4. Deleting Members from an Archive
Finally, in certain situations, we want to remove files or folders from an archive. We can do this with the --delete operation.
Let’s use it on our previous example:
$ tar --delete -f archive1.tar file2
$ tar -t -f archive1.tar
file1
file3
Notice that, unlike the previous operations, we now used the GNU long format. That’s because --delete has no short equivalent.
Most importantly, we should take care when using it because it involves rewriting the whole archive, which can be a slow process.
5. Options
So far, we’ve seen some basic options. Of course, depending on the operation type, we can use many more.
Let’s now play a bit with the most interesting ones.
5.1. Compression and Decompression
Up until this point, our archives were just a way to group files.
Let’s add some gzip compression using the -z option:
$ tar -c -z -f compressed.tar.gz large1 large2
$ ls -allh
-rw-r--r-- 1 user users 31K Mar 18 23:38 compressed.tar.gz
-rw-r--r-- 1 user users 10M Mar 18 22:16 large1
-rw-r--r-- 1 user users 20M Mar 18 22:19 large2
-rw-r--r-- 1 user users 31M Mar 18 23:38 uncompressed.tar.gz
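Here, uncompressed.tar.gz holds the same two files archived without the -z option, so despite its name, it isn’t compressed. Of course, the difference in size is this spectacular only because we used dummy files.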
As usual, we can always check the tar manual for all the possible compression algorithms.
What about extracting a compressed archive? We do it like a regular archive:
$ tar -x -f compressed.tar.gz
$ ls -allh
-rw-r--r-- 1 user users 31K Mar 20 23:34 compressed.tar.gz
-rw-r--r-- 1 user users 10M Mar 18 22:16 large1
-rw-r--r-- 1 user users 20M Mar 18 22:19 large2
When reading an archive from a file, GNU tar recognizes the compression format from the signature bytes at the start of the file.
Therefore, we don’t need to specify the corresponding decompression option.
Unfortunately, a downside is that we can’t modify compressed archives in place: the update, append, concatenate, and delete operations don’t work on them.
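A common workaround is to decompress the archive, modify the plain tarball, and compress it again. This is only a sketch: it assumes the gzip and gunzip tools are installed and uses placeholder file names:

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
echo "one" > file1
echo "two" > file2
tar -c -z -f compressed.tar.gz file1

# Appending directly is rejected before the archive is touched.
if tar -r -z -f compressed.tar.gz file2 2>/dev/null; then
    direct=accepted
else
    direct=refused
fi

# Workaround: decompress, append to the plain archive, compress again.
gunzip compressed.tar.gz          # leaves compressed.tar
tar -r -f compressed.tar file2
gzip compressed.tar               # recreates compressed.tar.gz

members=$(tar -t -f compressed.tar.gz)
echo "direct append: $direct"
echo "$members"
```

For large archives, this round trip costs time and temporary disk space, so it’s often simpler to recreate the compressed archive from the source files.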
5.2. *.tgz and *.tar.gz File Extensions
We’ve learned that we can use the -z option of the tar command to compress a tarball. Usually, we’ll name the archived file with the .tar.gz extension. However, sometimes we also see *.tgz files.
Both the .tgz and .tar.gz extensions tell us a file is a gzip-compressed tarball. They can be extracted using the same tar command. Therefore, there is no difference between the two extensions.
We may ask, well, if the two extensions mean the same thing, why do we have two extensions? Let’s explain it briefly.
If we look back to the old days of DOS, we’ll notice that the FAT filesystem only supported the 8.3 filename scheme. That is, a file named foo.tar.gz couldn’t be represented in the old versions of DOS and Windows systems. Therefore, the shorter file extension “tgz” was introduced to stand for the double extension “tar.gz”, so that gzip-compressed tarballs could be stored on FAT filesystems.
Today, modern filesystems support long filenames with multiple dots. Therefore, we usually take “tar.gz” as the extension for gzip-compressed tarballs because it’s more descriptive.
Similarly, we have the “tar.bz2” extension for bzip2-compressed tarballs.
5.3. Verbosity
We can enable a much more verbose output for all of the operations using the -v option.
Let’s take a look at a compressed archive:
$ tar -t -v -f compressed.tar.gz
-rw-r--r-- user/users 10485760 2020-03-18 22:16 large1
-rw-r--r-- user/users 20971520 2020-03-18 22:19 large2
This gives us many more details about the contents of the archive.
Let’s go back a bit to our update example:
$ tar -t -v -f archive.tar
-rw-r--r-- user/users 10240 2020-03-21 18:55 file1
-rw-r--r-- user/users 12 2020-03-18 20:54 file2
-rw-r--r-- user/users 40 2020-03-22 19:46 file3
-rw-r--r-- user/users 48 2020-03-22 22:18 file3
We can now see the modification date and the size of the files.
This option often comes in handy when we have folders inside an archive.
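The -v option is also useful while creating an archive, because tar then prints each member as it’s added. As a sketch, we can capture that output as a simple manifest; manifest.txt and listing.txt are hypothetical file names:

```shell
# Work in a throwaway directory with a small folder to archive.
workdir=$(mktemp -d)
cd "$workdir"
mkdir test
echo "hello" > test/file1
echo "hello world" > test/file2

# With -c, -v prints the name of every member as it is archived. Since the
# archive itself goes to a file, the names arrive on standard output.
tar -c -v -f folder.tar test > manifest.txt

# The manifest should match what a later listing of the archive reports.
tar -t -f folder.tar > listing.txt
cmp manifest.txt listing.txt && echo "manifest matches archive"
```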
5.4. Multi-Volume Archives
In the age of limited-size portable disks, we used to split archives into multiple volumes.
We can instruct tar to split our archive into volumes of a maximum size. By default, the size is given in units of 1024 bytes, but we can append a suffix such as M to use another unit.
Let’s create a multi-volume archive with a maximum volume size of 8 MB:
$ tar -c -M -L 8M -f archive1.tar large1 large2
Prepare volume #2 for ‘archive1.tar’ and hit return: n archive2.tar
Prepare volume #3 for ‘archive2.tar’ and hit return: n archive3.tar
Prepare volume #4 for ‘archive3.tar’ and hit return: n archive4.tar
$ ls -allh
-rw-r--r-- 1 user users 8.1M Mar 18 23:07 archive1.tar
-rw-r--r-- 1 user users 8.1M Mar 18 23:07 archive2.tar
-rw-r--r-- 1 user users 8.1M Mar 18 23:07 archive3.tar
-rw-r--r-- 1 user users 6.0M Mar 18 23:07 archive4.tar
-rw-r--r-- 1 user users 10M Mar 18 22:16 large1
-rw-r--r-- 1 user users 20M Mar 18 22:19 large2
We used the -M flag to indicate that we want a multi-volume archive, and then we specified the volume size with the -L option.
As a result, tar prompts us interactively before starting each new volume. We answer with the n command followed by the file name of the next volume.
One important aspect to remember here is that we can’t compress multi-volume archives.
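For scripts, the interactive prompt is inconvenient. GNU tar also accepts several -f options together with -M and uses them in sequence as the successive volumes, which the sketch below relies on. The volume names and the 3 MiB sample file are assumptions made for the demonstration:

```shell
# Work in a throwaway directory with a 3 MiB sample file.
workdir=$(mktemp -d)
cd "$workdir"
head -c 3145728 /dev/zero > large1

# -L 2048 means 2048 x 1024 bytes, so each volume holds at most 2 MiB.
# The two -f options name the volumes up front, so tar doesn't prompt.
tar -c -M -L 2048 -f vol1.tar -f vol2.tar large1

# Extraction takes the same volume list, in the same order.
mkdir restored
cd restored
tar -x -M -f ../vol1.tar -f ../vol2.tar
cd ..

cmp large1 restored/large1 && echo "restored intact"
```

If the data needs more volumes than we listed, tar cycles back to the first name and prompts again, so the list should be long enough for the expected size.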
5.5. File Selection
When we operate on a large set of files, most of the time, we’d like a more efficient way to include or exclude individual files.
Luckily, we can use the -T option to read the names of the files to process from a list stored in a file.
Let’s first take a look at our list of files:
$ cat input-files
file1
file2
file3
folder/file4
We just need to make sure to put the files on separate lines.
Now let’s create an archive and take a peek inside:
$ tar -c -f archive.tar -T input-files
$ tar -t -f archive.tar
file1
file2
file3
folder/file4
We can also use the -T option when extracting, to restore only the members named in the list.
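As a sketch of list-driven extraction, the example below archives four files from one list and then restores only two of them using a second list. The name wanted-files and the restored directory are hypothetical:

```shell
# Work in a throwaway directory with the files from the list above.
workdir=$(mktemp -d)
cd "$workdir"
mkdir folder
echo 1 > file1
echo 2 > file2
echo 3 > file3
echo 4 > folder/file4

# One name per line, as in the input-files example.
printf '%s\n' file1 file2 file3 folder/file4 > input-files
tar -c -f archive.tar -T input-files

# A second list selects the subset we want back.
printf '%s\n' file2 folder/file4 > wanted-files

# Extract only the listed members into a separate directory.
mkdir restored
(cd restored && tar -x -f ../archive.tar -T ../wanted-files)

find restored -type f | sort
```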
6. Conclusion
In this tutorial, we learned to use the tar tool. In the first part, we took a look at the most commonly used operations: create, extract, and list.
After that, we explored more advanced operations. Finally, we played around with some of the most interesting options.