1. Introduction
Since its inception in January 1979, the tar tape archiver tool has grown well beyond its original purpose. In fact, it has branched out beyond the POSIX standard into two main versions:
- Berkeley Software Distribution (BSD) tar, BSDtar
- GNU’s Not Unix (GNU) tar, GNUtar
While there are versions for embedded systems and custom implementations like star by Jörg Schilling, the ones above are arguably the most robust and widely used.
In this tutorial, we’ll explore how the GNU and BSD tar implementations differ. First, we look at BSD tar and its main branches. After that, we turn to GNU tar. Next, we go over star. Then, the different tar formats take the spotlight. Finally, we compare the main tar implementations across their most important features.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4, as well as OpenBSD 7.3 and FreeBSD 13.2. It should work in most POSIX-compliant environments unless otherwise specified.
2. BSD tar
BSD tar comes in two variants: the original implementation and a library-based one. Let’s briefly look at both.
2.1. Original BSD tar
The original BSD tar is still bundled with OpenBSD:
$ tar .
tar: unknown option .
usage: tar {crtux}[014578befHhjLmNOoPpqsvwXZz]
[blocking-factor | archive | replstr] [-C directory] [-I file]
[file ...]
tar {crtux}[014578befHhjLmNOoPpqsvwXZz] [-b blocking-factor]
[-C directory] [-f archive] [-I file] [-s replstr] [file ...]
In fact, its command name is just tar and it has no default alias or symbolic link like bsdtar. That’s mainly due to the conservative, security-focused nature of OpenBSD.
For the same reasons, we don’t really have a way to get the current version of the tool in this case, apart from checking the OS release. Actually, this version of tar is the OpenBSD POSIX-strict implementation, part of the bundled toolkit source code, so finding it on another platform is rare.
2.2. libarchive [bsd]tar
On the other hand, FreeBSD ships with another version of tar, which we’ll identify as [bsd]tar. Also, it’s available as a package in major Linux distributions:
$ which tar
/usr/bin/tar
$ which bsdtar
/usr/bin/bsdtar
$ ls -l /usr/bin/tar
lrwxr-xr-x 1 root wheel 4 Apr 7 08:42 /usr/bin/tar -> bsdtar
$ tar --version
bsdtar 3.6.2 - libarchive 3.6.2 zlib/1.2.13 liblzma/5.4.1 bz2lib/1.0.8 libzstd/1.4.8
In this case, we use which along with ls and its [-l]ong listing format to verify the files and locations:
- the main executable is bsdtar
- tar is a symbolic link that points to bsdtar
This [bsd]tar version of tar is based on the libarchive library. Although perhaps not quite up to OpenBSD standards, the libarchive code is very comprehensive and feature-rich.
Accordingly, the bsdtar command in recent versions of Ubuntu comes from the libarchive-tools package:
$ apt-get install bsdtar
[...]
Package bsdtar is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
libarchive-tools
E: Package 'bsdtar' has no installation candidate
$ apt-get install libarchive-tools
[...]
On the other hand, the tar command represents another utility in Ubuntu and most major distributions.
3. GNU tar
As expected, there is also a GNU tar command, often called regular tar.
For many major Linux distributions, that’s the main tar implementation, which either comes preinstalled or is available in the default tar package:
$ apt-get install tar
Let’s try it out by checking the version we have:
$ tar --version
tar (GNU tar) 1.34
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by John Gilmore and Jay Fenlason.
GNU tar covers the feature set of the original BSD tar implementation but also includes GNU-specific options and additions. Critically, GNU has its own archive format specification due to the limitations of older tar formats. Notably, the default archive format is chosen at build time.
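Since the same tar command name can resolve to different implementations, a script may want to check which one it's calling. The following is a minimal sketch that only relies on the two version strings shown so far; the tar_flavor function name is an assumption made for illustration, and anything that doesn't match (such as OpenBSD tar) is simply reported as unknown:

```shell
#!/bin/sh
# Classify a tar implementation from the first line of its --version output.
# The patterns mirror the GNU tar and bsdtar version strings shown earlier.
tar_flavor() {
    case "$1" in
        *"GNU tar"*) echo "gnu" ;;
        *bsdtar*)    echo "libarchive" ;;
        *)           echo "unknown" ;;
    esac
}

# Query whichever tar comes first in PATH; an empty string means no answer.
version_line=$(tar --version 2>/dev/null | head -n 1) || version_line=""
echo "tar flavor: $(tar_flavor "$version_line")"
```

Passing the version string as an argument keeps the function easy to check against known outputs without running any particular tar binary.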
4. star
Although it doesn’t branch directly from either of the main competitors, the star command has its own place among tar archivers. Due to its many exclusive features and the fact that it fully supports POSIX 1003.1 and POSIX.1-2001, we add it here for completeness.
4.1. Third-Party Syntax Support
The star command integrates several features from other commands:
- built-in find – supports find-like syntax for creating, extracting, and listing, avoiding the need for third-party tools in that regard
- built-in diff – compare filesystem trees
- patterns – specific syntax for subset archiving or extraction, as well as ed-like pattern substitution
For some, even the syntax is similar enough for convenient usage.
4.2. Metadata Handling and Recognition
Support for many aspects and types of metadata is included in star:
- long names – up to 1024 byte names supported
- handles all times – all three file times are available for storage in nanosecond granularity
- ACLs and file flags – has access control list (ACL) and file flag support
- inode metadata – supports all inode metadata
Due to the above, star can handle complex backup features like incremental backups.
4.3. Automation
The star command detects the archive and compression formats automatically and decompresses where needed. On top of that, extraction doesn’t replace more recent copies of the same data by default.
4.4. Speed and Performance
Using several features, star manages to outperform most tar implementations:
- -fifo – used to drastically improve performance
- optimized -copy mode – very fast and accurate copies of trees
- error control – ignore specific errors for given files by pattern
In fact, star is billed by its author as the fastest known tar archiver.
4.5. Remote Backups
The star tool has its own portable rmt server and mt client implementation. Thus, it provides full support for the remote magtape protocol.
In addition, star manages all these features while remaining independent from both the operating system and the filesystem.
Now, let’s explore some differences between the main tar archive formats.
5. tar Archive Formats
There are five main tar archive formats:
+--------+-------------+---------------+----------------------+--------------------+
| Format | Max UID/GID | Max File Size | Max File Name Length | Device Number Bits |
+--------+-------------+---------------+----------------------+--------------------+
| gnu    | 1.8e19      | unlimited     | unlimited            | 63                 |
| oldgnu | 1.8e19      | unlimited     | unlimited            | 63                 |
| v7     | 2097151     | 8GB           | 99 bytes             | n/a                |
| ustar  | 2097151     | 8GB           | 256 bytes            | 21                 |
| posix  | unlimited   | unlimited     | unlimited            | unlimited          |
+--------+-------------+---------------+----------------------+--------------------+
Let’s briefly explore each one.
5.1. gnu and oldgnu Archive Format
The GNU tar archive format is based on an early draft of the POSIX tar format but adds many improvements. Its older version, oldgnu, was used in GNU tar versions before 1.12.
Critically, the GNU tar archive format is incompatible with the original POSIX standard it extends due to header and limit changes.
5.2. v7 Archive Format
The original pre-POSIX archive format from UNIX v7 is still relevant despite its many limitations:
- maximum file name length 99 bytes
- maximum symbolic link length 99 bytes
- can’t store special files
- maximum group ID (GID) value 2097151 (7777777 octal)
- no symbolic ownership
In fact, versions of automake prior to 1.9 use v7 to produce Makefiles.
5.3. ustar Archive Format
As the initial POSIX specification from 1988, ustar can store symbolic ownership and special files.
Still, it exhibits most other limitations above:
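- maximum file name length 256 bytes, but usually less, since the name must split at a directory separator into a prefix of at most 155 bytes and a final part of at most 100 bytes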
- maximum symbolic link length 100 bytes
- maximum size of 8GB per in-archive file
- maximum user ID (UID) or group ID value 2097151 (7777777 octal)
- maximum of 21 bits in device major and minor numbers
Even so, this format is a classic and may be around for a while, especially for older historical data.
5.4. star Archive Format
Specific formats exist for some niche tar versions like star.
In fact, the speed of star is attributed in part to its special format.
5.5. posix Archive Format
Since the first tar archive specification and the GNU tar format upgrade, POSIX has included a new version of its related standard: POSIX.1-2001.
Simply called posix, the new tar format within has no file size or name length restrictions. While it’s still fairly recent, posix aims to be compatible with ustar. In fact, it overcomes the limitations of the latter by storing overflowing names and other data in extended header records, which implementations that don’t understand them extract as separate files.
Finally, the posix format is slated to become the GNU tar default, although current builds typically still default to gnu.
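To make the name length limits more concrete, here's a small sketch that archives a file with a 150-character name. It assumes GNU tar is the tar in PATH and uses made-up archive names; the v7 attempt is expected to be refused because of the 99-byte limit, but the script only reports what happens rather than relying on it:

```shell
#!/bin/sh
# Assumes GNU tar; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 150-character file name with no directory separator in it.
long_name=$(printf '%150s' '' | tr ' ' 'a')
: > "$long_name"

# The posix format keeps the overlong name in an extended header record.
tar --create --format=posix --file=posix.tar "$long_name"
listed=$(tar --list --file=posix.tar)

# v7 caps names at 99 bytes, so we only record whether tar accepts the member.
if tar --create --format=v7 --file=v7.tar "$long_name" 2>/dev/null; then
    v7_result="stored"
else
    v7_result="rejected"
fi
echo "posix lists a name of ${#listed} characters; v7: $v7_result"
```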
6. tar Tool Version Differences
The original BSD tar doesn’t really have much going for it apart from strictly adhering to the standard and ensuring stability and security. Thus, original BSD tar is perhaps the most reliable and portable.
Because of this, we take it as the common ground between [bsd]tar and GNUtar. So, let’s drop the original BSD tar from the comparison and size up libarchive [bsd]tar and GNU tar in terms of their custom features and implementations, with star as a reference point.
6.1. Archive Formats
Format support is perhaps best visualized in a table:
+--------+--------+----------+------+
| Format | GNUtar | [bsd]tar | star |
+--------+--------+----------+------+
| gnu    | yes    | yes      | yes  |
| oldgnu | yes    | no       | yes  |
| v7     | yes    | yes      | yes  |
| ustar  | yes    | yes      | yes  |
| star   | no     | no       | yes  |
| posix  | yes    | yes      | yes  |
+--------+--------+----------+------+
In terms of formats, oldgnu isn’t readable by libarchive [bsd]tar. Yet, just like GNUtar, the latter can read all other major tar formats, including gnu and posix. It can also read non-tar files like zip, 7zip, ISO 9660, and similar. Meanwhile, star can handle all of the tar formats above.
Still, GNUtar and [bsd]tar are comparable in terms of improving on the format support, despite the claims of libarchive to do it better.
6.2. Compression Detection
On the one hand, libarchive [bsd]tar and star can both detect the compression type even when the data is coming from stdin. On the other hand, GNUtar needs additional hints to do the same.
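As a rough illustration of such a hint, the sketch below feeds a gzip-compressed archive to tar through a pipe and names the compression explicitly with --gzip. It assumes GNU tar and gzip are installed, and the file names are made up:

```shell
#!/bin/sh
# Assumes GNU tar and gzip; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "payload" > note.txt
tar --create --gzip --file=notes.tar.gz note.txt

# The archive arrives on stdin, so we state the compression with --gzip.
piped_listing=$(cat notes.tar.gz | tar --list --gzip --file=-)
echo "$piped_listing"
```

The same flag is also accepted by libarchive [bsd]tar, so spelling it out keeps a pipeline portable between the two.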
6.3. Sparse Files
Sparse files contain large empty or unused regions. Instead of taking up space with zeroes or random bits, the filesystem uses metadata to represent those regions of their content.
Sparse file handling differs between GNU and libarchive. However, what we see below for [bsd]tar applies to star.
For example, libarchive [bsd]tar and star use metadata, while GNU tar by default stores every empty file region unless we pass its --sparse option. To illustrate, let’s create the 10M sparse file sparse-file with the dd tool:
$ dd of=sparse-file bs=10M seek=1 count=0
Next, we --create three archive [--file]s that contain only that file. One uses GNUtar, another uses star, while the last one employs [bsd]tar:
$ tar --create --file=tar.tar sparse-file
$ star -cf star.tar sparse-file
$ bsdtar --create --file=bsdtar.tar sparse-file
Finally, we can compare the three:
$ ls -lh
total 11M
-rw-r--r-- 1 baeldung baeldung 3.0K May 5 05:00 bsdtar.tar
-rw-r--r-- 1 baeldung baeldung 10M May 5 05:00 sparse-file
-rw-r--r-- 1 baeldung baeldung 3.0K May 5 05:00 star.tar
-rw-r--r-- 1 baeldung baeldung 11M May 5 05:00 tar.tar
Notably, tar.tar is even bigger than the original file, while the sizes of bsdtar.tar and star.tar are close to the minimal format size.
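For comparison, GNU tar can also record holes as metadata when given the --sparse option. The sketch below assumes GNU tar and GNU dd, as well as a filesystem that reports the file as sparse; the archive names are made up:

```shell
#!/bin/sh
# Assumes GNU tar and GNU dd; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 10M sparse file: nothing is copied, the output is only extended.
dd if=/dev/zero of=sparse-file bs=10M seek=1 count=0 2>/dev/null

# Archive the same file without and with sparse handling.
tar --create --file=plain.tar sparse-file
tar --create --sparse --file=sparse.tar sparse-file

plain_size=$(wc -c < plain.tar)
sparse_size=$(wc -c < sparse.tar)
echo "without --sparse: $plain_size bytes"
echo "with --sparse:    $sparse_size bytes"
```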
6.4. Incremental Backups
In general, there are three main backup types:
- full – backup everything
- incremental – backup only what changed since the previous backup of any type
- differential – backup only what changed since the last full backup
Of these, the full type is supported by any of the tar solutions by just creating an archive with all the data:
$ tar --create --file full.tar /data/*
Beyond that, both GNUtar and star can create an incremental backup. In the case of GNUtar, it’s via the --listed-incremental option, which takes the path to a snapshot file:
$ tar --create --file=inc-1.tar --listed-incremental=full.nfo /data/*
Snapshot files provide metadata for changes to avoid processing static files.
While differential backups aren’t supported directly, we can use the --compare, --diff, or -d flag of GNUtar to compare files on the storage with files in the archive.
Neither of the two functions above is supported by [bsd]tar.
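To see the snapshot mechanism end to end, here's a minimal sketch of a level 0 and a level 1 backup. It assumes GNU tar, and the data directory, data.snar snapshot file, and archive names are made up for illustration. Notably, the first run also needs the snapshot option so that later runs have a baseline:

```shell
#!/bin/sh
# Assumes GNU tar; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
echo "one" > data/file1
sleep 1  # keep file1's timestamps clearly older than the first backup

# Level 0: the snapshot file doesn't exist yet, so everything is archived.
tar --create --file=full.tar --listed-incremental=data.snar data

# Change the tree, then run level 1 against the same snapshot file.
echo "two" > data/file2
tar --create --file=inc-1.tar --listed-incremental=data.snar data

full_listing=$(tar --list --file=full.tar)
inc_listing=$(tar --list --file=inc-1.tar)
echo "$inc_listing"
```

Since the second run updates the snapshot file in place, keeping a copy of it before each run is a common precaution when several levels have to be restored later.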
6.5. Add and Delete Files
Unlike [bsd]tar, GNUtar also supports the --delete flag for removing specific files from an archive without manually extracting and repacking. Still, the operation rewrites the archive:
$ tar --list --file=tar.tar
file1
file2
fileD
file3
$ tar --delete --file=tar.tar fileD
$ tar --list --file=tar.tar
file1
file2
file3
After [--list]ing the files in the tar.tar [--file], we delete fileD and confirm it’s removed. Notably, --delete has no short form.
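The same session can be reproduced from scratch with a short script. This is only a sketch that assumes GNU tar and uses empty placeholder files:

```shell
#!/bin/sh
# Assumes GNU tar; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create empty placeholder files and archive them in a fixed order.
touch file1 file2 fileD file3
tar --create --file=tar.tar file1 file2 fileD file3

# Remove one member in place, then list what remains.
tar --delete --file=tar.tar fileD
remaining=$(tar --list --file=tar.tar)
echo "$remaining"
```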
On the other hand, we can use the special @archive syntax of [bsd]tar to combine new files with the contents of an existing archive:
$ cat old.tar | bsdtar --create --file=new.tar fileN @-
In this case, we --create a new archive, new.tar, from the fileN file and the old.tar data coming from stdin, which @- refers to.
7. Summary
In this article, we explored several tar implementations along with their differences and similarities.
In conclusion, libarchive [bsd]tar, GNUtar, and perhaps even star are the most feature-rich, but they go beyond the POSIX standard that the original BSD tar adheres to.