1. Overview
We create .bz2 archive files with bzip2, a file compression program. It combines the Burrows-Wheeler transform with run-length encoding and Huffman coding to achieve high compression ratios. The .bz2 format compresses a single file only, so it can’t bundle multiple files into one archive by itself.
In this tutorial, we’ll cover different methods of decompressing .bz2 archive files.
2. Using bzip2
bzip2 is an open-source compression tool that compresses files using the Burrows-Wheeler block sorting text compression algorithm and Huffman coding. It takes in a list of file names as input and replaces each file with its compressed version. The compressed files have a .bz2 extension.
Each compressed file keeps the modification date, permissions, and, where possible, ownership of the original file, so these properties can be restored during decompression.
The bzip2 utility is available by default in most Linux distros:
$ which bzip2
/usr/bin/bzip2
In case it’s not available, we can install the bzip2 package using the package manager on our system.
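As a rough sketch of that check, the script below prints where a tool is installed or suggests an install command. The suggest_install function is a hypothetical helper, and it assumes the package has the same name as the tool, which may not hold on every distro:

```shell
#!/bin/sh
# Hypothetical helper: report where a tool lives, or suggest how to install it.
# Assumption: the package name equals the tool name.
suggest_install() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool is installed at $(command -v "$tool")"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install $tool"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install $tool"
    elif command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -S $tool"
    else
        echo "install the $tool package with the system package manager"
    fi
}

suggest_install bzip2
```

The function only prints a suggestion and never installs anything, so it's safe to run without root privileges.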
The bzip2 command-line utility has this basic syntax:
$ bzip2 [options] filenames
Assuming we have an archive file called sample.bz2, let’s find out how we can decompress it:
$ bzip2 -d sample.bz2
We’re using the -d or --decompress option to decompress the sample.bz2 file. However, by default, bzip2 deletes the archive once decompression succeeds, leaving only the decompressed file.
We can modify the command above and use the -k or --keep option to retain the original archive:
$ bzip2 -dk sample.bz2
We can also use the -v or --verbose option to show the compression ratio of each file processed:
$ bzip2 -dvk sample.bz2
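To try these options without touching real data, here’s a minimal sketch that builds a throwaway archive in a temporary directory and decompresses it. The file name and contents are made up for illustration, and the -t (--test) integrity check is an extra step not covered above:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

if command -v bzip2 >/dev/null 2>&1; then
    printf 'This is a sample file\n' > sample
    bzip2 sample            # replaces sample with sample.bz2
    bzip2 -t sample.bz2     # -t checks integrity without writing any output file
    bzip2 -dk sample.bz2    # recreates sample and keeps sample.bz2
    ls                      # both sample and sample.bz2 are present
    cat sample
else
    echo "bzip2 is not installed; skipping the demo"
fi
```

Dropping the -k flag from the last bzip2 call would leave only the decompressed sample file in the directory.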
However, bzip2 runs on a single thread, so decompressing large files can take a long time.
In the next section, we’ll explore a tool that decompresses .bz2 archive files using multiple threads.
3. Using lbzip2
lbzip2 is a multi-threaded counterpart to the bzip2 utility. Using multiple threads gives it better performance than bzip2 when compressing or decompressing .bz2 archive files. It also employs the Burrows-Wheeler block-sorting text compression algorithm.
The lbzip2 tool isn’t installed by default on most Linux distros. We can check whether it’s present:
$ which lbzip2
If the command prints nothing, lbzip2 isn’t installed, and we can install the lbzip2 package using the package manager on our system.
The lbzip2 command-line utility has this basic syntax:
$ lbzip2 [options] filenames
Most of its options are the same as those of its counterpart, the bzip2 utility.
Assuming we have an archive file called sample.bz2, let’s see how we can decompress it:
$ lbzip2 -d sample.bz2
We’re using the -d or --decompress option to decompress the sample.bz2 file, but the original archive is deleted once decompression succeeds.
Let’s modify the command above to retain the original archive after decompressing:
$ lbzip2 -dk sample.bz2
We’re using the -k or --keep option to retain the original archive.
We can also decompress more than one .bz2 archive file by listing the additional archives after the first one:
$ lbzip2 -d sample.bz2 sample2.bz2 sample3.bz2
The lbzip2 tool doesn’t print any output after successfully decompressing an archive file. We can view more details about the decompression process by adding the -v or --verbose option:
$ lbzip2 -dv sample.bz2
It also allows us to write the output to stdout using the -c or --stdout option.
This lets us view the decompressed contents in the terminal without writing a file to disk:
$ lbzip2 -dc sample.bz2
This is a sample file
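Because -c sends the decompressed data to stdout, we can also pipe it into other commands. The sketch below is one way to do that; the sample contents and variable names are made up for illustration, and the demo is skipped when lbzip2 isn’t installed:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

if command -v lbzip2 >/dev/null 2>&1; then
    printf 'alpha\nbeta\ngamma\n' > sample
    lbzip2 sample    # creates sample.bz2 and removes sample

    # Inspect the contents without writing the decompressed file to disk.
    line_count=$(lbzip2 -dc sample.bz2 | wc -l)
    match=$(lbzip2 -dc sample.bz2 | grep beta)
    echo "lines: $line_count, match: $match"
else
    echo "lbzip2 is not installed; skipping the demo"
fi
```

Since -c leaves the archive untouched, sample.bz2 is still the only file in the directory afterwards.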
Finally, we can use the -n option to specify the number of threads used while compressing or decompressing. The number should always be a positive integer:
$ lbzip2 -dn 5 sample.bz2
Raising the number of threads can improve performance, although the gains typically level off once the thread count exceeds the number of available processor cores.
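Rather than hard-coding a number, we could derive the thread count from the machine. This is a minimal sketch, assuming the nproc command from GNU coreutils is available (it falls back to a single thread otherwise). The threads variable and the sample file are made up for illustration:

```shell
#!/bin/sh
# Ask nproc for the number of available processors; fall back to 1 if it's missing.
threads=$(nproc 2>/dev/null || echo 1)
echo "using $threads thread(s)"

if command -v lbzip2 >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cd "$workdir"
    printf 'This is a sample file\n' > sample
    lbzip2 sample                            # creates sample.bz2
    lbzip2 -d -k -n "$threads" sample.bz2    # decompress with the chosen thread count
    cat sample
else
    echo "lbzip2 is not installed; skipping the demo"
fi
```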
4. Using pbzip2
pbzip2 is another parallel implementation of the bzip2 utility, which uses pthreads. It’s commonly used on shared memory machines. It offers a near-linear speed up when used on true multi-processor machines and 5% – 10% on Hyperthreaded machines. Output files from the pbzip2 utility are fully compatible with the regular bzip2 utility.
When compressing, pbzip2 splits a file into chunks and compresses each chunk independently. This speeds up both compression and decompression since the chunks are processed simultaneously.
However, files compressed with the regular bzip2 utility don’t see a speedup when decompressed with pbzip2. This is because bzip2 packages the data into a single chunk that can’t be split between processors.
The pbzip2 tool isn’t installed by default on most Linux distros. We can check whether it’s present:
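To see how independently compressed chunks can still form one valid .bz2 file, here’s a rough sketch that imitates the idea with plain bzip2. It isn’t pbzip2’s actual implementation, and the file names are made up for illustration. It relies on bzip2 being able to decompress a file made of concatenated compressed streams:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

if command -v bzip2 >/dev/null 2>&1; then
    printf 'first chunk\n'  > part1
    printf 'second chunk\n' > part2
    cat part1 part2 > original

    # Compress each chunk on its own, as a parallel compressor could do on separate cores.
    bzip2 -c part1 > part1.bz2
    bzip2 -c part2 > part2.bz2

    # Join the compressed streams into a single .bz2 file and decompress it.
    cat part1.bz2 part2.bz2 > joined.bz2
    bzip2 -dc joined.bz2 > restored

    cmp original restored && echo "restored file matches the original"
else
    echo "bzip2 is not installed; skipping the demo"
fi
```

Each stream can be decompressed without the others, which is what gives a parallel tool separate pieces to hand to different processors.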
$ which pbzip2
If the command prints nothing, pbzip2 isn’t installed, and we can install the pbzip2 package using the package manager on our system.
The pbzip2 command-line utility has this basic syntax:
$ pbzip2 [options] filenames
Most of the commands and options are similar to the bzip2 utility.
Let’s assume we have an archive file called sample.bz2 and decompress it:
$ pbzip2 -d sample.bz2
We’re using the -d or --decompress option to decompress the sample.bz2 file. However, the original archive is deleted once decompression succeeds.
We can use the -c or --stdout option to view the contents of the output file from the terminal:
$ pbzip2 -dc sample.bz2
This is a sample file
By default, the pbzip2 utility autodetects the number of processors available and uses all of them. But we can use the -p option, with the number written directly after it, to specify the exact number of processors to use:
$ pbzip2 -dc -p3 sample.bz2
This is a sample file
We’re specifying the use of 3 processors to decompress the sample.bz2 file.
5. Conclusion
In this article, we’ve looked at three different tools we can use to decompress .bz2 archive files. bzip2 is the most common tool for compressing and decompressing .bz2 archive files. However, it uses a single thread, making it slower when processing large files.
lbzip2 is similar to the bzip2 utility but uses multiple threads, resulting in better performance. The pbzip2 utility is also similar to the bzip2 utility, but it uses pthreads, and it’s commonly used on shared memory machines.
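Putting the three tools together, a small wrapper could pick a parallel tool when one is installed and fall back to bzip2 otherwise. This is only a sketch: bunzip_fast is a hypothetical function name, and the preference order is an assumption rather than a benchmark result:

```shell
#!/bin/sh
# Hypothetical helper: decompress with the first tool found, preferring the parallel ones.
# All three tools accept -d (decompress) and -k (keep the original archive).
bunzip_fast() {
    for tool in lbzip2 pbzip2 bzip2; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" -dk "$@"
            return
        fi
    done
    echo "no bzip2-compatible tool found" >&2
    return 1
}
```

After defining the function, running bunzip_fast sample.bz2 decompresses the archive with whichever tool is found first and keeps sample.bz2 in place.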