1. Overview
In this tutorial, we’ll learn about software packages in general. Then, we’ll look at some common Linux package formats and their capabilities. Finally, we’ll compare them with the package formats used on Windows.
2. What Are Software Packages?
A software package, or package for short, is a collection of files and metadata that together make up a specific software application or program. It’s designed to simplify the process of distributing, installing, and managing software on a computer system.
2.1. Distributing Software Without Packages
One straightforward way to distribute software is to publish the software’s executable on hosting sites, allowing users to download and run it locally. This simple approach works well for software that’s self-contained and doesn’t require the end user to perform any manual steps.
However, installing, upgrading, and removing the software we interact with daily is rarely that straightforward. Specifically, software usually requires particular versions of its dependencies to be present on the system in order to work well. Furthermore, in Linux, installing a new long-running background service typically calls for a dedicated system account for isolation purposes.
Technically, the software author can write instructions that guide end users through the manual installation steps. However, these laborious steps are easy to automate, which also reduces the chance of human error.
2.2. Distributing Software With Packages
Alternatively, software authors can distribute their software as a package. By doing so, they can better ensure that the installation steps run in the correct order without relying on users to follow instructions.
Software packages greatly reduce the chance of human error during installation and simplify the installation of complicated software. From the end user’s perspective, instead of downloading a binary and following a series of steps, installing a software package is as simple as running a single command:
$ apt-get install my-software
Furthermore, the package manager keeps track of all the packaged software on the system. This makes it easy for the user to upgrade the software in the future. Finally, the software can be removed cleanly without leaving behind dangling dependencies.
Because of these benefits, most software with a sizable user base is now distributed as packages. Let’s look at the different software packages in Linux.
3. Linux Software Packages, Package Managers, and Frontends
In Linux, software packages come in the form of package files. The most popular package file formats in Linux are deb and rpm. The deb format is the package file format for Debian-based Linux distributions. The rpm format, on the other hand, is used by Red Hat Linux and its derivatives, such as Rocky Linux.
Then, package managers such as dpkg and rpm read the package files and run the steps to install the software package. Furthermore, front-end programs such as apt and yum enhance the package managers and offer the end-user a simpler interface to install software packages. Let’s look at the different components in detail.
3.1. Content of the Package File
A package file is usually an archive that contains two important items. Firstly, it contains the software binaries or code to be installed. Secondly, it contains one or more metadata files that describe the software package. This metadata includes the dependencies and their versions, the package information, and checksums for integrity checking.
For example, a deb file is an ar archive that consists of three members: a text file, debian-binary, and two archives, the control archive, control.tar, and the data archive, data.tar. Both archives are usually compressed, so they typically carry an extra suffix such as .gz or .xz. The debian-binary file is a one-line text file that states the package format version. The data.tar archive holds the software binaries or code to be installed.
Furthermore, the control.tar archive contains several files, such as md5sums, which lists checksums of the packaged files. The most important of them is the control file. It specifies the dependencies of the software, which allows the package manager to verify them before proceeding with the installation.
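To make the role of the control file more concrete, here’s a minimal sketch that reads its Depends field with standard shell tools. The sample control file, the my-software package, and the list_depends helper are all made up for illustration. The sketch also assumes that the Depends field fits on a single line, which real control files don’t guarantee:

```shell
# list_depends is a hypothetical helper, not part of dpkg.
# It prints one dependency per line from the Depends field of a control file.
# Assumption: the Depends field sits on a single line (no continuation lines).
list_depends() {
    sed -n 's/^Depends:[[:space:]]*//p' "$1" |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# A made-up control file for an imaginary package.
control_file=$(mktemp)
cat > "$control_file" <<'EOF'
Package: my-software
Version: 1.0.0
Architecture: amd64
Depends: libc6 (>= 2.31), libssl3
Description: An imaginary package used for illustration
EOF

list_depends "$control_file"
```

Each printed line is one dependency together with its optional version constraint, which is the information a package manager checks against the system before installing.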
On the other hand, the rpm package file is a binary file that contains four sections. Firstly, the lead section contains bytes that the rpm package manager can use to identify whether a file is an rpm package file. Then, the signature section contains data that the package manager can use to verify the integrity of the package file, similar to the md5sums file in the control archive of a deb package.
The third section is known as the header section. It contains the metadata of the package, such as the package name, version, file list, and dependencies. Finally, the last section, the payload, contains the bytes that make up the file archive of the software to be installed.
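As a rough illustration of how a tool might tell the two formats apart, the sketch below inspects the first bytes of a file. It relies on two facts: an ar archive, and therefore a deb file, starts with the string !<arch> followed by a newline, and the rpm lead starts with the bytes ed ab ee db. The pkg_format name is hypothetical, and the real package managers validate far more than this:

```shell
# pkg_format is a hypothetical helper used only for this sketch.
# It guesses a package format from the first bytes of the given file.
pkg_format() {
    # Render the first 8 bytes as a hex string without separators.
    magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        213c617263683e0a) echo "deb" ;;      # "!<arch>" plus newline; matches any ar archive
        edabeedb*)        echo "rpm" ;;      # magic bytes at the start of the rpm lead
        *)                echo "unknown" ;;
    esac
}
```

Note that the first pattern matches any ar archive, so a stricter check would also confirm that the archive contains the debian-binary member.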
3.2. The Package Manager
The package manager is a program that reads the package file, verifies dependencies, and then runs the necessary steps to install the software. In Linux, package managers are specific to the distribution. Specifically, the package manager of Debian-based Linux is the dpkg program, which can only read the deb package file format. Therefore, the package file format our system can use depends on its distribution.
On the other hand, the rpm package manager is the dpkg equivalent in Red Hat Linux and its derivatives.
To install software packages, we simply pass the package file to the package managers. For example, we can install the PostgreSQL software by running dpkg on the package file:
$ dpkg --install postgresql_15+248_all.deb
This simplifies the installation because, as end users, we don’t need to manually run the installation steps in the correct sequence. Additionally, the package manager verifies that our system meets the dependency requirements of the software, preventing a faulty installation.
3.3. The Package Manager Frontends
Most of the time, we don’t invoke the package managers directly to install our software packages. Instead, we use their frontends, such as apt in Debian-based Linux and yum in Red Hat-based Linux. This is because these frontends enhance the package managers with several improvements to the installation process.
Firstly, frontends such as apt and yum can connect to a remote repository and download package files by package name. In contrast, the dpkg and rpm commands don’t look up packages in a repository by name. Instead, they operate on a package file that we provide.
Furthermore, package managers like dpkg and rpm only verify the dependencies and don’t automatically install them. However, frontends like apt and yum can automatically resolve the dependencies and then install them when they detect that the system is lacking the required dependencies.
For example, to install the PostgreSQL program in Debian Linux, we can run the apt-get install command and specify just the name of the package:
$ apt-get install -y postgresql
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
cron libllvm10 libpq5 locales logrotate postgresql-12 postgresql-client-12 postgresql-client-common postgresql-common ssl-cert sysstat
...
Notice that the apt-get install command also automatically resolves and installs additional packages. These packages are the dependencies of the postgresql package.
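The first half of what a frontend does here, detecting which dependencies are missing, can be sketched with plain shell tools. This is only an illustration under simplifying assumptions: the package names are placeholders, versions are ignored, and missing_deps is a hypothetical helper rather than anything apt or yum exposes:

```shell
# missing_deps is a hypothetical helper used only for this sketch.
# $1: file with one required package name per line
# $2: file with one installed package name per line
# Prints the required packages that aren't installed yet.
missing_deps() {
    # -F: fixed strings, -x: whole-line match, -v: keep lines with no match
    grep -vxF -f "$2" "$1"
}

# Placeholder package lists for an imaginary system.
workdir=$(mktemp -d)
printf '%s\n' libfoo libbar libbaz > "$workdir/required"
printf '%s\n' libbar some-tool > "$workdir/installed"

missing_deps "$workdir/required" "$workdir/installed"
```

A real frontend would then download each missing package from a repository and repeat the check for that package’s own dependencies.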
4. Differences With Windows
Like Linux, most other operating systems have a mechanism for managing software. Let’s look at the software package files in the Windows operating system.
One common way to distribute software on the Windows operating system is to package the software into an msi file. Similar to the deb and rpm package files, the msi file contains the software binary and the metadata necessary for the installation.
However, unlike the package files we’ve seen thus far in Linux, the msi package file is usually self-sufficient. In other words, msi files usually contain everything the software needs in order to run. Furthermore, the msi package usually comes with a GUI-based guided wizard that allows the user to configure the installation. For example, users can configure the installation path through the wizard.
The closer equivalent of the Linux package files on the Windows operating system is the Chocolatey package. Similar to the Linux package files, the Chocolatey package contains a PowerShell script that automates the installation process. Besides that, there’s a command-line tool, choco, that manages the software installed on the system, very much like package manager frontends such as apt and yum.
5. Conclusion
In this tutorial, we’ve learned how software packages can make software installation and management easier. Then, we looked at the different components of software packages: the package file, the package manager, and the package manager frontends. Finally, we compared the Linux package formats with the msi format that’s popular on the Windows operating system.