1. Overview
In this tutorial, we’ll talk about two filesystems, namely ZFS and XFS. First, we’ll go through their definition and history. Then, we’ll look at their main shared and exclusive characteristics. Finally, we’ll discuss their pros and cons, focusing on the guidelines on when to use which.
2. Filesystems: XFS and ZFS
A filesystem (usually abbreviated as FS) defines how data is organized and structured on our storage devices. Filesystems provide directories, named files, and access control to the operating system.
There are many filesystems but the two that we’ll discuss in this article are ZFS and XFS.
2.1. History of XFS
Silicon Graphics developed XFS well before ZFS existed: the first release of XFS was in 1994. The name, as with most filesystem names, has two parts: X-FS, where FS refers to filesystem. The X was meant as a placeholder to be replaced later, but it never was.
XFS was developed for the IRIX operating system with one idea in mind: performance in accessing huge files with many CPUs while also avoiding file corruption. As we’ll see, these objectives shaped how the filesystem works and its features. XFS has been available for Linux since 2001.
2.2. History of ZFS
Sun Microsystems first released ZFS in 2006, only for OpenSolaris. However, it was eventually made available on Linux as a separately distributed kernel module with its own userspace tools.
It cannot be shipped as part of the Linux kernel due to licensing incompatibilities between the CDDL license of ZFS and the GPL license of the kernel. In fact, after Oracle acquired Sun Microsystems, it stopped publishing the source code of ZFS. This led to the creation of OpenZFS, which has diverged from the ZFS developed by Oracle. In this article, when talking about ZFS, we refer to the OpenZFS branch.
ZFS stands for Zettabyte File System. A zettabyte is 10^12 gigabytes, which stresses one of the main goals of the filesystem, as the creators wanted a filesystem capable of handling the data in the zettabyte era.
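The conversion behind that figure can be checked with decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The constant and function names below are only illustrative:

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert zettabytes to gigabytes using SI prefixes."""
    return zb * ZETTABYTE // GIGABYTE


print(zettabytes_to_gigabytes(1) == 10**12)  # True
```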
ZFS is an advanced filesystem and many of its features focus mainly on reliability. ZFS brings robustness and stability, while it avoids the corruption of large files.
3. Features of XFS and ZFS
Let’s go through the different features of the two filesystems. Although they don’t share many features, we’ll start with what they have in common. Then, we’ll look at the key differentiators of each one.
3.1. Shared Features of Both Types of Filesystems
Both filesystems offer some protection against kernel panics and power outages while writing to disk. However, each filesystem achieves this in a different way.
XFS is a journaling filesystem. Before applying metadata changes, it records them in a journal (also called a log). If there’s a power failure, the filesystem can replay the journal on the next mount, which greatly reduces the chance of ending up with a corrupted filesystem.
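To make the journaling idea concrete, here is a minimal sketch in Python. It is a toy model, not how XFS is implemented: the class name, the dictionary standing in for the disk, and the crash_before_apply flag are all assumptions made for illustration.

```python
class ToyJournaledStore:
    """Toy write-ahead journal: log the intent first, then apply it."""

    def __init__(self):
        self.disk = {}      # stands in for on-disk metadata
        self.journal = []   # stands in for the on-disk log

    def update(self, key, value, crash_before_apply=False):
        # 1. Record the intended change in the journal.
        self.journal.append((key, value))
        if crash_before_apply:
            return  # simulated power failure: the disk was not updated
        # 2. Apply the change, then retire the journal entry.
        self.disk[key] = value
        self.journal.pop()

    def recover(self):
        # Replay every logged change that never reached the disk.
        for key, value in self.journal:
            self.disk[key] = value
        self.journal.clear()


store = ToyJournaledStore()
store.update("inode_42.size", 4096)
store.update("inode_42.size", 8192, crash_before_apply=True)
before = store.disk["inode_42.size"]
store.recover()
after = store.disk["inode_42.size"]
print(before, after)  # 4096 8192
```

The interrupted update is not lost: since the intent was logged first, replaying the journal brings the store to a consistent state.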
ZFS works on the copy-on-write (COW) principle. Data that is being modified is never overwritten in place. Instead, ZFS writes the new version to free blocks and, only once that write has completed, updates the metadata to point to the new blocks. The old blocks are then released, unless a snapshot still references them. Because the previous version stays intact until the switch, this also protects data against kernel or power problems.
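A toy model can illustrate copy-on-write at the block level. This is a sketch under simplifying assumptions (one block per file, a hypothetical ToyCowStore class, and a crash_before_commit flag to simulate a failure), not the on-disk design of ZFS:

```python
class ToyCowStore:
    """Toy copy-on-write store: live blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}     # block address -> bytes
        self.pointers = {}   # file name -> block address
        self.next_addr = 0

    def write(self, name, data, crash_before_commit=False):
        # 1. Write the new data to a fresh location.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        if crash_before_commit:
            return  # the pointer still references the old, intact block
        # 2. Switch the pointer in one step, then free the old block.
        old = self.pointers.get(name)
        self.pointers[name] = addr
        if old is not None:
            del self.blocks[old]

    def read(self, name):
        return self.blocks[self.pointers[name]]


cow = ToyCowStore()
cow.write("report.txt", b"version 1")
cow.write("report.txt", b"version 2", crash_before_commit=True)
survived = cow.read("report.txt")
cow.write("report.txt", b"version 3")
print(survived, cow.read("report.txt"))  # b'version 1' b'version 3'
```

After the simulated crash, a reader still sees the complete old version rather than a half-written new one.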
3.2. Differentiating Characteristics of XFS
There are some features that make XFS an interesting filesystem. One of the most relevant is its allocation groups. They are equally sized regions of the filesystem, and each one keeps track of its own free space and inodes, so several operations can proceed in parallel.
Moreover, XFS uses B+ trees for indexing and describes file data as extents (contiguous runs of blocks) rather than as lists of individual blocks. This keeps the number of entries small and improves data retrieval performance.
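One way to see why fewer entries help is to record each contiguous run of blocks as a single extent. The sketch below is only an illustration: the tuple layout is an assumption, and a binary search over a sorted list stands in for the B+ tree.

```python
from bisect import bisect_right


def to_extents(block_numbers):
    """Collapse a file's block list (in file order) into
    (file_offset, start_block, length) extents."""
    extents = []
    for offset, block in enumerate(block_numbers):
        if extents and extents[-1][1] + extents[-1][2] == block:
            off, start, length = extents[-1]
            extents[-1] = (off, start, length + 1)  # extend the current run
        else:
            extents.append((offset, block, 1))      # start a new run
    return extents


def lookup(extents, file_offset):
    """Find the disk block for a file offset (binary search stands in for a B+ tree)."""
    starts = [e[0] for e in extents]
    off, start, length = extents[bisect_right(starts, file_offset) - 1]
    if not 0 <= file_offset - off < length:
        raise IndexError("offset outside the file")
    return start + (file_offset - off)


blocks = list(range(100, 108)) + list(range(500, 504))
extents = to_extents(blocks)
print(extents)             # [(0, 100, 8), (8, 500, 4)]
print(lookup(extents, 9))  # 501
```

Twelve blocks are described by only two entries, so there is less metadata to store and to search.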
As mentioned earlier, the objective of XFS was to offer high I/O performance. One way XFS achieves this is direct I/O: data is transferred between the application’s buffers and the disk using direct memory access (DMA), bypassing the kernel’s cache. This gives access to the full bandwidth of the physical device.
Moreover, due to its focus on I/O operations, XFS was also designed with a guaranteed-rate I/O system. An application can reserve bandwidth, and XFS ensures that these reservations are respected. This feature, which originated in the IRIX implementation, is useful for real-time or data-intensive applications.
3.3. Differentiating Characteristics of ZFS
First, we have to understand that ZFS is not only a filesystem. ZFS provides the filesystem structure and also acts as the volume manager (a role that LVM usually plays on Linux). It covers the typical LVM functionality, plus some extra tools that we might find useful. Bundling the filesystem and the volume manager together gives the system comprehensive knowledge of the status, condition, and layout of the physical disks.
ZFS also provides RAID (which stands for Redundant Array of Independent Disks). There are specific layouts designed for ZFS, known as RAID-Z. They use variable-width stripes, so each write is sized to the data being written instead of always filling a fixed-width stripe. RAID offers data protection, which is why it is used extensively where reliability matters, such as on servers. Moreover, ZFS supports several built-in compression algorithms, so we can store files compressed and save disk space.
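The protection that single-parity RAID offers can be sketched with XOR parity. This is a generic illustration of the parity idea, not the RAID-Z on-disk layout; the function names and the fixed chunk size are assumptions.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks):
    """Compute single parity over equal-length chunks."""
    return reduce(xor_bytes, chunks)


def rebuild(surviving_chunks, parity_chunk):
    """Recover the one missing chunk from the survivors plus the parity."""
    return reduce(xor_bytes, surviving_chunks, parity_chunk)


data = [b"AAAA", b"BBBB", b"CCCC"]          # one chunk per data disk
p = parity(data)
recovered = rebuild([data[0], data[2]], p)  # the second disk has failed
print(recovered)  # b'BBBB'
```

Because XOR is its own inverse, combining the parity with the surviving chunks yields exactly the chunk that was lost.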
Moreover, ZFS has a snapshot function. We can schedule snapshots of our data with little performance degradation, and a new snapshot takes almost no extra space: it only starts consuming space as the live data diverges from it. This further protects against data loss caused by errors or malicious activity. We can roll back to a previous snapshot if desired.
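The reason a snapshot can be cheap is that it only needs to remember which blocks were current at that moment. The following toy model shows the idea; the ToyDataset class and its methods are hypothetical and much simpler than the real mechanism:

```python
class ToyDataset:
    """Toy dataset where a snapshot is only a saved copy of the block pointers."""

    def __init__(self):
        self.blocks = []      # append-only block storage
        self.live = {}        # file name -> index into self.blocks
        self.snapshots = {}   # snapshot name -> saved pointer map

    def write(self, name, data):
        self.blocks.append(data)  # never overwrite an existing block
        self.live[name] = len(self.blocks) - 1

    def read(self, name):
        return self.blocks[self.live[name]]

    def snapshot(self, snap_name):
        self.snapshots[snap_name] = dict(self.live)  # pointers only, no data copied

    def rollback(self, snap_name):
        self.live = dict(self.snapshots[snap_name])


ds = ToyDataset()
ds.write("config", b"good settings")
blocks_before = len(ds.blocks)
ds.snapshot("monday")
blocks_after_snapshot = len(ds.blocks)  # unchanged: the snapshot shares blocks
ds.write("config", b"broken settings")
ds.rollback("monday")
print(blocks_before, blocks_after_snapshot, ds.read("config"))  # 1 1 b'good settings'
```

Taking the snapshot stores no new data blocks. Extra space is only used once the live data changes and the old block has to be kept for the snapshot.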
Finally, but no less important, ZFS stores a checksum for every block, much like a Cyclic Redundancy Check (CRC) applied at the block level. This lets ZFS detect silent data corruption and, when a redundant copy exists, repair it. This is useful against errors such as misdirected or incomplete writes, among others. Most filesystems assume that once a write operation has completed, the data on the disk is correct. This might not be the case, so ZFS verifies the checksum whenever it reads a block.
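As a rough sketch of block-level verification, the example below keeps two copies of a block plus a checksum stored apart from the data, and repairs a corrupted copy on read. SHA-256 is used here for simplicity, and the ToyMirror class is an assumption for illustration rather than the actual ZFS implementation:

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest that identifies the block contents."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two copies of a block plus a checksum stored apart from the data."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Repair any copy that no longer matches the checksum.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies failed checksum verification")


block = ToyMirror(b"important data")
block.copies[0][0] ^= 0xFF           # simulate silent corruption on one disk
data = block.read()
print(data, bytes(block.copies[0]))  # b'important data' b'important data'
```

The corrupted copy is never returned to the caller, and the read itself restores it from the healthy copy.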
4. Comparison Between Both Filesystems
On the one hand, XFS with its allocation groups scales well: multiple CPUs can access data at the same time with huge filesystem bandwidths. Files can span different physical storage devices. In combination with the B+ tree implementation, XFS should be able to find and retrieve data faster than ZFS.
Moreover, in a standard installation without parameter tuning, XFS performs slightly better than ZFS (in I/O operations) and uses fewer resources.
On the other hand, ZFS offers protection against disk problems with features such as RAID-Z or block checksums (CRC). We can certainly set up RAID and integrity checking underneath an XFS filesystem with separate tools, but having them seamlessly integrated into ZFS eases the process.
Being a copy-on-write filesystem, ZFS also avoids problems that can occur during erroneous journal recovery. These recovery problems may lead to data loss in journaling filesystems like XFS.
Finally, XFS has no native snapshot feature, so snapshots of an XFS filesystem are typically taken through a volume manager such as LVM. Snapshots in ZFS perform considerably better, and the process of taking them is more straightforward, easier to automate, and more reliable.
5. When to Choose XFS Over ZFS and Vice Versa
As we’ve seen, each filesystem provides different features that we can use in specific cases. In any case, both ZFS and XFS are adequate filesystems for most scenarios. However, there are some specific cases where one filesystem will perform better or give more functionality.
On the one hand, XFS is a filesystem optimized for very large files, with optimized I/O operations for parallel operations. Thus, we should use XFS if we need to handle very large files fast with multiple I/O operations in our application. This is the case in many supercomputers, such as those in the NASA Advanced Supercomputing Division, that use XFS.
On the other hand, we should choose ZFS when data integrity, snapshots, and integrated volume management matter more than raw throughput. However, the extra functionality of ZFS takes a toll on complexity for the administrator of the infrastructure. ZFS is a more involved filesystem, and it requires tuning to fit our specific application and get the maximum performance. We’ll need to spend time benchmarking different parameters such as the block size or the record size. Moreover, we need to install ZFS in our distribution separately, as userspace tools plus a kernel module, since it isn’t provided with the Linux kernel. The module has to match the running kernel, so we may need to repeat part of this process for each upgrade.