1. Overview
A filesystem journal is a data structure where the system logs data changes before applying them to the storage device. Its purpose is to prevent the corruption of the filesystem after a failure.
In this tutorial, we’ll learn about the journal data modes of an ext4 filesystem.
2. The Filesystem Journal Mechanism
Notably, a filesystem write operation consists of many steps. So, if a hardware failure occurs during the write operation, the system may fail to carry out all necessary steps. As a result, the write operation will be incomplete, and the filesystem may get corrupted.
The traditional approach to solving this problem is to run a filesystem repair tool like fsck:
$ sudo fsck /dev/loop5
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
/dev/loop5: clean, 12/131072 files, 151157/524288 blocks
In this example, we can see the output of the fsck command checking the /dev/loop5 loop device.
Repair tools examine all the data structures of the filesystem and attempt to fix any inconsistency they find. However, this operation is time-consuming, and its duration grows with the filesystem’s size.
The use of a journal can improve this situation. A journal is an area in the storage medium where the system logs data changes. Notably, logging a data change to the journal is much faster than applying it to the filesystem, so the overhead isn’t significant.
A hardware failure interrupts the operations running in the filesystem and leaves them unfinished. After a restart, the system can recover by finding the pending steps of each interrupted operation in the journal and applying them to the filesystem.
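To make the log-then-apply idea concrete, here's a minimal toy model in shell. It isn't ext4's on-disk journal format; the journal and store files, as well as the log_change and replay_journal helpers, are hypothetical names that we use only for illustration:

```shell
# Toy model of write-ahead journaling; NOT ext4's real journal format.
# All file and function names here are hypothetical.
workdir=$(mktemp -d)
journal="$workdir/journal.log"   # stands in for the journal area
store="$workdir/store.txt"       # stands in for the filesystem proper
: > "$journal"
: > "$store"

# Log an intended change before touching the store.
log_change() { printf '%s\n' "$1" >> "$journal"; }

# Apply every logged change to the store, then clear the journal.
replay_journal() {
    while IFS= read -r change; do
        printf '%s\n' "$change" >> "$store"
    done < "$journal"
    : > "$journal"
}

log_change "block 7 = hello"
log_change "block 8 = world"
# Imagine a power failure here: the store is still empty,
# but the pending changes survive in the journal.
pending=$(( $(wc -l < "$journal") ))

# Recovery: replay the pending changes into the store.
replay_journal
applied=$(( $(wc -l < "$store") ))
echo "pending before replay: $pending, applied after replay: $applied"
```

In this sketch, recovery only has to read the short journal instead of scanning the whole store, which is the advantage over a full fsck run.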
3. The ext Journal Data Modes
The ext family of filesystems has supported journaling since ext3. Currently, there are three journal data modes:
- journal
- ordered
- writeback
We can set the data mode of the journal via the filesystem options of the mount command. To see these options, let’s first create an ext4 filesystem in a file:
$ truncate --size=2G myfs.img
$ sudo mkfs.ext4 myfs.img
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: e90b9cfb-a48f-459a-a6e0-8453677c8649
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
Here, we initially created a 2GB file named myfs.img with the truncate command. Then, we formatted it with an ext4 filesystem via the mkfs.ext4 command.
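As a quick sanity check, we can reproduce the block counts in the mkfs.ext4 output with shell arithmetic. This sketch assumes only the figures shown above: a 2 GiB image, 4k blocks, and a journal of 16384 blocks:

```shell
block_size=4096                            # "4k blocks" in the mkfs output
image_bytes=$((2 * 1024 * 1024 * 1024))    # truncate --size=2G means 2 GiB
total_blocks=$((image_bytes / block_size))
journal_mib=$((16384 * block_size / 1024 / 1024))
echo "blocks: $total_blocks, journal: $journal_mib MiB"
# prints: blocks: 524288, journal: 64 MiB
```

So, the journal takes up 64 MiB of the 2 GiB image.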
Next, let’s mount the filesystem with the journal data mode:
$ mkdir myfs
$ sudo mount myfs.img myfs -o data=journal
First, we created a new directory named myfs to serve as a mount point. Then, we mounted the myfs.img filesystem to the myfs mount point and set the data=journal mount option through the -o switch.
As a result, we now have a journaling filesystem within the file.
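To check which data mode a mount uses, we can inspect its option string. The following is a small sketch: data_mode_from_options is a hypothetical helper, and we assume that the option string contains a data= entry only when the mode was set explicitly:

```shell
# Hypothetical helper: extract the data= value from a comma-separated
# mount option string; prints "unspecified" when no data= option is listed.
data_mode_from_options() {
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | head -n 1)
    printf '%s\n' "${mode:-unspecified}"
}

# Example with a literal option string:
data_mode_from_options "rw,relatime,data=journal"
# prints: journal

# On a live system, the option string could come from findmnt (util-linux):
# data_mode_from_options "$(findmnt -n -o OPTIONS myfs)"
```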
4. The journal Data Mode
The journal data mode logs both data and metadata to the journal. Metadata refers to information like the file name, size, physical location, and others. The data are the actual bytes that we want to store in the drive.
The journal mode provides the best data integrity, but it performs worse than the other two data modes. The ordered and the writeback modes perform slightly better because they keep only the metadata in the journal. The benefit of using the journal mode is that after a system crash, missing data blocks can be obtained from the journal.
Let’s perform a large file copy in the myfs filesystem that we mounted in the previous section using the journal data mode:
$ sudo dd if=/dev/random of=myfs/bigfile count=1000000 status=progress
466577920 bytes (467 MB, 445 MiB) copied, 8 s, 58.3 MB/s
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 8.8282 s, 58.0 MB/s
In this example, we copied 512 MB in about 8.8 seconds, using the dd command.
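One caveat about timings like this: a buffered dd can return while some data is still in the page cache, so the reported time may not include the final flush to the storage device. A possible refinement, assuming GNU dd, is the conv=fsync flag, which makes dd flush the output file before exiting. Here's a small sketch that writes to a temporary file instead of the mounted filesystem, so it doesn't need sudo:

```shell
# Sketch only: write 8 MiB of zeros to a temporary file (hypothetical target)
# and make dd flush the file to disk before it reports the elapsed time.
target=$(mktemp)
dd if=/dev/zero of="$target" bs=1048576 count=8 conv=fsync status=none
written=$(( $(wc -c < "$target") ))
echo "wrote $written bytes"
rm -f "$target"

# The same flag could be added to the tutorial's command, for example:
# sudo dd if=/dev/random of=myfs/bigfile count=1000000 conv=fsync status=progress
```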
Furthermore, the journal data mode implicitly deactivates two other filesystem options:
- delayed allocation of data blocks
- direct I/O (O_DIRECT)
Delayed allocation of blocks is disabled because data blocks are allocated when data is logged in the journal. In addition, without direct I/O support, reads and writes can’t bypass the page cache.
5. The ordered Data Mode
The ordered data mode logs only the metadata in the journal. Data isn’t journaled at all. Specifically, when we set the ordered data mode, the system performs several actions:
- writes data to its destination blocks in the filesystem
- logs metadata into the journal
- later on, when the system decides, it transfers metadata to the filesystem
The benefit of this mode is that the system can handle a failure between the last two steps. In such a case, data will be in place while the system will replay the metadata from the journal.
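On the other hand, if there’s a failure between steps 1 and 2, the journal holds no metadata for the write, so the system has nothing to replay. Data written into pre-existing regions of a file is already in place, but data in newly allocated blocks isn’t referenced by any metadata and is effectively lost.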
Furthermore, the ordered data mode performs better than the journal data mode, because it doesn’t record data in the journal. Finally, the ordered mode is the default value of the data filesystem option.
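The recovery case described above, a failure between the last two steps, can be pictured with another toy model. Again, this isn't how ext4 stores anything on disk; the data_blocks, journal, and inode_table files are hypothetical stand-ins:

```shell
# Toy model of the ordered data mode; all file names are hypothetical.
workdir=$(mktemp -d)
data_blocks="$workdir/data_blocks"   # final location of the file contents
journal="$workdir/journal"           # holds metadata only
inode_table="$workdir/inode_table"   # final location of the metadata
: > "$inode_table"

# Step 1: data goes straight to its destination, bypassing the journal.
printf 'hello world\n' > "$data_blocks"
# Step 2: only the metadata is logged to the journal.
printf 'name=greeting size=12\n' > "$journal"
# Simulated crash before step 3: the inode table was never updated.

# Recovery: replay the journaled metadata into the inode table.
cat "$journal" >> "$inode_table"
: > "$journal"
recovered=$(cat "$inode_table")
echo "recovered metadata: $recovered"
```

Because the data reached its destination before the metadata was logged, the replayed metadata points at blocks that already hold the right contents.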
To understand the use of this mode, let’s mount myfs.img again using the ordered data mode:
$ sudo umount myfs
$ sudo mount myfs.img myfs -o data=ordered
Next, let’s copy the same amount of bytes as in the previous section:
$ sudo dd if=/dev/random count=1000000 of=myfs/bigfile status=progress
426623488 bytes (427 MB, 407 MiB) copied, 7 s, 60.9 MB/s
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 8.00293 s, 64.0 MB/s
In this example, we can observe that we copied 512 MB in about 8 seconds, faster than the roughly 8.8 seconds of the journal data mode.
6. The writeback Data Mode
Similarly to the ordered mode, the writeback mode logs only the metadata in the journal. However, this mode writes data to the filesystem regardless of logging metadata to the journal. In other words, the system may write the data to the filesystem either before or after it journals the metadata.
Letting the system decide when to write the data to the filesystem makes this mode the most performant among the three. On the other hand, it’s the least safe regarding data consistency.
Let’s unmount the filesystem, mount it again using the writeback data mode, and do the copy:
$ sudo umount myfs
$ sudo mount myfs.img myfs -o data=writeback
$ sudo dd if=/dev/random count=1000000 of=myfs/bigfile status=progress
438554112 bytes (439 MB, 418 MiB) copied, 7 s, 62.7 MB/s
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 7.9161 s, 64.7 MB/s
Here, we mounted the filesystem using the writeback option and performed a copy of 512 MB. The system copied the file in about 7.9 seconds. Compared to the previous examples, the writeback option was the fastest, although only slightly faster than the ordered mode.
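To compare the three runs side by side, we can recompute the throughput from the byte counts and elapsed times that dd reported. In this sketch, throughput_mbs is a hypothetical helper, and 1 MB means 10^6 bytes, as in the dd output:

```shell
# Hypothetical helper: throughput in MB/s (1 MB = 10^6 bytes).
throughput_mbs() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

throughput_mbs 512000000 8.8282    # journal run
throughput_mbs 512000000 8.00293   # ordered run
throughput_mbs 512000000 7.9161    # writeback run
```

The results match the 58.0, 64.0, and 64.7 MB/s figures from the dd output. The gap between the ordered and writeback runs is about 1%, so a single run like this one can’t separate them reliably.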
7. Conclusion
In this article, we’ve learned about the journal, the ordered, and the writeback data modes of the ext filesystems.
In summary, the journal mode is the safest but the least performant. On the other hand, the writeback mode offers the best performance but provides the lowest level of data protection. The default ordered mode falls in the middle in both performance and safety. Finally, we compared the three data modes by copying a large file to a sample ext4 filesystem, and the timings matched this ranking.