1. Overview
In this tutorial, we’ll talk about what metadata is. After that, we’ll look at how we can display metadata in Linux. Finally, we’ll learn how it aids in the fsck process.
2. What Is Metadata?
Metadata summarizes basic information about data, making it easier to find, use and reuse or recreate that particular instance of data.
It can be information about data or objects, including images, sounds, databases, and computer files. For example, if we have a photo, metadata would include the date of creation, subject category, means of creation (e.g., photograph, painting, computer-generated), copyright owner, file size, and file format (e.g., jpeg, gif, png).
In general, file metadata is information about a file that isn’t stored within the file’s contents. The file system stores this metadata and uses it to locate and manipulate files.
Let’s see some examples of the metadata Linux keeps for a file:
- file type, e.g., directory, link, or regular data file
- timestamps, such as creation, last access, and last modification
- location on the filesystem
- size (in bytes)
- physical location (i.e., the addresses of the blocks of storage containing the file’s data on a disk)
- ownership, including user and group IDs
- access permissions (i.e., read, write, and execute)
2.1. Inode
Metadata is stored in data structures called inodes, which the file system organizes so that it can look up information about a file efficiently.
An inode contains all the information about a file except its name. The name is stored in a directory entry that points to the inode, and the file’s actual data is stored in separate data blocks.
The inode stores the metadata associated with a file, such as a file’s access permissions, last access timestamp, owner, group, size, and the location of the file’s data.
We can view the inode number of each file (an ID that is unique within a file system) by passing the -i flag to the ls command:
$ ls -ai
1466412 . 1450650 .cache 1450840 ipinfo.txt 1459827 .profile 1454183 test.txt
1441793 .. 1450676 .config 1458522 .lesshst 1450724 Public 1450728 Videos
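Because the file name lives outside the inode, several names can refer to the same inode. The following is a minimal Python sketch of that idea, assuming a file system that supports hard links; the file names original.txt and alias.txt are hypothetical and exist only inside a temporary directory:

```python
import os
import tempfile


def names_share_inode(directory):
    """Create a file plus a hard link to it and report what the inode says."""
    original = os.path.join(directory, "original.txt")  # hypothetical name
    alias = os.path.join(directory, "alias.txt")        # hypothetical second name
    with open(original, "w") as handle:
        handle.write("hello\n")
    # A hard link adds a second directory entry that points to the same inode.
    os.link(original, alias)
    first, second = os.stat(original), os.stat(alias)
    return first.st_ino, second.st_ino, first.st_nlink


with tempfile.TemporaryDirectory() as workdir:
    inode_a, inode_b, link_count = names_share_inode(workdir)
    print("same inode:", inode_a == inode_b)
    print("link count:", link_count)
```

Both names report the same inode number, and the inode’s link count records how many directory entries point to it.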
2.2. Example of Metadata Information in Linux
To view metadata on a Linux system, we can use the stat, file, and exiftool commands. If exiftool isn’t already installed, we can install it through apt on Debian-based distributions:
$ sudo apt install libimage-exiftool-perl
First, let’s run the stat command:
$ stat test.txt
File: test.txt
Size: 434 Blocks: 8 IO Block: 4096 regular file
Device: 803h/2051d Inode: 1454183 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ kisii) Gid: ( 1000/ kisii)
Access: 2023-03-13 09:30:33.029115278 +0300
Modify: 2023-03-13 09:30:19.985703385 +0300
Change: 2023-03-13 09:30:19.985703385 +0300
Birth: 2023-03-13 09:30:19.985703385 +0300
The stat command shows us the status of a file, including its size, inode number, link count, permissions, ownership, and timestamps.
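The same inode fields are available to programs. As a rough sketch, Python’s os.stat returns them for any path; the describe helper and the temporary sample file below are illustrative assumptions, not part of the stat tool:

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few metadata fields that the file system keeps for path."""
    info = os.stat(path)
    return {
        "size": info.st_size,                   # bytes
        "inode": info.st_ino,
        "links": info.st_nlink,
        "mode": stat.filemode(info.st_mode),    # e.g. -rw-rw-r--
        "uid": info.st_uid,
        "gid": info.st_gid,
        "modified": time.ctime(info.st_mtime),
    }


with tempfile.NamedTemporaryFile() as sample:
    sample.write(b"x" * 434)   # same size as test.txt in the stat output
    sample.flush()
    for field, value in describe(sample.name).items():
        print(f"{field}: {value}")
```

None of these values come from the file’s contents; they’re all read from the inode.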
Secondly, let’s run the file command:
$ file test.txt
test.txt: ASCII text, with very long lines (433)
$ file Downloads/imgtest.jpg
Downloads/imgtest.jpg: JPEG image data, JFIF standard 1.02, resolution (DPI), Exif Standard: [TIFF image data, big-endian, direntries=16, height=670, bps=0, compression=LZW, PhotometricIntepretation=RGB, orientation=upper-left, width=834], baseline, precision 8, 600x482, components 3
The file command reports a file’s type along with some basic metadata. It determines the type by examining the file’s contents rather than relying on its name or extension.
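Type detection of this kind largely relies on well-known byte signatures at the start of a file. The sketch below illustrates the idea with a tiny hand-picked table; the guess_type function and its table are assumptions for illustration and cover far fewer formats than the file command does:

```python
# A tiny, hand-picked subset of leading byte signatures ("magic numbers").
MAGIC_SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image data",
    b"\x89PNG\r\n\x1a\n": "PNG image data",
    b"GIF87a": "GIF image data",
    b"GIF89a": "GIF image data",
}


def guess_type(leading_bytes):
    """Guess a file type from the first bytes of its contents."""
    for signature, label in MAGIC_SIGNATURES.items():
        if leading_bytes.startswith(signature):
            return label
    return "unknown"


print(guess_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(guess_type(b"plain text"))
```

To try it on a real file, we could pass in the first few bytes read from the file opened in binary mode.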
Lastly, let’s run exiftool:
$ exiftool Downloads/imgtest2.jpg
ExifTool Version Number : 12.40
File Name : imgtest2.jpg
Directory : Downloads
File Size : 81 KiB
File Modification Date/Time : 2023:03:13 09:53:25+03:00
File Access Date/Time : 2023:03:13 09:53:42+03:00
File Inode Change Date/Time : 2023:03:13 09:53:25+03:00
File Permissions : -rw-rw-r--
File Type : JPEG
File Type Extension : jpg
MIME Type : image/jpeg
JFIF Version : 1.01
Resolution Unit : inches
X Resolution : 300
Y Resolution : 300
Exif Byte Order : Little-endian (Intel, II)
Image Description : Abstract background of colorful curved lines
Orientation : Horizontal (normal)
Asset ID : 1198271727
Running exiftool displays the metadata for imgtest2.jpg, including details embedded in the image itself. We can also use exiftool to modify and write metadata.
3. The fsck Utility
A Linux file system can become inconsistent in various ways. The typical causes are user errors and hardware failures.
For example, we might shut down a system improperly, or a user might take a mounted file system offline incorrectly. On the hardware side, the disk controller may stop functioning correctly, or blocks on the disk drive can become damaged. Either way, the file system can be left inconsistent and, consequently, corrupted.
The fsck command checks for file system consistency, and if there is an error, it repairs it.
On many Linux distributions, fsck runs automatically at boot when a file system is due for a check. However, we can also run it manually. Before we run fsck on a live system, we must ensure that the file system we want to check is unmounted, since checking a mounted file system can damage it.
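One way to confirm that a device isn’t mounted is to look for it in /proc/mounts. The sketch below assumes the device appears there under the same name we pass in, which isn’t always the case (for example, with device-mapper or UUID-based names); is_mounted and SAMPLE_MOUNTS are hypothetical names and sample data:

```python
def is_mounted(device, mounts_text):
    """Return True if device is listed as a mount source in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The first field of each line is the mounted device.
        if fields and fields[0] == device:
            return True
    return False


# Sample data in the /proc/mounts format; not taken from a real system.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sda1 /boot/efi vfat rw,relatime 0 0
"""

print(is_mounted("/dev/sda2", SAMPLE_MOUNTS))
print(is_mounted("/dev/sda4", SAMPLE_MOUNTS))
```

On a real system, we’d read the text from /proc/mounts instead of using the sample string.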
Let’s run the fsck command:
$ sudo fsck /dev/sda4
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
/dev/sda4: clean, 11/128 files, 18/251 blocks
The fsck command reports that /dev/sda4 is clean, which means it found no inconsistencies in the file system.
4. How Does Metadata Aid in the fsck Process?
When we run the fsck command, it checks for inconsistencies in the following components: cylinder group blocks, inodes, superblocks, indirect blocks, and data blocks. In this section, we’ll focus only on the inodes.
File system checking involves reading all the inodes and resolving any inconsistencies found in them. For instance, let’s assume an inode isn’t on the list of free inodes, but no directory entry indicates that this inode is part of a file in any of the directories the file system knows of. If we run the fsck command, it adds this inode to the list of free inodes.
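The unreferenced-inode case can be modeled with plain Python data. This is only a toy sketch of the idea, not how fsck is implemented; the find_unreferenced function, the inode numbers, and the directory layout are all made up for illustration:

```python
def find_unreferenced(allocated_inodes, directories):
    """Return allocated inode numbers that no directory entry points to."""
    referenced = set()
    for entries in directories.values():
        referenced.update(entries.values())
    return sorted(set(allocated_inodes) - referenced)


# Toy model: inodes marked as in use, and directories mapping names to inodes.
allocated = {2, 11, 12, 13}
directories = {
    "/": {".": 2, "..": 2, "home": 11},
    "/home": {".": 11, "..": 2, "notes.txt": 12},
}

print(find_unreferenced(allocated, directories))  # [13]
```

Here, inode 13 is marked as in use but has no name anywhere, so it’s the one a checker would flag.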
When we run fsck, it checks the list of inodes sequentially. For each inode, it looks for inconsistencies in the following areas: format and type, link count, duplicate blocks, bad block numbers, and, lastly, inode size.
4.1. Format and Types
Inodes exist in three states: allocated, unallocated, and partially allocated.
A specific number of inodes is set aside when the file system is created, and an inode isn’t allocated until it’s needed. An allocated inode points to a file, while an unallocated inode doesn’t point to a file and should be empty. A partially allocated inode is one that’s incorrectly formatted, which can happen when, for example, a hardware failure causes incomplete data to be written to the inode list. When we run fsck, it clears the partially written inodes and sets them to a free state.
For example, when we create a file, the inode’s state changes from free to allocated. At the same time, the data is written to the new file and the metadata to the directory file. Until these writes complete, the inode is only partially allocated. If the computer crashes during this process, the file system can be left corrupted. We can then run fsck to correct the file system: it clears the inode and sets it back to free.
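The three states and the repair step can be sketched as a small model. The InodeState enum and clear_partial_inodes function are hypothetical names used only to illustrate the rule described above:

```python
from enum import Enum


class InodeState(Enum):
    FREE = "free"
    ALLOCATED = "allocated"
    PARTIAL = "partially allocated"


def clear_partial_inodes(inode_table):
    """Return a new table in which partially allocated inodes are set back to free."""
    return {
        number: InodeState.FREE if state is InodeState.PARTIAL else state
        for number, state in inode_table.items()
    }


# Toy inode table: inode 12 was caught mid-write by a crash.
table = {11: InodeState.ALLOCATED, 12: InodeState.PARTIAL, 13: InodeState.FREE}
repaired = clear_partial_inodes(table)
print({number: state.value for number, state in repaired.items()})
```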
4.2. Duplicate Block Checks
Each inode contains pointers to lists (indirect blocks) of all the blocks claimed by the inode. Inconsistencies in the indirect blocks directly affect the inode that owns them. The fsck command compares each block number claimed by the inode to a list of allocated blocks. If another inode already claims the same block number, fsck adds it to a list of duplicate blocks. If not, fsck updates the list of allocated blocks to include the block number.
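The comparison described here amounts to a single pass over every claimed block. The following is a minimal sketch under the assumption that each inode’s claimed blocks are already available as a list; find_duplicate_blocks and the sample claims are illustrative, not taken from fsck:

```python
def find_duplicate_blocks(inode_blocks):
    """Map each block claimed more than once to the inodes that claim it."""
    allocated = {}    # block number -> first inode that claimed it
    duplicates = {}   # block number -> every inode that claims it
    for inode, blocks in sorted(inode_blocks.items()):
        for block in blocks:
            if block in allocated:
                # Already claimed: record it on the duplicate list.
                duplicates.setdefault(block, [allocated[block]]).append(inode)
            else:
                # First claim: add it to the list of allocated blocks.
                allocated[block] = inode
    return duplicates


# Toy data: inodes 12 and 13 both claim block 102.
claims = {12: [100, 101, 102], 13: [102, 103], 14: [104]}
print(find_duplicate_blocks(claims))  # {102: [12, 13]}
```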
4.3. Bad Block Number Checks
The fsck process checks each block number claimed by an inode to see whether it falls between the block number of the first data block and that of the last data block in the file system. If the number is outside this range, it’s a bad block number. Often, a bad block number results from indirect blocks not being written to the file system. When we run fsck, it prompts us to clear the inode.
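The range test itself is simple. In this sketch, the range is treated as inclusive, and the bounds 18 and 250 are made-up values rather than figures read from a real file system:

```python
def bad_block_numbers(blocks, first_data_block, last_data_block):
    """Return the claimed block numbers that fall outside the data block range."""
    return [
        block
        for block in blocks
        if not first_data_block <= block <= last_data_block
    ]


# Blocks 7 and 4096 lie outside the assumed data area of blocks 18 to 250.
print(bad_block_numbers([18, 250, 7, 4096], first_data_block=18, last_data_block=250))
# [7, 4096]
```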
4.4. Inode Size Checks
Inodes contain a count of the number of data blocks they reference. The aggregate of the allocated data blocks and their indirect blocks gives us the number of actual data blocks. fsck calculates the number of data blocks and compares that number with that of the blocks claimed by the inode. The fsck command prompts us to fix inodes that contain incorrect counts.
Each inode contains a 64-bit size field that shows the number of characters (data bytes) in the file associated with the inode. To check the consistency of an inode’s size field, fsck uses the number of bytes recorded in the field to calculate how many blocks should be associated with the inode. It then compares that figure to the actual number of blocks claimed by the inode.
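The block calculation is a rounded-up division of the size by the block size. The sketch below assumes a 4096-byte block size and ignores indirect blocks and sparse files; expected_data_blocks and size_is_consistent are hypothetical helper names:

```python
def expected_data_blocks(size_bytes, block_size=4096):
    """Number of data blocks needed to hold size_bytes, rounded up."""
    return (size_bytes + block_size - 1) // block_size


def size_is_consistent(size_bytes, claimed_blocks, block_size=4096):
    """Check whether the size field agrees with the number of blocks claimed."""
    return expected_data_blocks(size_bytes, block_size) == claimed_blocks


print(expected_data_blocks(434))     # 1
print(size_is_consistent(434, 1))    # True
print(size_is_consistent(434, 3))    # False
```

This matches the earlier stat output for test.txt: a 434-byte file needs one 4096-byte block, which stat reports as 8 blocks because it counts in 512-byte units.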
4.5. Synchronous Write-Through
In synchronous write-through, data is written to the cache and to the backing store (the back end) as part of the same operation. Its main advantages are that it’s simple to implement and that the backing store always stays consistent with the cache. Its disadvantage is that writes are slower, since each one has to complete in two locations.
So, if our system crashes before a synchronous write-through completes, the inode can be left partially allocated. We then have to run fsck to set the inode free.
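From an application’s point of view, a comparable effect can be requested by forcing each write out to storage before continuing. The sketch below uses os.fsync for that; it’s only an application-level analogy, not the mechanism a file system uses to update its inodes, and the write_through function and file name are assumptions:

```python
import os
import tempfile


def write_through(path, payload):
    """Write payload and wait until the operating system has pushed it to storage."""
    with open(path, "wb") as handle:
        handle.write(payload)
        handle.flush()             # empty Python's own buffer into the OS cache
        os.fsync(handle.fileno())  # ask the OS to flush its cache to the device
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "journal.bin")  # hypothetical file name
    print(write_through(target, b"hello"))
```

The extra wait in os.fsync is the cost mentioned above: the call returns only after the data has been handed to the storage device.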
5. Conclusion
In this article, we’ve looked at what metadata is and how we can display it in Linux. We’ve also learned about the fsck process and how the metadata aids it.
When we run the fsck command, it checks that the inodes have the right format and type, that duplicate blocks are accounted for, that inodes with bad block numbers are cleared, and, lastly, that inode sizes match the blocks they claim. These checks help keep the file system consistent.