1. Introduction
The ls command is one of the most frequently used commands on UNIX-like systems. It's basic yet powerful, offering many options to list files and directories and to reveal detailed information about them. However, many users get confused by the total line at the top of the ls -l output.
For newcomers and even seasoned users, that first line often raises eyebrows:
$ ls -l
total 12
-rw-rw-r-- 1 root root 18 May 28 19:48 text_file_1.txt
-rw-rw-r-- 1 root root 43 May 28 19:49 text_file_2.txt
-rw-rw-r-- 1 root root 57 May 28 19:49 text_file_3.txt
We must admit that, without proper knowledge, the total line in the output can indeed be a source of confusion. In this article, we aim to demystify it.
Throughout this guide, we’ll use Ubuntu 20.04.4 (Focal Fossa). However, the commands and concepts discussed here should work on most modern Linux distributions as long as they are applied on EXT4 file systems with the default configuration.
2. Understanding the total Number
The number following total is a representation of the cumulative disk space, in blocks, allocated for the files being displayed. It’s not just a straightforward sum of file sizes but a reflection of how file systems handle space and how ls calculates it.
In fact, the formula used by ls is the key to calculating the total value. However, to fully grasp it, we must also delve into some of the principles of file storage and the concept of blocks in file systems.
2.1. Calculation of total
To calculate the total number, ls sums the product of the physical blocks in use and the ratio between the physical block size and the ls block size for each file. If the sum doesn’t result in an integer, it rounds it up to the next integer (ceiling operation).
The formula for calculating the total value is the following:
total_int = ceil[
(file1_physical_blocks_in_use * file1_physical_block_size / file1_ls_block_size) +
(file2_physical_blocks_in_use * file2_physical_block_size / file2_ls_block_size) +
... +
(fileN_physical_blocks_in_use * fileN_physical_block_size / fileN_ls_block_size)
]
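To make the formula concrete, here's a minimal Python sketch of it. The function name ls_total and its parameters are our own illustrative choices, not part of ls, and we assume the same ls block size applies to every file in the listing:

```python
def ls_total(files, ls_block_size=1024):
    """Apply the total formula to (physical_blocks_in_use, physical_block_size) pairs.

    Assumption: one ls block size is shared by all files in the listing.
    """
    allocated_bytes = sum(blocks * block_size for blocks, block_size in files)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-allocated_bytes // ls_block_size)


three_small_files = [(8, 512), (8, 512), (8, 512)]
print(ls_total(three_small_files))       # 12
print(ls_total(three_small_files, 512))  # 24
```

With three files of 8 physical blocks of 512 bytes each, the sketch yields 12 for an ls block size of 1024 bytes and 24 for 512 bytes.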
Let’s break down the formula to understand the total output of ls better.
2.2. The ls Block Size
The ls block size is the unit that the command uses to report sizes in blocks. It can be modified with the --block-size flag (or the LS_BLOCK_SIZE environment variable), the -k flag (to force 1024-byte units), or the POSIXLY_CORRECT environment variable (to get 512-byte units if neither LS_BLOCK_SIZE, BLOCK_SIZE, nor BLOCKSIZE is set).
If none of the above variables are set, then ls will most likely use a block size of 1024 bytes.
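The selection rules above can be sketched in Python. This is a simplified illustration, not the actual ls implementation: the function name effective_ls_block_size is hypothetical, the precedence order is our assumption based on the description above, and only plain integer values are handled (the real command also accepts suffixes such as K or M):

```python
def effective_ls_block_size(env, block_size_option=None, k_flag=False):
    """Guess the block size ls would use, in bytes (simplified sketch).

    env is a dict of environment variables; the precedence below is an
    assumption and only plain integer values are supported.
    """
    if block_size_option is not None:      # --block-size=N on the command line
        return int(block_size_option)
    if k_flag:                             # -k forces 1024-byte units
        return 1024
    for name in ("LS_BLOCK_SIZE", "BLOCK_SIZE", "BLOCKSIZE"):
        if name in env:
            return int(env[name])
    # With no size variable set, POSIXLY_CORRECT selects 512-byte units.
    return 512 if "POSIXLY_CORRECT" in env else 1024


print(effective_ls_block_size({}))
print(effective_ls_block_size({"POSIXLY_CORRECT": "1"}))
```

Passing a plain dict instead of reading os.environ directly keeps the sketch easy to experiment with.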
To be clear, the ls block size only influences the reporting of file sizes. The filesystem handles the file division into blocks beyond the ls command’s control.
Nothing prevents us from setting the LS_BLOCK_SIZE or POSIXLY_CORRECT environment variables to change the block size used by ls. However, this isn't recommended as a permanent setting: variables such as BLOCK_SIZE and POSIXLY_CORRECT are also read by other GNU tools, so exporting them can lead to inconsistent size reports and break scripts that assume the default units.
Instead, we can pass options on the command line for a single invocation. For instance, the --block-size option changes the unit for that run only, and the -s option displays the allocated size of each file in blocks. The latter helps us better understand how the total value is calculated.
2.3. The Physical Block Size
The physical block size signifies the size of each block used in file storage within the operating system’s internal block interface. It represents the allocated size for data storage and is independent of the underlying hardware configuration.
This value is usually 512 or 1024 bytes, depending on the OS. We can retrieve it with the %B format specifier of the stat command. We should note that this value is almost always unrelated to the actual physical block size of a modern storage device.
For instance, we can check the physical block sizes of multiple files using the stat command:
$ stat * --printf="%b\t(%B)\t%n: %s bytes\n"
8 (512) text_file_1.txt: 18 bytes
8 (512) text_file_2.txt: 43 bytes
8 (512) text_file_3.txt: 57 bytes
The above example shows that the physical block size is 512 bytes for all files.
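We can read the same per-file figures programmatically. The sketch below uses Python's os.lstat, whose st_blocks field is documented as the number of 512-byte blocks allocated for a file on Unix systems. The helper name block_usage is our own, and the block count we get depends on the file system in use:

```python
import os
import tempfile


def block_usage(path):
    """Return (allocated_blocks, size_in_bytes) without following symlinks."""
    info = os.lstat(path)
    return info.st_blocks, info.st_size


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "text_file_1.txt")
    with open(path, "wb") as handle:
        handle.write(b"x" * 18)  # same size as text_file_1.txt in our listing
    blocks, size = block_usage(path)
    print(f"{blocks}\t{os.path.basename(path)}: {size} bytes")
```

Python doesn't expose an equivalent of %B, so the 512-byte unit here is taken from the documentation of st_blocks rather than queried from the system.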
2.4. The Physical Blocks in Use
The number of blocks allocated to a file represents its physical blocks in use. The system generally calculates this value by dividing the file size by the physical block size and rounding up to the next integer.
That means if a file is 1025 bytes, it should occupy three blocks (1025/512 = 2.00195, rounded up to 3) with a physical block size of 512 bytes, right? Well, not exactly.
From the stat command’s output, we can see that each file of a few bytes occupies 8 blocks of 512 bytes each. This is because the minimum allocation size for a file is 4 KB (8 blocks of 512 bytes each). That’s the default value for the actual block size on EXT4 file systems. This means that even if a file is 1 byte, it will still occupy 4 KB of space.
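The rounding to whole file system blocks can be sketched as follows. The function name allocated_512_byte_blocks is hypothetical, and the model is deliberately simplified: it assumes a 4096-byte file system block and ignores sparse files, inline data, preallocation, and any extra metadata blocks:

```python
def allocated_512_byte_blocks(file_size, fs_block_size=4096):
    """Estimate the 512-byte blocks allocated for a file (simplified model)."""
    if file_size == 0:
        return 0
    # Round the size up to a whole number of file system blocks.
    fs_blocks = -(-file_size // fs_block_size)
    return fs_blocks * fs_block_size // 512


for size in (1, 18, 1025, 4096, 4097):
    print(size, allocated_512_byte_blocks(size))
# 1 8
# 18 8
# 1025 8
# 4096 8
# 4097 16
```

So, under these assumptions, the 1025-byte file from the earlier question occupies 8 blocks of 512 bytes rather than 3.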
If we want to check the actual block size used by the filesystem, we can use the tune2fs command:
$ tune2fs -l /dev/sda1 | grep 'Block size'
Block size: 4096
Knowing this value is certainly useful, but it doesn’t directly help us understand how ls calculates the total output.
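If we don't have root access or don't know the device name, a rough alternative is to query the mounted file system with Python's os.statvfs. This is only a sketch: the helper name fs_block_size is our own, and we assume that the f_frsize field matches the allocation unit, which is typically the case on EXT4 but isn't guaranteed on every file system:

```python
import os


def fs_block_size(path="."):
    """Return the fundamental block size of the file system holding path."""
    # f_frsize is the fundamental block size; on ext4 it normally equals f_bsize.
    return os.statvfs(path).f_frsize


print(f"Block size: {fs_block_size('.')}")
```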
2.5. Simple Real-World Examples
Let’s look at some examples to understand the total value. We’ll use the following files:
$ ls -l
total 12
-rw-rw-r-- 1 root root 18 May 28 19:48 text_file_1.txt
-rw-rw-r-- 1 root root 43 May 28 19:49 text_file_2.txt
-rw-rw-r-- 1 root root 57 May 28 19:49 text_file_3.txt
Let’s now use the -s option to display the size of each file in blocks:
$ ls -ls
total 12
4 -rw-rw-r-- 1 root root 18 May 28 19:48 text_file_1.txt
4 -rw-rw-r-- 1 root root 43 May 28 19:49 text_file_2.txt
4 -rw-rw-r-- 1 root root 57 May 28 19:49 text_file_3.txt
Now, we can calculate the total value using the formula we saw earlier. Due to the lack of support for the ceiling division in Bash, we’ll be using the pre-installed Python 3 interpreter to perform the calculations.
$ python3 -c "import math; print(math.ceil((8 * 512/1024) + (8 * 512/1024) + (8 * 512/1024)))"
12
Previously, we saw that each file is 8 physical blocks. This is because the physical block size on our system is 512 bytes, the minimum allocation size is 4 KB (8 x 512 B blocks), and each file is less than that in size.
Let's now use the --block-size option to specify a custom ls block size of 512 bytes, keeping the -s option so that the per-file block counts remain visible:
$ ls -ls --block-size=512
total 24
8 -rw-rw-r-- 1 root root 1 May 28 19:48 text_file_1.txt
8 -rw-rw-r-- 1 root root 1 May 28 19:49 text_file_2.txt
8 -rw-rw-r-- 1 root root 1 May 28 19:49 text_file_3.txt
$ python3 -c "import math; print(math.ceil((8 * 512/512) + (8 * 512/512) + (8 * 512/512)))"
24
With the ls block size matching the physical block size, we can see that the total value is now 24. In fact, this makes it more straightforward to understand how the system calculates it.
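To reproduce the total for a real directory, we can sum the allocated blocks ourselves. The following is a minimal sketch, not a reimplementation of ls: the function name directory_total is hypothetical, st_blocks is assumed to be in 512-byte units, and hidden entries are skipped as ls does without the -a option:

```python
import os


def directory_total(directory=".", ls_block_size=1024):
    """Approximate the total line of ls -l for the entries in directory."""
    blocks_512 = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith("."):
                continue  # ls hides dotfiles unless -a is given
            # Don't follow symlinks: count the link itself, as ls -l does.
            blocks_512 += entry.stat(follow_symlinks=False).st_blocks
    # Convert 512-byte blocks to ls blocks, rounding up.
    return -(-blocks_512 * 512 // ls_block_size)


print("total", directory_total("."))
print("total", directory_total(".", ls_block_size=512))
```

Running it in the directory with our three text files should match the totals reported by ls, provided the assumptions above hold on the system in use.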
3. The Discrepancy Between Tools
The ls, stat, and du commands report block usage in different ways. This might lead to discrepancies in their output, especially on some non-standard configurations.
As we saw in this article, the ls command sums the allocated blocks of the listed files and converts the result to its own block size. Because the result is rounded up, ls can sometimes slightly overestimate the number of blocks.
On the other hand, stat consistently reports the number of 512-byte blocks allocated for a file, irrespective of the actual filesystem block size.
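Lastly, du also reports allocated space, but it descends into subdirectories and includes the blocks used by the directories themselves, so its figure is often larger than the total line of ls. For sparse files, it counts only the blocks actually allocated unless we pass the --apparent-size option.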
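The gap between a file's apparent size and its allocated space is easiest to see with a sparse file. The sketch below creates one in a temporary directory; the helper name apparent_and_allocated is our own, st_blocks is assumed to be in 512-byte units, and the allocated figure depends on whether the file system supports sparse files:

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_size_bytes, allocated_bytes) for path."""
    info = os.lstat(path)
    return info.st_size, info.st_blocks * 512


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "sparse.bin")
    with open(path, "wb") as handle:
        handle.truncate(1024 * 1024)  # extend to 1 MiB without writing any data
    apparent, allocated = apparent_and_allocated(path)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On file systems that support sparse files, the allocated figure is usually far smaller than the apparent size, which is why block-based tools and size-based tools can disagree about the same file.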
Awareness of these discrepancies is crucial when managing file system usage and file sizes on Linux.
4. Why Is the Total Value Important?
The total line in the ls -l command's output offers a quick way to estimate the drive space allocated to the files and directories being listed. This information might be useful for managing storage-related issues, especially on systems with limited storage.
Although this number might seem abstract and detached from a physical or meaningful metric, it represents an essential aspect of storage management. When we understand the total output of the ls command, we gain insights into the underlying mechanics of the file system and the intricacies of space allocation.
5. Conclusion
The ls command is a powerful tool for managing files and directories in a Linux environment. Although the total output can sometimes be confusing, with a bit of practice and understanding, it becomes an invaluable resource. The total line is just one aspect of the ls -l output that carries valuable information. By understanding the nuances of this command’s output, we can effectively manage files and directories and use our system to its full potential.