1. Overview
Listing files is a common operation when we work with the Linux command line. Usually, we’ll use two commands to list files: the ls command and the find command.
In this tutorial, we’ll explore how to sum up the size of listed files. Of course, we’ll cover both ls and find commands.
2. A Directory Tree Example
To show how to sum up file sizes, let’s create a directory with some files as an example:
$ tree -f myDir
myDir
├── myDir/001.txt
├── myDir/002.txt
├── myDir/003.txt
├── myDir/004.txt
├── myDir/005.txt
├── myDir/images
│ ├── myDir/images/image01.jpg
│ └── myDir/images/image02.jpg
├── myDir/picture01.jpg
└── myDir/picture02.jpg
1 directory, 9 files
As the tree command’s output above shows, under the myDir directory, we have some files and a subdirectory.
Next, let’s see how to calculate the total size of the listed files.
3. Processing the ls -l Command’s Output
We know that using the ls command with the -l option lists files with detailed information. For example, let’s enter the myDir directory and list all files under it using ls -l:
$ ls -l *.*
-rw-r--r-- 1 kent kent 2 Dec 9 13:20 001.txt
-rw-r--r-- 1 kent kent 3 Dec 9 13:20 002.txt
-rw-r--r-- 1 kent kent 4 Dec 9 13:20 003.txt
-rw-r--r-- 1 kent kent 5 Dec 9 13:20 004.txt
-rw-r--r-- 1 kent kent 6 Dec 9 13:20 005.txt
-rw-r--r-- 1 kent kent 354 Dec 9 13:21 picture01.jpg
-rw-r--r-- 1 kent kent 131072 Dec 9 13:23 picture02.jpg
As the output above shows, all files directly under myDir are listed, with detailed information on each file shown in columns. It’s worth mentioning that ls -l *.* only lists entries whose names contain a dot. The images subdirectory doesn’t match this pattern, so neither it nor the files inside it are included.
3.1. Using awk to Sum File Sizes
Let’s take the 001.txt file as an example to understand the ls -l output:
-rw-r--r-- 1 kent kent 2 Dec 9 13:20 001.txt
_--------- - ---- ---- - ----------- -------
^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | +- The filename
| | | | | | |
| | | | | | +--------- The last modification time
| | | | | +---------------- The file size in bytes
| | | | +-------------------- The file owner group
| | | +------------------------- The file owner
| | +---------------------------- The number of hard links
| +----------------------------------- File Permissions
+--------------------------------------- The file type flag, for example:
'-': regular file, 'd': directory, etc.
Now that we understand the ls -l output, if we want to sum the file sizes in the ls -l list, we need to sum the fifth column (file size in bytes) in each file record. To achieve that, we can pipe the ls -l output to the awk command:
$ ls -l *.* | awk '{ sum += $5 } END{ print sum }'
131446
As we can see, the total size (in bytes) of the listed files is calculated and printed. A compact awk one-liner solves our problem. However, we need to type the awk command whenever we want to sum up the file sizes in the ls output, which is a bit inconvenient.
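One caveat with the one-liner above: it adds up the fifth field of every line it receives, whatever that line describes. When ls -l is pointed at a directory, its output also contains a "total" header and possibly entries for subdirectories. The following is a minimal sketch, not part of the original approach, that counts only regular files. The name sumRegularFiles is hypothetical, and the sketch assumes that owner and group names contain no spaces:

```shell
# sumRegularFiles is a hypothetical helper name used for illustration.
# It sums the size column of `ls -l` for regular files only.
sumRegularFiles() {
    # LC_ALL=C keeps the ls column layout predictable across locales.
    # /^-/ keeps only lines whose type flag is '-', which skips directories,
    # symlinks, and the "total N" header printed when listing a directory.
    # "sum + 0" prints 0 instead of an empty line when nothing matched.
    LC_ALL=C ls -l "$@" | awk '/^-/ { sum += $5 } END { print sum + 0 }'
}
```

Called as sumRegularFiles *.* inside myDir, it should report the same 131446 bytes as before, since all seven matched entries are regular files.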
So next, let’s turn the awk command into a generic shell function, to sum the values in a given column.
3.2. The sumCol Function
First, let’s look at the sumCol function:
sumCol() {
    # $1 is the number of the column to sum
    awk -v col="$1" '{ sum += $col } END{ print sum }'
}
As we can see, it looks pretty similar to the previous awk command. The only difference is that, instead of hard-coding the column number, the awk command in the sumCol function uses the column number passed to the shell function.
Next, let’s source the function and see how to use it with the ls -l command:
$ ls -l *.* | sumCol 5
131446
We can also use the sumCol function to sum other columns. Let’s see another example:
$ cat numInCol.txt
1 2 3
4 5 6
7 8 9
$ cat numInCol.txt | sumCol 2
15
$ cat numInCol.txt | sumCol 3
18
In the examples above, we use the numInCol.txt file to simulate some column-based output. We see it’s pretty straightforward to use our sumCol function to sum numbers in a given column.
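As written, sumCol prints an empty line when it receives no input, and it accepts any argument as a column number. In addition, some awk implementations may switch to exponent notation when a sum becomes very large. The following is a minimal sketch of a stricter variant that addresses these points. The name sumColSafe is hypothetical, and the argument check is just one possible policy:

```shell
# sumColSafe is a hypothetical, stricter variant of sumCol.
sumColSafe() {
    # Reject an empty argument, anything containing a non-digit,
    # and numbers that start with 0 (including column 0 itself).
    case "$1" in
        ''|*[!0-9]*|0*)
            echo "usage: sumColSafe COLUMN" >&2
            return 1
            ;;
    esac
    # printf "%.0f" prints the sum as a plain integer, and prints 0
    # when there was no input at all.
    awk -v col="$1" '{ sum += $col } END { printf "%.0f\n", sum }'
}
```

With the numInCol.txt file above, sumColSafe 2 should print 15 just like sumCol 2, while sumColSafe abc returns a non-zero status instead of silently summing the wrong data.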
4. Processing the find Command’s Output
As we’ve mentioned, find is another popular way to search for and list files. By default, find searches directories recursively. For example, we can list all *.jpg image files recursively in the myDir directory:
$ find myDir -name '*.jpg'
myDir/images/image02.jpg
myDir/images/image01.jpg
myDir/picture02.jpg
myDir/picture01.jpg
So next, let’s figure out how to calculate the total size of these found files.
4.1. Using -print0 and the du Command
We know that the du command with the -b option reports the size of the given files or directories in bytes, for example:
$ du -b myDir/picture01.jpg
354 myDir/picture01.jpg
Additionally, we can add the -c option to make du sum up the file sizes for all files we pass to it:
$ du -bc myDir/*.jpg
354 myDir/picture01.jpg
131072 myDir/picture02.jpg
131426 total
Instead of passing filenames directly to the du command, we can use the --files0-from=F option to tell du to read filenames from the F file. It’s worth mentioning that when F is -, du reads filenames from stdin. Further, each filename should be terminated by a null character. This is pretty useful if we pipe a bunch of filenames to the du command.
We’ve seen that the find command prints each filename followed by a newline character. So if we want du to process the filenames found by the find command, we can use the -print0 action instead. find’s -print0 action prints each filename followed by a null character, which is exactly the input that du expects with the --files0-from=- option.
Next, let’s pipe find’s output to du to get the total file size:
$ find myDir -name '*.jpg' -print0 | du -bc --files0-from=-
6608 myDir/images/image02.jpg
3639 myDir/images/image01.jpg
131072 myDir/picture02.jpg
354 myDir/picture01.jpg
141673 total
As the output above shows, we’ve got a complete file size report with a total value. In case we’re only interested in the total value, we can pipe the report to the tail command:
$ find myDir -name '*.jpg' -print0 | du -bc --files0-from=- | tail -1
141673 total
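The last line still carries the "total" label. If a script needs only the number, one option is to cut the first field, since du separates the size from the label with a tab. The following is a minimal sketch under the hypothetical name totalBytes. It assumes GNU find and GNU du, and it adds -type f so that only regular files are counted:

```shell
# totalBytes is a hypothetical helper name; it assumes GNU find and GNU du.
totalBytes() {
    # $1: directory to search, $2: filename pattern for -name
    find "$1" -type f -name "$2" -print0 |
        du -bc --files0-from=- |
        tail -n 1 |
        cut -f1   # keep only the size; du puts a tab before "total"
}
```

For the example tree, totalBytes myDir '*.jpg' should print just 141673.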
4.2. Using -printf "%s\n" to Output File Size in Bytes
We’ve seen that find’s -print0 action prints each filename followed by a null character. The find command supports other actions. For example, the -printf FORMAT action outputs various information through the given FORMAT.
Next, let’s look at a few commonly used formats:
- %s – the size of the file in bytes
- %t – the file’s last modification time
- %n – the number of hard links to the file
- %u – the user name of the file’s owner
- %m – the file’s permission bits (in octal)
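Several of the directives in the list above can be combined in a single FORMAT string. The following is a minimal sketch under the hypothetical name listWithDetails. It assumes GNU find, because -printf is a GNU extension, and it also uses %p, the file’s path, which isn’t in the list above:

```shell
# listWithDetails is a hypothetical helper name; it assumes GNU find.
listWithDetails() {
    # $1: directory to search
    # Prints size, hard-link count, owner, octal permissions, and path,
    # separated by tabs, with one regular file per line.
    find "$1" -type f -printf '%s\t%n\t%u\t%m\t%p\n'
}
```

Because the size is the first tab-separated field, the output of listWithDetails myDir can still be summed by column number with awk.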
Our goal is to get the sum of the file sizes for the found files. Therefore, we can use the "%s" format to output each file’s size in bytes:
$ find myDir -name '*.jpg' -printf "%s\n"
6608
3639
131072
354
To calculate the sum of these file sizes, we can use our sumCol function again:
$ find myDir -name '*.jpg' -printf "%s\n" | sumCol 1
141673
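A raw byte count can be hard to read for large totals. As an optional extra step, the sum can be passed through numfmt to scale it to a unit. The following is a minimal sketch under the hypothetical name humanTotal. It assumes GNU find and the numfmt tool from GNU coreutils, and it inlines the awk sum so that the block stands on its own:

```shell
# humanTotal is a hypothetical helper name; it assumes GNU find and numfmt.
humanTotal() {
    # $1: directory to search, $2: filename pattern for -name
    find "$1" -type f -name "$2" -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }' |
        numfmt --to=iec   # scale to K, M, G, ... using powers of 1024
}
```

For example, humanTotal myDir '*.jpg' prints the same total as above, scaled to kibibytes.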
5. Conclusion
In this article, we’ve learned how to sum up the size of files listed by the ls -l and the find commands. We saw that the two commands cannot produce the total size on their own. However, awk and du can do the job easily.