1. Introduction
In Linux, files broadly fall into two categories: text and binary. Text files are human-readable, while binary files hold data meant to be interpreted by programs, such as compiled executables, images, and archives. In this tutorial, we’ll look at how to find the binary files in a given directory and distinguish them from text files.
2. Using grep
grep stands for “Global Regular Expression Print”, and we can use it to search for patterns in strings and files using regular expressions. We can use the grep command as follows to find all the binary files in a directory:
$ grep -rIL .
hello
In the above command, we used three flags, -r, -I, and -L, and the pattern “.”. This pattern matches any single character other than a newline, so every text file with at least one such character contains a match. The -r flag makes grep recurse through the current directory and all of its subdirectories. The -I flag instructs the command to treat binary files as if they contained no match, and the -L flag makes the command print only the names of files without a match. As a result, we see a list of binary files as the output.
One caveat with this approach is that empty files are listed as well. Since they have no content, the “.” pattern finds nothing to match in them, so the -L flag reports them alongside the binary files. Hence, we must be aware of this and filter out the empty files afterward if our use case requires it.
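As a rough sketch of the filtering step described above, we can pass each path that grep reports through the shell’s -s test, which succeeds only for files larger than zero bytes. The function name list_nonempty_binaries is a hypothetical helper introduced for illustration, and the loop assumes that file names contain no newline characters:

```shell
# Hypothetical helper: list files under a directory that grep treats as
# binary, skipping empty files. Assumes file names contain no newlines.
list_nonempty_binaries() {
    # -r: recurse, -I: treat binary files as non-matching,
    # -L: print only the names of files without a match
    grep -rIL . "${1:-.}" | while IFS= read -r path; do
        # -s is true only if the file exists and its size is greater than zero
        if [ -s "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Calling list_nonempty_binaries with no argument scans the current directory. A text file that consists solely of blank lines would still be listed, since “.” never matches a newline.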
3. Using find
We can use the find command to find all the files in the given directory and its subdirectories and run grep on each of these files to determine whether they are binary files. Let us run the command and see the output:
$ find . -type f -exec grep -IL . "{}" \;
./hello
./empty.txt
In the above command, the dot after find is the starting directory: it tells find to search the current directory and everything beneath it. We set the -type parameter to f to match only regular files and exclude directories. Further, we use the -exec parameter to run grep on every file that is matched, where “{}” stands in for the path of the file and \; marks the end of the grep command.
Here too, we notice that empty files are matched in addition to the binary files. We can add a size filter to avoid matching empty files:
$ find . -type f ! -size 0 -exec grep -IL . "{}" \;
./hello
In the above command, we used the not operator (“!”), along with the -size parameter set to zero, to exclude files of size zero. Thus, we obtained the list of non-empty binary files.
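The command above starts one grep process per file. As a possible variation, and not something the steps above require, we can end the -exec clause with + instead of \; so that find passes many paths to each grep invocation. The function name find_nonempty_binaries is a hypothetical wrapper used only for this sketch:

```shell
# Hypothetical helper: same filter as the find command above, but with
# batched grep invocations.
find_nonempty_binaries() {
    # "{} +" appends as many paths as fit to a single grep command line,
    # instead of running one grep per file as "{} \;" does
    find "${1:-.}" -type f ! -size 0 -exec grep -IL . {} +
}
```

The list of files should be the same as before, since -L prints the name of every file without a match regardless of how many files grep receives at once.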
4. Using file
The file command is used to determine the type of a given file. We can run this command and look for executable files, assuming most binary files are executables:
$ file *
empty.txt: empty
go.mod: ASCII text
go.sum: ASCII text
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=mAPWeNpuss-0Jy1bMCdQ/OWIyYupSU3PIAgW3araK/jjhFMFRZhEC1CSJYvOrL/_oYKq2la6fbfD4WWmk39, not stripped
hello.go: C source, ASCII text
When we pass the wildcard argument “*”, the shell expands it to the names of all non-hidden entries in the current directory, and the file command reports the type of each one. From the output, we see that the file hello is an executable. Subsequently, to print only the executable files, we can filter the output with grep:
$ file * | grep executable
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=mAPWeNpuss-0Jy1bMCdQ/OWIyYupSU3PIAgW3araK/jjhFMFRZhEC1CSJYvOrL/_oYKq2la6fbfD4WWmk39, not stripped
Now, we’ve printed only the executable files. One caveat here is that the name of a file may itself contain the string “executable”, in which case grep matches the line even though the file is not an executable:
$ file * | grep executable
executable.txt: empty
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=mAPWeNpuss-0Jy1bMCdQ/OWIyYupSU3PIAgW3araK/jjhFMFRZhEC1CSJYvOrL/_oYKq2la6fbfD4WWmk39, not stripped
To circumvent this, we can change our regex pattern to match only if the string “executable” occurs after a colon:
$ file * | grep ":.*executable"
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=mAPWeNpuss-0Jy1bMCdQ/OWIyYupSU3PIAgW3araK/jjhFMFRZhEC1CSJYvOrL/_oYKq2la6fbfD4WWmk39, not stripped
Now, the command does not match the file executable.txt.
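Another way to keep file names out of the match, sketched here as an alternative to the colon-based pattern, is the -b (brief) option of file, which prints only the description without the leading name. The function name list_executables is a hypothetical helper, and the loop only looks at non-hidden entries directly inside the given directory:

```shell
# Hypothetical helper: print the paths of files in a directory that the
# file command describes as executable.
list_executables() {
    for path in "${1:-.}"/*; do
        # Skip anything that is not a regular file, including an unmatched glob
        [ -f "$path" ] || continue
        # -b omits the "name:" prefix, so grep searches only the description
        if file -b "$path" | grep -q executable; then
            printf '%s\n' "$path"
        fi
    done
}
```

Because grep never sees the name, a file such as executable.txt is not reported unless its description also says it is executable. Note that file also describes scripts that start with a shebang line as executable text, so they appear in this list too.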
5. Conclusion
In this article, we looked at several ways to find the binary files in a directory. Using grep was the simplest method, but it came with the caveat of also printing empty files. We overcame this limitation by adding a size filter to the find command, and we used the file command to pick out executables by their reported type.