1. Overview

Linux offers many tools for searching binary files. In this tutorial, we’ll explore several of them, both text-based and GUI-based, to search for a hexadecimal pattern in binary files, and then compare their performance.

All commands in this guide have been tested on 64-bit Debian 11 (Bullseye), running GNU Bash 5.1.4, grep 3.6, bbe 0.2.2, bgrep 0.2, hexdump 2.36.1, GHex 3.18.4, and Bless 0.6.0.

2. Test File Setup

First, let’s use Perl to create a test file named test.bin that we’ll use throughout this article:

$ perl -e 'print 0.0.0.1.0.0.0.2.0.1.0.2.0.3.0.4.0.5.0.6.0.7.0.8.0.9.0.10.0.11.0.1' > test.bin

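We could create the test file in other ways, but a Perl one-liner is straightforward. Perl parses a bare literal with two or more dots as a version string, so each dot-separated number becomes one byte with that value.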
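If Perl isn’t available, a similar file can be produced with printf alone. The following is a minimal sketch, assuming a POSIX-style printf; it uses octal escapes because \xHH escapes aren’t supported by every shell’s printf:

```shell
# Write the same 32 bytes as the Perl one-liner, eight bytes per call.
# Octal escapes: \010 = 0x08, \011 = 0x09, \012 = 0x0a, \013 = 0x0b.
printf '\000\000\000\001\000\000\000\002' >  test.bin
printf '\000\001\000\002\000\003\000\004' >> test.bin
printf '\000\005\000\006\000\007\000\010' >> test.bin
printf '\000\011\000\012\000\013\000\001' >> test.bin

# Confirm the size in bytes
size=$(wc -c < test.bin)
echo "test.bin: $size bytes"
```

Splitting the data across four calls keeps each line short and mirrors the eight-byte groups that a hex dump shows.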

Next, let’s ensure the file was created correctly by dumping its content using hexdump:

$ hexdump -C test.bin
00000000 00 00 00 01 00 00 00 02 00 01 00 02 00 03 00 04 |................|
00000010 00 05 00 06 00 07 00 08 00 09 00 0a 00 0b 00 01 |................|
00000020

We use hexdump to print the content of test.bin in hexadecimal format because the file consists of 32 bytes of non-printable characters.
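On systems without hexdump, od from coreutils can produce a comparable dump. A minimal sketch, which recreates test.bin with printf first so that it runs on its own:

```shell
# Recreate the 32-byte test file (same content as the Perl one-liner)
printf '\000\000\000\001\000\000\000\002\000\001\000\002\000\003\000\004' >  test.bin
printf '\000\005\000\006\000\007\000\010\000\011\000\012\000\013\000\001' >> test.bin

# -A x: hexadecimal offsets, -t x1: one byte per column,
# -v: don't collapse repeated lines
dump=$(od -A x -t x1 -v test.bin)
echo "$dump"
```

Unlike hexdump -C, this form doesn’t print the ASCII column on the right, which carries no information for this file anyway.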

Since the file was created successfully, we’ll use it to test against all the tools we’re going to explore.

3. Using Linux Basic Tools

Linux provides basic tools for searching hexadecimal patterns in binary files, such as grep and bbe. grep comes with virtually every default OS installation, while bbe may need to be installed from the distribution’s official repositories.

3.1. grep

grep is a tool to search and print the lines that match a pattern. Although grep is commonly used to search for printable characters in a file or an input stream, it can also be used to search for hexadecimal patterns in binary files.

Now, let’s say that we want to find a two-byte binary sequence in test.bin, for example, a null character (0x00) followed by 0x01:

$ grep -obUaP "\x00\x01" test.bin | cat --show-nonprinting
2:^@^A
8:^@^A
30:^@^A

The grep command found three occurrences and then printed the offset for each occurrence.

We piped the grep output to cat to display non-printable characters in caret notation (--show-nonprinting). Otherwise, those characters wouldn’t be visible:

$ grep -obUaP "\x00\x01" test.bin
2:
8:
30:

To sum up, let’s review the options that we used for grep:

  • -o, --only-matching: print only the matched part, not the whole line
  • -b, --byte-offset: print the 0-based byte offset
  • -U, --binary: treat the file as binary
  • -a, --text: process a binary file as if it were text
  • -P, --perl-regexp: interpret pattern as Perl-compatible regular expressions
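One consequence of these options is worth spelling out: grep still splits its input at newline bytes (0x0a), so counting matching lines with -c can undercount the matches. The sketch below, assuming GNU grep built with PCRE support, compares both counts on test.bin, which contains a 0x0a byte at offset 27. LC_ALL=C is added here so that the pattern is matched byte by byte:

```shell
# Recreate the 32-byte test file (same content as the Perl one-liner)
printf '\000\000\000\001\000\000\000\002\000\001\000\002\000\003\000\004' >  test.bin
printf '\000\005\000\006\000\007\000\010\000\011\000\012\000\013\000\001' >> test.bin

# Count matches: -o prints each match on its own output line
matches=$(LC_ALL=C grep -obUaP '\x00\x01' test.bin | wc -l)

# Count matching "lines" as grep sees them (split at each 0x0a byte)
lines=$(LC_ALL=C grep -cUaP '\x00\x01' test.bin)

echo "matches: $((matches)), matching lines: $lines"
```

On the test file, the two counts differ because the first two matches fall on the same "line", which is why we rely on -o rather than -c when counting occurrences.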

3.2. bbe

bbe stands for binary block editor, and in this case, it works like the sed command for binary files.

For example, let’s find a two-byte binary sequence (0x00 0x01) in the test.bin file:

$ bbe -b "/\x00\x01/:2" -s -e "F d" -e "p h" -e "A \n" test.bin
2:x00 x01
8:x00 x01
30:x00 x01

As we can see, the bbe command found the same three offsets as the grep command in the previous section, this time printing the matched bytes in hexadecimal notation.

Let’s review the options that we used:

  • -b, --block=BLOCK: search for a pattern between two forward slashes (/\x00\x01/), and define the number of bytes to print (:2), starting from the offset of the match
  • -s, --suppress: print only the matched part, similar to 'grep -o'
  • -e, --expression=COMMAND: a command to execute, similar to 'sed -e'
  • -e 'F d': display the offset before each result (2:…, 8:…, 30:…)
  • -e 'p h': print results in hexadecimal notation
  • -e 'A \n': append an end-of-line to each result

4. Using bgrep

bgrep is a simple open-source binary grep project written in C. It can print the matched character offset, along with a specified number of bytes before and after the matched character position.

4.1. Installation

To install bgrep, its GitHub page provides a one-liner:

$ curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o $HOME/.local/bin/bgrep -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 8271 100 8271 0 0 7204 0 0:00:01 0:00:01 --:--:-- 7204

The command above downloads the bgrep.c file and then compiles it with GCC. The resulting binary is stored in $HOME/.local/bin, so it’s accessible from any directory as long as that directory is listed in the PATH variable.
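The one-liner fails if $HOME/.local/bin doesn’t exist yet, since gcc can’t create the output file. A small preparatory sketch that creates the directory and checks whether it’s already on PATH:

```shell
# Create the target directory if it's missing
mkdir -p "$HOME/.local/bin"

# Report whether the directory is already listed in PATH
case ":$PATH:" in
  *":$HOME/.local/bin:"*) on_path=yes ;;
  *) on_path=no ;;
esac
echo "$HOME/.local/bin on PATH: $on_path"
```

If the answer is no, we can either add the directory to PATH in the shell’s startup file or call bgrep by its full path.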

4.2. Usage

bgrep provides three options:

  • -A n: print n bytes after the occurrence
  • -B n: print n bytes before the occurrence
  • -C n: print n bytes before and after the occurrence

Let’s see bgrep in action:

$ bgrep 0001 test.bin
test.bin: 00000002
test.bin: 00000008
test.bin: 0000001e

Searching for the two-byte binary sequence (0x00 0x01) in the test.bin file with bgrep gave the same results as the grep and bbe commands.
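Note that bgrep reports offsets in hexadecimal, whereas grep -b reports them in decimal. A quick way to compare the two, sketched here with the shell’s printf:

```shell
# bgrep prints offsets in hexadecimal; printf converts them to decimal
printf '%d\n' 0x00000002 0x00000008 0x0000001e
```

The converted values can then be compared directly with the offsets printed by grep.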

Furthermore, we can specify the number of bytes to print before and after the matched position by passing the option -C 2:

$ bgrep -C 2 0001 test.bin
test.bin: 00000002
\x00\x00\x00\x01
test.bin: 00000008
\x00\x02\x00\x01
test.bin: 0000001e
\x00\x0b\x00\x01

Having the extra data printed before or after the matched position could be useful if we need to debug some programs or analyze log files.
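When bgrep isn’t available, the bytes around a known offset can also be inspected with od. This is a sketch using standard od options rather than an equivalent of bgrep -C; the variable names are only illustrative:

```shell
# Recreate the 32-byte test file (same content as the Perl one-liner)
printf '\000\000\000\001\000\000\000\002\000\001\000\002\000\003\000\004' >  test.bin
printf '\000\005\000\006\000\007\000\010\000\011\000\012\000\013\000\001' >> test.bin

offset=$((0x1e))   # last offset reported by bgrep
before=2           # bytes of context before the match
length=2           # length of the matched pattern

# -j skips to the start of the context, -N limits the number of bytes dumped
context=$(od -An -tx1 -j "$((offset - before))" -N "$((before + length))" test.bin)
echo "context at offset $offset:$context"
```

This prints the two bytes before the match followed by the match itself, as space-separated hexadecimal values.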

5. Using GHex

GHex is a GUI-based binary file editor. It’s available for download from many Linux official repositories.

5.1. Installation

Let’s install GHex on Debian:

$ sudo apt install ghex

The command above installs the GHex package.

5.2. Usage

The GUI of GHex is pretty intuitive. To open the test.bin file, we can click File > Open or press Ctrl + O, select the file test.bin, and then click Open:

[Figure: GHex hex editor]

Similarly, we can search for the two-byte binary sequence (0x00 0x01) by clicking Edit > Find or pressing Ctrl + F, entering the pattern that we want to search for, and clicking Find Next:

[Figure: GHex hex editor, search function]

All occurrences are highlighted in red, with the offset displayed at the bottom left.

GHex loads the entire file into memory. Consequently, opening a file that exceeds the available memory might cause system performance issues or out-of-memory errors.

6. Using Bless

Bless is a GUI-based binary file editor. It’s available for download from many Linux official repositories.

6.1. Installation

Here’s how we can install Bless on Debian and its derivatives:

$ sudo apt install bless

The command above installs the Bless package.

6.2. Usage

Bless has a GUI that is similar to GHex, but it has more advanced features.

Let’s open the test.bin file by clicking File > Open or pressing Ctrl + O, selecting the file test.bin, and then clicking Open:

[Figure: Bless hex editor]

Then, let’s search for the two-byte binary sequence (0x00 0x01) by clicking Search > Find or pressing Ctrl + F, selecting the format of the pattern that we want to search for, entering the pattern, and clicking Find Next:

[Figure: Bless hex editor, search function]

All occurrences are highlighted in blue, with the offset displayed at the bottom.

Unlike GHex, Bless doesn’t load the entire file into memory, so it handles large data files efficiently. It also supports fast, multi-threaded find operations.

7. Comparison

Now that we’ve seen several tools for searching for a hexadecimal pattern in binary files, let’s run a simple performance test to see which one finds the pattern fastest.

7.1. Testing Setup

We can run the test on any system, but some tools, like GHex, load the entire file into memory, so we need to ensure that our system has enough memory before running the test. Otherwise, we could get an out-of-memory error, causing the OS to stop the process or, worse, causing the system to hang or crash.

For this test, we used the following hardware and data:

  • Dell Latitude Intel Core i7-6600U CPU @ 2.60GHz × 4, 16GB RAM
  • Hard disk: Seagate external HDD 5TB, 7200rpm, SATA III, ext4
  • File: 8.1GB binary file (a VirtualBox Disk Image)
  • Pattern to search: 0x03 0xC6 0x42 0x07
  • Number of occurrences in the file: 6 occurrences
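Readers without a suitable disk image can build a stand-in file. The sketch below uses the hypothetical file name sample.bin and an arbitrary offset to plant the search pattern in a zero-filled file. It creates only a 1MiB file for brevity, so it doesn’t reproduce the timings reported in this section:

```shell
# Create a 1MiB file of zero bytes (increase count for a larger file)
dd if=/dev/zero of=sample.bin bs=1024 count=1024 2>/dev/null

# Overwrite four bytes at offset 4660 (0x1234) with 0x03 0xC6 0x42 0x07;
# conv=notrunc keeps the rest of the file intact
printf '\003\306\102\007' | dd of=sample.bin bs=1 seek=4660 conv=notrunc 2>/dev/null

# Show the planted bytes
planted=$(od -An -tx1 -j 4660 -N 4 sample.bin)
echo "bytes at 0x1234:$planted"
```

Because the offset of the planted pattern is known in advance, it’s easy to check whether a tool reports it correctly.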

7.2. Executing the Command

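Using the same options that we covered in the previous sections, let’s execute the grep, bbe, and bgrep commands. The only difference is that bbe now prints offsets in hexadecimal (F H), which makes them easy to compare with the bgrep output: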

$ time grep -obUaP "\x03\xC6\x42\x07" /media/baeldung/8gb_binary_file.vdi | cat --show-nonprinting

real    1m34.240s
user    1m15.694s
sys     0m2.873s
$ time bbe -b "/\x03\xC6\x42\x07/:4" -s -e "F H" -e "p h" -e "A \n" /media/baeldung/8gb_binary_file.vdi
x1ae9a3ad:x03 xc6 x42 x07 
x1ee0d869:x03 xc6 x42 x07 
xaaf3c5b7:x03 xc6 x42 x07 
x1545235b7:x03 xc6 x42 x07 
x1717983dd:x03 xc6 x42 x07 
x176bdf869:x03 xc6 x42 x07 

real    0m59.665s
user    0m20.603s
sys     0m7.378s
$ time bgrep 03C64207 /media/baeldung/8gb_binary_file.vdi
/media/baeldung/8gb_binary_file.vdi: 1ae9a3ad
/media/baeldung/8gb_binary_file.vdi: 1ee0d869
/media/baeldung/8gb_binary_file.vdi: aaf3c5b7
/media/baeldung/8gb_binary_file.vdi: 1545235b7
/media/baeldung/8gb_binary_file.vdi: 1717983dd
/media/baeldung/8gb_binary_file.vdi: 176bdf869

real    0m58.054s
user    0m23.646s
sys     0m10.434s

For GHex and Bless, since both are GUI-based tools, we need to do the search manually.

7.3. Results

After running all the commands and searching the pattern using GHex and Bless manually, we have the following data that we can analyze:

=======================================================
| Tool  | Found All 6 |  Loaded Entire | Elapsed Time |
|       | Occurrences | File to Memory |              |
=======================================================
| grep  |      No (0) |             No |    1m34.240s |
| bbe   |         Yes |             No |    0m59.665s |
| bgrep |         Yes |             No |    0m58.054s |
| GHex  |      No (2) |            Yes |            - |
| Bless |         Yes |             No | ~4m (manual) |
=======================================================

The grep command didn’t report any occurrences in the large file, even though the same options worked on the small test file.
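A possible explanation, not verified here against the original 8.1GB file, is the locale rather than the file size. In a UTF-8 locale, grep -P interprets an escape such as \xC6 as the character U+00C6, which is stored as two bytes, rather than as the single byte 0xC6. The pattern used on the small test file consists only of bytes below 0x80, so it isn’t affected. The sketch below uses a hypothetical sample.bin and forces the C locale so that the pattern is matched byte by byte:

```shell
# Hypothetical sample: the four-byte pattern preceded by three ASCII bytes
printf 'abc\003\306\102\007xyz' > sample.bin

# LC_ALL=C makes grep -P treat \xC6 as the single byte 0xC6
offsets=$(LC_ALL=C grep -obUaP '\x03\xC6\x42\x07' sample.bin | cut -d: -f1)
echo "match at offset: $offsets"
```

Setting LC_ALL=C only for the grep invocation leaves the locale of the rest of the shell session untouched.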

On the other hand, both bbe and bgrep found all six occurrences. They completed the search in a similar amount of time, and neither loaded the entire file into memory.

Meanwhile, GHex, the GUI-based tool, could only display 2.1GB (0x80000000 bytes) out of 8.1GB of data on its interface. As a result, it was only able to find the first two occurrences.
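We can check this against the offsets reported by bbe and bgrep with a little shell arithmetic. The sketch below classifies each offset from the test run relative to the 0x80000000 boundary, assuming a shell with 64-bit arithmetic:

```shell
limit=$((0x80000000))   # the 0x80000000-byte boundary described above
visible=0

# Offsets reported by bbe and bgrep in the test run
for off in 1ae9a3ad 1ee0d869 aaf3c5b7 1545235b7 1717983dd 176bdf869; do
  if [ "$((0x$off))" -lt "$limit" ]; then
    echo "0x$off: below the limit"
    visible=$((visible + 1))
  else
    echo "0x$off: beyond the limit"
  fi
done
echo "offsets below the limit: $visible"
```

Only the first two offsets fall below the boundary, which is consistent with the two occurrences that GHex found.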

In contrast, Bless, the other GUI-based tool, could load the entire file in its interface and found all six occurrences, which took around four minutes in total. Unlike GHex, Bless doesn’t load the entire file into memory but uses a combination of memory and disk cache instead.

8. Conclusion

In this article, we explored several tools to search for hexadecimal patterns in binary files, both text-based and GUI-based.

We also ran a simple performance test for each tool, with varying results. Text-based tools, like bbe and bgrep, can search for hexadecimal sequences in large data files without loading the whole file into memory. Meanwhile, GUI-based tools, like Bless, can also handle large data files thanks to multi-threaded search operations, although the search took longer in our test.