1. Overview
All executable files contain machine code that the processor executes directly. Raw machine code is unreadable to humans, however, so we first need to convert it to another format, such as assembly language.
In this tutorial, we’ll look at how to read machine code in Linux.
2. The Problem
Let’s look at two scenarios: machine code stored in a file, and machine code given as a string.
For each, we’ll see how to disassemble it using different tools.
2.1. Reading From a File
We’ll build a binary file from a simple C program and then see how to convert the machine code in that binary to assembly language.
Let’s look at the program and compile it:
$ cat test.c
#include <stdio.h>
void main() {
int i = 0;
i += 20;
return;
}
$ gcc test.c -o test
$ ls
test test.c
$
As shown above, we have a C program that adds 20 to the variable i. We then compiled it to produce an executable binary. Compiling with the -c flag instead stops before linking and produces an object file with the .o extension:
$ gcc -c test.c
$ ls
test test.c test.o
$
We now have both a binary file and an object file.
2.2. Reading From a String
At times, we might want to analyze a piece of shellcode to see what it does.
Let’s look at a few x86 opcodes and the instructions they encode:
54: push esp
55: push ebp
90: nop
Now, let’s store this into a file that can be later read and disassembled:
$ echo -ne '\x54\x55\x90' > code
$ ls
code test test.c test.o
$
With the above command, we wrote the shellcode bytes to a raw binary file named code.
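The -n and -e flags of echo aren't handled the same way by every shell. As a minimal sketch of an alternative, assuming POSIX printf and od are available, we can write the same three bytes with octal escapes (\124, \125, and \220 correspond to 0x54, 0x55, and 0x90) and then dump the file to confirm its contents:

```shell
# Write the bytes 0x54 0x55 0x90 using octal escapes, which POSIX printf supports.
printf '\124\125\220' > code

# Dump the file as hexadecimal bytes to confirm what was written.
od -An -tx1 code
```

The od output should list the bytes 54, 55, and 90, which confirms the file holds only the raw machine code and no trailing newline.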
Next, we’ll check how we can read these files.
3. Using the objdump Command
The objdump command is generally used to inspect object files and binary files. It can print the sections in a file along with their virtual memory address (VMA) and load memory address (LMA), as well as debug information, the symbol table, and other details.
The general usage is:
objdump OPTIONS objfile ...
Here we’ll see how we can use this tool to disassemble the files.
3.1. Reading From a File
Using the -d option, we can see the assembly code for the binary:
$ objdump -d test
test: file format elf64-x86-64
..
00000000000005fa <main>:
5fa: 55 push %rbp
5fb: 48 89 e5 mov %rsp,%rbp
5fe: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
605: 83 45 fc 14 addl $0x14,-0x4(%rbp)
609: 90 nop
60a: 5d pop %rbp
60b: c3 retq
60c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000610 <__libc_csu_init>:
..
$
An ELF binary contains many sections, along with the addresses and metadata needed to load the executable properly when it’s launched. With the -d flag, objdump disassembles every section that is expected to contain instructions. Here, we’ve trimmed the output to show only the main function.
The addl instruction at offset 0x605 adds 20 (0x14) to the variable i, which is stored on the stack at -0x4(%rbp).
To confirm that this output really corresponds to our source code, we can modify the C program, recompile it, and run objdump again to see what changes.
Similarly, we can run the same command on the object file to disassemble the code:
$ objdump -d test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
b: 83 45 fc 14 addl $0x14,-0x4(%rbp)
f: 90 nop
10: 5d pop %rbp
11: c3 retq
$
As we can see above, unlike the linked binary, the object file contains only the .text section with our main function, because no startup code or library code has been linked in yet.
By default, objdump shows the disassembly in AT&T syntax. To switch to Intel syntax, we can use the -M option:
$ objdump -d test.o -M intel
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
b: 83 45 fc 14 add DWORD PTR [rbp-0x4],0x14
f: 90 nop
10: 5d pop rbp
11: c3 ret
$
3.2. Reading From a String
Once we’ve saved the string to a file, we can use the command below to show the disassembly:
$ objdump -D -b binary -m i386 code
code: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 54 push %esp
1: 55 push %ebp
2: 90 nop
$
As seen above, since this is a raw file with no headers, we need to give objdump more information so that it can disassemble the file properly.
The options used in the above command are:
- -D: disassemble all sections, not only those expected to contain code
- -b: the object code format of the input; here, binary for a raw file
- -m: the architecture the code was written for; here, i386
And from the result, we can see that the shellcode in the file is printed correctly in the output.
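These options can be combined with others. As a hedged sketch, assuming an objdump build with x86 support, the commands below print the same raw bytes in Intel syntax and then reinterpret them as 64-bit code by using the i386:x86-64 machine name:

```shell
# Recreate the three-byte sample (0x54 0x55 0x90) so the example stands alone.
printf '\124\125\220' > code

# Disassemble the raw bytes as 32-bit code, printed in Intel syntax.
objdump -D -b binary -m i386 -M intel code

# Treat the same bytes as 64-bit code instead.
objdump -D -b binary -m i386:x86-64 code
```

In 64-bit mode, the same opcodes push the 64-bit registers rsp and rbp, which shows why the architecture has to be stated explicitly for a raw file.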
4. Using the gdb Command
If we need to debug something, gdb is the go-to tool. Using gdb, we can also disassemble code:
$ gdb test
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000000005fa <+0>: push %rbp
0x00000000000005fb <+1>: mov %rsp,%rbp
0x00000000000005fe <+4>: movl $0x0,-0x4(%rbp)
0x0000000000000605 <+11>: addl $0x14,-0x4(%rbp)
0x0000000000000609 <+15>: nop
0x000000000000060a <+16>: pop %rbp
0x000000000000060b <+17>: retq
End of assembler dump.
(gdb) q
$
As shown above, we loaded the binary into gdb and executed the disassemble command on the main function to see the assembly code.
5. Using the ndisasm Command
The ndisasm utility ships with the nasm package. It’s mainly used to disassemble raw machine code such as shellcode. It can also be run on binary files such as ELF executables, but it doesn’t understand object file formats, so it decodes headers and data as if they were instructions, which makes the structure very difficult to follow.
The typical usage is:
ndisasm [-b16 | -b32 | -b64] filename
Let’s see an example of how to use it to disassemble a string of machine code that we earlier saved to a file:
$ ndisasm -b32 code
00000000 54 push esp
00000001 55 push ebp
00000002 90 nop
$
As shown above, we’ve set the processor mode to 32-bit with -b32, and ndisasm has generated the corresponding assembly code.
6. Conclusion
In this tutorial, we’ve seen how to disassemble machine code from a file or from a string using objdump, gdb, and ndisasm.