1. Overview
As system administrators, we benefit from understanding the internals of the operating system. This knowledge helps us troubleshoot services and processes on our systems, and security researchers can use it to identify suspicious files. A good place to start is the structure of ELF files, the format Linux uses for its programs and libraries.
In this tutorial, we’ll learn about ELF files and their structure. We’ll also use readelf to inspect the structure of an ELF file.
2. ELF
ELF is short for Executable and Linkable Format. It’s the standard format for executables, object files, shared libraries, and core dumps on Linux and many other Unix-like systems.
Moreover, the ELF format is versatile. It isn’t tied to a particular processor: the same format describes binaries for x86-64, ARM, RISC-V, and many other architectures, and the header records which one a file targets. This flexibility is a major reason why ELF is so widely used.
Generally, we write most programs in high-level languages such as C or C++. The CPU can’t execute these programs directly because it doesn’t understand their instructions. Instead, a compiler translates the source code into object code, and a linker then combines the object files with the libraries they need to produce a binary file.
As a result, the binary file contains instructions that the CPU can understand and execute. The binary also has to be stored in a file format that defines how its contents are laid out. Several such formats exist, but on Linux and most Unix-like systems, the standard one is ELF.
3. The Structure of the ELF File
An ELF file consists of two parts: the ELF header and the file data.
The file data, in turn, is made up of the program header table, the section header table, and the data itself.
The ELF header is always present, at the very beginning of the file. The section header table is mainly used at link time, when the linker builds an executable, while the program header table is used at run time to load the executable into memory.
Next, let’s look at the ELF file structure:
In the figure, we can see the different parts of the ELF file. Section names that begin with a dot, such as .text and .data, are reserved for the system, while names without the dot are available to applications.
Now, let’s look at each part of the file in more detail.
3.1. ELF Header
The ELF header sits at the very start of the file and contains metadata about it.
For example, the ELF header records whether the file is 32-bit or 64-bit, whether it uses little-endian or big-endian byte order, the ELF version, and the processor architecture the file requires. The header itself is 64 bytes long in 64-bit files and 52 bytes long in 32-bit files.
Because the header records the word size and byte order up front, tools and loaders on any architecture can work out how to decode the rest of the file.
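To make the identification bytes concrete, here’s a minimal Python sketch that decodes the first 16 bytes of a file, the array the ELF specification calls e_ident. The function name identify_elf is our own, and the sketch only covers the values discussed here:

```python
# Byte positions within e_ident, as defined by the ELF specification.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI = 4, 5, 6, 7
ELF_MAGIC = b"\x7fELF"  # 0x7f followed by the ASCII letters "ELF"


def identify_elf(data: bytes) -> dict:
    """Decode the 16-byte e_ident array found at the start of every ELF file."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    classes = {1: "ELF32", 2: "ELF64"}
    encodings = {1: "little endian", 2: "big endian"}
    return {
        "class": classes.get(data[EI_CLASS], "invalid"),
        "data": encodings.get(data[EI_DATA], "invalid"),
        "version": data[EI_VERSION],
        "osabi": data[EI_OSABI],  # 0 means UNIX - System V
    }
```

For instance, reading the first 16 bytes of /bin/ls on a typical x86-64 Linux system and passing them to identify_elf should report ELF64 and little endian, in line with the readelf output that follows.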
We use the readelf command with the -h option to display the ELF header of an ELF file. Here, we read the header of the ls binary:
$ readelf -h /bin/ls
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x6180
Start of program headers: 64 (bytes into file)
Start of section headers: 145256 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11
Size of section headers: 64 (bytes)
Number of section headers: 30
Section header string table index: 29
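The same fields can also be read programmatically. The following is a minimal sketch, assuming a 64-bit ELF file, that unpacks the Elf64_Ehdr structure with Python’s struct module. The field names follow the ELF specification and <elf.h>, while the function name parse_elf64_header is our own:

```python
import struct

# Elf64_Ehdr: a 16-byte e_ident followed by 13 fixed-size fields (64 bytes in total).
EHDR64_FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Parse a 64-bit ELF header, taking the byte order from e_ident[EI_DATA]."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2:
        raise ValueError("not a 64-bit ELF file")
    order = "<" if data[5] == 1 else ">"  # 1 = little endian, 2 = big endian
    values = struct.unpack_from(order + "HHIQQQIHHHHHH", data, 16)
    return dict(zip(EHDR64_FIELDS, values))
```

Given the first 64 bytes of the same /bin/ls, it should return the numbers readelf printed, such as an e_phoff of 64 and an e_shnum of 30; readelf mainly adds human-readable labels on top.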
3.2. ELF Header Details
Let’s take a closer look at what each field in the ELF header represents:
| Field | Explanation |
|---|---|
| Magic | The 16 identification bytes at the start of the file. The first four, 7f 45 4c 46 (0x7f followed by "ELF"), mark the file as ELF; the bytes after them encode the class, data encoding, version, and OS/ABI shown in the next fields. |
| Class | Whether the file uses the 32-bit (ELF32) or the 64-bit (ELF64) format. |
| Data | The data encoding, that is, the byte order of multi-byte values in the file: little-endian or big-endian. |
| Version | The ELF identification version, currently always 1. |
| OS/ABI | The operating system and ABI (Application Binary Interface) the file targets. An ABI defines binary-level conventions, such as how functions are called and how data structures are laid out. UNIX - System V (0) is the generic default. |
| ABI Version | The version of the ABI named in the OS/ABI field. |
| Type | The object file type. For instance, 1 is a relocatable object file, 2 is an executable, 3 is a shared object, and 4 is a core file. |
| Machine | The processor architecture the file requires. |
| Version | The object file version, also 1 for current files. |
| Entry point address | The virtual address where the program starts executing. If the file has no entry point, as with relocatable object files, this field is 0. |
| Start of program headers | The offset, in bytes from the start of the file, of the program header table. |
| Start of section headers | The offset, in bytes from the start of the file, of the section header table. |
| Flags | Processor-specific flags; x86-64 defines none, so the value is 0. |
| Size of this header | The size of the ELF header in bytes. |
| Size of program headers | The size of a single entry in the program header table. |
| Number of program headers | How many entries the program header table contains. |
| Size of section headers | The size of a single entry in the section header table. |
| Number of section headers | How many entries the section header table contains. |
| Section header string table index | The index, within the section header table, of the section that holds the section names. |
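The numeric Type and Machine values can be turned into readable labels with a small lookup table. This sketch includes only a handful of values from <elf.h> (for example, 62 is EM_X86_64), whereas readelf knows many more, and the function name describe is our own. Note that type 3 (DYN) covers both shared libraries and position-independent executables like /bin/ls, which is why readelf reported DYN earlier:

```python
# A few e_type and e_machine values as defined in the ELF specification and <elf.h>.
ELF_TYPES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "Intel 80386", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}


def describe(e_type: int, e_machine: int) -> str:
    """Return a short human-readable label for a file's type and target machine."""
    kind = ELF_TYPES.get(e_type, f"unknown type {e_type}")
    arch = ELF_MACHINES.get(e_machine, f"unknown machine {e_machine}")
    return f"{kind} for {arch}"
```

With the values from the /bin/ls header (Type 3, Machine 62), describe returns "DYN for x86-64".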
3.3. Program Header Table
Another part is the program header table, which describes segments. A segment groups zero or more sections that are loaded into memory together with the same permissions. The kernel uses this table at run time: it tells the kernel how to create the process and map each segment into memory.
To run a program, the kernel first reads the ELF header and the program header table. Next, it maps the segments of type LOAD into memory and checks for an INTERP entry, which names a program interpreter (the dynamic linker). Finally, it transfers control to the interpreter if there is one, or directly to the executable’s entry point otherwise.
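The loading steps above can be sketched in Python. The function below, a hypothetical helper named plan_load, walks the program header table of a little-endian 64-bit file, collects the LOAD segments, and reads the interpreter path from the INTERP segment. The phoff and phnum arguments are assumed to come from the ELF header, and the sketch only inspects the file; actually mapping the segments is the kernel’s job:

```python
import struct

PT_LOAD, PT_INTERP = 1, 3
PHDR64 = struct.Struct("<IIQQQQQQ")  # little-endian Elf64_Phdr, 56 bytes


def plan_load(data: bytes, phoff: int, phnum: int) -> tuple:
    """Return (LOAD segments, interpreter path or None) for a 64-bit ELF image."""
    loads, interp = [], None
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR64.unpack_from(data, phoff + i * PHDR64.size)
        if p_type == PT_LOAD:
            # Each LOAD entry: where it goes in memory, where it comes from in the file,
            # how much is stored in the file vs. reserved in memory, and its permissions.
            loads.append((p_vaddr, p_offset, p_filesz, p_memsz, p_flags))
        elif p_type == PT_INTERP:
            # The interpreter path is a NUL-terminated string stored in the file.
            raw = data[p_offset:p_offset + p_filesz]
            interp = raw.rstrip(b"\0").decode()
    return loads, interp
```

For /bin/ls, the interpreter returned this way should be the /lib64/ld-linux-x86-64.so.2 path that readelf reports below.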
We use the readelf command with the -l option to display the program headers of an ELF file:
$ readelf -l /bin/ls
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x6180
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000268 0x0000000000000268 R 0x8
INTERP 0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000003538 0x0000000000003538 R 0x1000
LOAD 0x0000000000004000 0x0000000000004000 0x0000000000004000
0x00000000000143c9 0x00000000000143c9 R E 0x1000
LOAD 0x0000000000019000 0x0000000000019000 0x0000000000019000
0x0000000000008ab8 0x0000000000008ab8 R 0x1000
LOAD 0x0000000000022350 0x0000000000023350 0x0000000000023350
0x0000000000001278 0x0000000000002568 RW 0x1000
DYNAMIC 0x0000000000022dd8 0x0000000000023dd8 0x0000000000023dd8
0x00000000000001f0 0x00000000000001f0 RW 0x8
NOTE 0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x000000000001df0c 0x000000000001df0c 0x000000000001df0c
0x0000000000000944 0x0000000000000944 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000022350 0x0000000000023350 0x0000000000023350
0x0000000000000cb0 0x0000000000000cb0 R 0x1
...
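Program headers are essential for running the executable because they tell the operating system everything it needs to map the file into memory and start it. In the output above, for example, each LOAD segment has its own permissions in the Flags column, and only one of them is executable (R E).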
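The Flags column is stored in the p_flags field as bits; the ELF specification defines PF_R = 4, PF_W = 2, and PF_X = 1. Here’s a minimal sketch, with a function name of our own, that renders them roughly the way readelf prints them:

```python
# Segment permission bits from the ELF specification.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_str(p_flags: int) -> str:
    """Render p_flags as R/W/E columns, leaving a blank for each missing permission."""
    columns = (("R", PF_R), ("W", PF_W), ("E", PF_X))
    return "".join(ch if p_flags & bit else " " for ch, bit in columns).rstrip()
```

For example, flags_str(5) gives "R E", the flags of the code segment, and flags_str(6) gives "RW", the flags of the data segment.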
3.4. Section Header Table
The section header table describes the file’s sections, including their names, types, sizes, and locations in the file. The linker relies on this information at link time, when it combines object files into an executable or shared library.
Loading the shared libraries a program needs into memory is a separate job that happens at run time. It’s performed by the dynamic linker, such as the /lib64/ld-linux-x86-64.so.2 interpreter we saw in the program headers, and its implementation is specific to the operating system.
Additionally, the section header table locates sections such as the symbol tables, which other object files and tools use to find the program’s symbol definitions and references.
We use the readelf command with the -S option to display the section header table of a file:
$ readelf -S /bin/ls
There are 30 section headers, starting at offset 0x23768:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 000002a8
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.gnu.bu[...] NOTE 00000000000002c4 000002c4
0000000000000024 0000000000000000 A 0 0 4
[ 3] .note.ABI-tag NOTE 00000000000002e8 000002e8
0000000000000020 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000308 00000308
00000000000000ac 0000000000000000 A 5 0 8
...
As we discussed earlier, section headers matter mostly at link time, when the linker combines the program’s object files and resolves references to the libraries it needs.
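Section names aren’t stored in the section headers themselves. Each header’s sh_name field is an offset into a string table section, and the Section header string table index in the ELF header says which section that is. A minimal sketch, assuming a little-endian 64-bit file and a hypothetical helper named section_names whose arguments come from the ELF header:

```python
import struct

SHDR64 = struct.Struct("<IIQQQQIIQQ")  # little-endian Elf64_Shdr, 64 bytes


def section_names(data: bytes, shoff: int, shnum: int, shstrndx: int) -> list:
    """List section names by resolving each sh_name offset in the name string table."""
    headers = [SHDR64.unpack_from(data, shoff + i * SHDR64.size) for i in range(shnum)]
    # sh_offset and sh_size are fields 4 and 5 of the string table section's header.
    strtab_off, strtab_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[strtab_off:strtab_off + strtab_size]
    names = []
    for h in headers:
        start = h[0]  # sh_name: byte offset of this section's name in the string table
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode())
    return names
```

With shoff, shnum, and shstrndx taken from the /bin/ls header (145256, 30, and 29), it should list "" for the NULL entry, followed by .interp, the note sections, .gnu.hash, and so on, as readelf -S does.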
3.5. .text
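This section holds the program’s executable instructions, that is, the machine code the CPU runs. It’s loaded into memory as part of a segment with read and execute permissions.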
3.6. .rodata and .data
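.rodata stands for read-only data. It holds constant data, such as string literals, that the program reads but never modifies. .data, on the other hand, holds initialized global and static variables that the program can change at run time. Both sections store their initial values in the ELF file itself. Uninitialized variables, by contrast, take up no space in the file: the system reserves more memory for the data segment than its size in the ELF file and fills the extra room with zeros.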
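We can see this extra room in the readelf -l output for /bin/ls. The writable (RW) LOAD segment has a FileSiz of 0x1278 but a MemSiz of 0x2568, so the loader reserves the difference and fills it with zeros. A quick check of that arithmetic:

```python
# FileSiz and MemSiz of the RW LOAD segment from the readelf -l output for /bin/ls.
file_size = 0x1278
mem_size = 0x2568

# Bytes reserved in memory but not stored in the file; the loader zero-fills them.
zero_filled = mem_size - file_size
print(hex(zero_filled), zero_filled)  # 0x12f0 4848
```

These 4,848 bytes roughly correspond to the .bss section, where uninitialized variables live, plus any alignment padding.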
4. Conclusion
In this article, we learned about ELF files and their structure. We also used readelf to inspect the different parts of an ELF file.