1. Overview

As system administrators, we benefit from understanding the internal details of the operating system. This knowledge helps us troubleshoot services and processes on our systems, and security researchers can use it to identify suspicious files. Understanding the structure of an ELF file is a good way to gain this insight.

In this tutorial, we’ll learn about the ELF format and the structure of an ELF file. We’ll also use readelf to inspect the different parts of an ELF file.

2. ELF

ELF is short for Executable and Linkable Format. It’s the standard file format for executables, object files, shared libraries, and core dumps on Linux and many other Unix-like systems.

Moreover, the ELF format is versatile. It isn’t tied to a particular processor architecture or word size, so the same format can describe binaries for many different processor types. This flexibility is a significant reason why ELF is so widely used compared to other executable file formats.

Most programs are written in high-level languages such as C or C++. The CPU can’t execute these programs directly because it only understands machine instructions. Instead, a compiler translates the high-level source code into object code. Then, a linker combines the object code with the other object files and libraries it uses to produce a binary file.

As a result, the binary file contains instructions that the CPU can understand and execute. The binary also needs a file format that defines how its code, data, and metadata are organized. On Linux and Unix-like systems, the most common of these formats is ELF.

3. The Structure of the ELF File

An ELF file is divided into two parts. The first part is the ELF header, while the second is the file data.

Further, the file data is made up of the program header table, the section header table, and the data itself.

In particular, the ELF header is always present in an ELF file, while the section header table is important at link time, when the linker creates an executable. On the other hand, the program header table is used at run time to load the executable into memory.

Next, let’s look at the ELF file structure:

[Figure: ELF file layout]

In the figure, we see the different parts of the ELF file. Section names that begin with a dot, such as .text and .data, are reserved for the system, while applications can use names without the dot prefix for their own sections.

Now, let’s look at the different parts of the file in more detail.

3.1. ELF Header

Firstly, the ELF header is found at the start of the file. It contains metadata about the file.

For example, some of the metadata found in the ELF header includes information about whether the ELF file is 32-bit or 64-bit, whether it’s using little-endian or big-endian, the ELF version, and the architecture that the file requires.

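In particular, the metadata in the ELF header tells the operating system and tools such as linkers and debuggers how to interpret the rest of the file, for example, which processor architecture it targets and how its multi-byte values are encoded.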

We use the readelf command with the -h option to show the ELF header of an ELF file. In our case, we are reading the ls binary file:

$ readelf -h /bin/ls 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x6180
  Start of program headers:          64 (bytes into file)
  Start of section headers:          145256 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29
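The Magic line and the fields right after it come from the 16-byte e_ident array at the start of the file. As a minimal sketch, assuming Python 3 and the byte offsets defined in the ELF specification, we can decode these bytes ourselves. The function name describe_ident and its dictionary keys are hypothetical choices for illustration, not part of any library:

```python
# Byte offsets inside e_ident, as defined by the ELF specification
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION = 4, 5, 6, 7, 8

ELF_MAGIC = b"\x7fELF"  # 0x7f followed by the ASCII letters E, L, F
CLASSES = {1: "ELF32", 2: "ELF64"}
ENCODINGS = {1: "2's complement, little endian", 2: "2's complement, big endian"}


def describe_ident(ident: bytes) -> dict:
    """Decode the 16-byte e_ident array found at the start of every ELF file.

    describe_ident is a hypothetical helper written for this example.
    """
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: missing magic bytes")
    return {
        "class": CLASSES.get(ident[EI_CLASS], "invalid"),
        "data": ENCODINGS.get(ident[EI_DATA], "invalid"),
        "version": ident[EI_VERSION],
        "os_abi": ident[EI_OSABI],
        "abi_version": ident[EI_ABIVERSION],
    }


# The Magic bytes that readelf -h printed for /bin/ls
magic = bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")
print(describe_ident(magic))
```

For these bytes, the function reports ELF64, little endian, version 1, OS/ABI 0, and ABI version 0, which lines up with the Class, Data, Version, OS/ABI (0 is UNIX - System V), and ABI Version lines printed by readelf. The same magic-byte check identifies an ELF file regardless of its name or extension.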
                                         

3.2. ELF Header Details

Now, let’s have a closer look at what each field in the ELF header represents:

Magic: The 16 bytes at the very start of the file (the e_ident array). The first four bytes, 0x7f followed by the ASCII letters E, L, and F (45 4c 46), identify the file as ELF. The bytes after them hold the class, data encoding, version, OS/ABI, and ABI version described below, followed by padding.

Class: Indicates whether the file uses the 32-bit (ELF32) or the 64-bit (ELF64) format.

Data: Specifies the data encoding, that is, the byte order of multi-byte values in the file. The most common encodings are little-endian and big-endian, both using two’s complement.

Version (first occurrence): Identifies the ELF header version. It’s always 1 (current).

OS/ABI: Identifies the operating system and ABI that the file targets. ABI is short for Application Binary Interface, which defines low-level conventions such as how functions are called and how data structures are laid out in memory. A value of 0 means UNIX - System V.

ABI Version: Specifies the version of that ABI.

Type: Specifies the object file type. For instance, 1 is for a relocatable file, 2 is for an executable, 3 is for a shared object, and 4 is for a core file. Position-independent executables such as /bin/ls also use type 3, which is why readelf reports DYN in our example.

Machine: Specifies the instruction set architecture that the file requires, such as x86-64 in our example.

Version (second occurrence): Identifies the object file version.

Entry point address: The virtual address where the program starts executing. If the file has no associated entry point, as with a relocatable object file, this field is 0.

Start of program headers: The offset, in bytes from the start of the file, where the program header table begins.

Start of section headers: The offset, in bytes from the start of the file, where the section header table begins.

Flags: Processor-specific flags for the file. In our example, it’s 0x0.

Size of this header: The size of the ELF header, in bytes.

Size of program headers: The size of a single program header table entry, in bytes.

Number of program headers: The number of entries in the program header table.

Size of section headers: The size of a single section header table entry, in bytes.

Number of section headers: The number of entries in the section header table.

Section header string table index: The index, within the section header table, of the section that holds the section names (the section name string table).
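Each of these fields sits at a fixed position in the header. As a minimal sketch, assuming a 64-bit ELF file and Python’s standard struct module, we can unpack the whole 64-byte header in one call. The ElfHeader type, the parse_elf64_header function, and the TYPES mapping are hypothetical names for illustration, while the field names follow the e_* members declared in the system header <elf.h>:

```python
import struct
import sys
from typing import NamedTuple

# ELF64 header layout: 16-byte e_ident, then 13 fixed-size fields (64 bytes total)
ELF64_EHDR = "16sHHIQQQIHHHHHH"

# Common e_type values; position-independent executables also use DYN
TYPES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


class ElfHeader(NamedTuple):
    e_ident: bytes
    e_type: int        # Type
    e_machine: int     # Machine (62 is x86-64)
    e_version: int     # Version (second occurrence)
    e_entry: int       # Entry point address
    e_phoff: int       # Start of program headers
    e_shoff: int       # Start of section headers
    e_flags: int       # Flags
    e_ehsize: int      # Size of this header
    e_phentsize: int   # Size of program headers
    e_phnum: int       # Number of program headers
    e_shentsize: int   # Size of section headers
    e_shnum: int       # Number of section headers
    e_shstrndx: int    # Section header string table index


def parse_elf64_header(data: bytes) -> ElfHeader:
    """Unpack the ELF64 header at offset 0, using the file's own byte order."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:
        raise ValueError("this sketch only handles ELF64 files")
    if data[5] not in (1, 2):
        raise ValueError("unknown data encoding")
    order = "<" if data[5] == 1 else ">"  # 1 = little endian, 2 = big endian
    return ElfHeader(*struct.unpack_from(order + ELF64_EHDR, data, 0))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        hdr = parse_elf64_header(f.read(64))
    print("Type:", TYPES.get(hdr.e_type, hdr.e_type))
    print("Entry point address:", hex(hdr.e_entry))
    print("Start of section headers:", hdr.e_shoff, "(bytes into file)")
    print("Number of section headers:", hdr.e_shnum)
```

Running it as, for example, python3 elf_header.py /bin/ls (the script name is hypothetical) should print values that agree with the corresponding lines of readelf -h. The sketch rejects 32-bit files on purpose: ELF32 uses 4-byte addresses and offsets, so its header is 52 bytes long and has a different layout.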

3.3. Program Header Table

Another part is the program header table. It stores information about segments, and each segment usually groups one or more sections. The kernel uses this information at run time, because it tells the kernel how to create the process and map the segments into memory.

To run a program, the kernel first reads the ELF header and the program header table. Next, it maps the segments of type LOAD into memory and checks whether an INTERP entry requests a program interpreter, the dynamic linker. Finally, it transfers control to the interpreter if one was requested, or directly to the executable’s entry point otherwise.

We use the readelf command with the -l option to display the program headers of an ELF file:

$ readelf -l /bin/ls

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x6180
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000003538 0x0000000000003538  R      0x1000
  LOAD           0x0000000000004000 0x0000000000004000 0x0000000000004000
                 0x00000000000143c9 0x00000000000143c9  R E    0x1000
  LOAD           0x0000000000019000 0x0000000000019000 0x0000000000019000
                 0x0000000000008ab8 0x0000000000008ab8  R      0x1000
  LOAD           0x0000000000022350 0x0000000000023350 0x0000000000023350
                 0x0000000000001278 0x0000000000002568  RW     0x1000
  DYNAMIC        0x0000000000022dd8 0x0000000000023dd8 0x0000000000023dd8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x000000000001df0c 0x000000000001df0c 0x000000000001df0c
                 0x0000000000000944 0x0000000000000944  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000022350 0x0000000000023350 0x0000000000023350
                 0x0000000000000cb0 0x0000000000000cb0  R      0x1

 ...

Program headers are essential when running the executable because they tell the operating system all it needs to know to put the executable into memory and run it.
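The loading steps described earlier can be approximated in a few lines. This is a minimal sketch, assuming a little-endian ELF64 file such as our x86-64 /bin/ls and Python’s struct module; read_program_headers, load_plan, and flags_str are hypothetical helpers. The sketch reads the table located by e_phoff, e_phentsize, and e_phnum, then reports the interpreter path and the LOAD segments a loader would map. It doesn’t map anything itself:

```python
import struct
import sys
from typing import NamedTuple

# Segment types and permission flags from the ELF specification
PT_LOAD, PT_INTERP = 1, 3
PF_X, PF_W, PF_R = 1, 2, 4
ELF64_PHDR = "<IIQQQQQQ"  # 56 bytes per program header


class Segment(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def read_program_headers(data: bytes) -> list:
    """Return every entry of the program header table (little-endian ELF64 only)."""
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("this sketch only handles little-endian ELF64 files")
    (phoff,) = struct.unpack_from("<Q", data, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", data, 54)  # e_phentsize, e_phnum
    return [Segment(*struct.unpack_from(ELF64_PHDR, data, phoff + i * phentsize))
            for i in range(phnum)]


def flags_str(flags: int) -> str:
    """Format p_flags the way readelf does, e.g. 'R E' or 'RW '."""
    return "".join((
        "R" if flags & PF_R else " ",
        "W" if flags & PF_W else " ",
        "E" if flags & PF_X else " ",
    ))


def load_plan(data: bytes):
    """Return (interpreter path or None, list of LOAD segments)."""
    segments = read_program_headers(data)
    interp = None
    for seg in segments:
        if seg.p_type == PT_INTERP:
            # The INTERP segment holds a NUL-terminated path to the dynamic linker
            raw = data[seg.p_offset:seg.p_offset + seg.p_filesz]
            interp = raw.rstrip(b"\0").decode()
    return interp, [seg for seg in segments if seg.p_type == PT_LOAD]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    interp, loads = load_plan(image)
    print("interpreter:", interp)
    for seg in loads:
        print(f"LOAD offset={seg.p_offset:#x} vaddr={seg.p_vaddr:#x} "
              f"filesz={seg.p_filesz:#x} memsz={seg.p_memsz:#x} {flags_str(seg.p_flags)}")
```

Given a path such as /bin/ls on the command line, the output should mirror the INTERP and LOAD rows of readelf -l. Reading the interpreter path straight from the INTERP segment’s file bytes works because, as the readelf output shows, that segment’s FileSiz covers the whole NUL-terminated string.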

3.4. Section Header Table

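The section header table stores information about sections, such as their names, types, addresses, and sizes. The linker uses this information at link time, when it combines object files into an executable or a shared library, and tools such as readelf and debuggers rely on it as well. The kernel, in contrast, doesn’t need section headers to run a program, since it works from the program headers.

Linking against shared libraries is completed at run time by the dynamic linker, which is the interpreter named in the INTERP program header (/lib64/ld-linux-x86-64.so.2 in our example). It loads the shared libraries that the binary needs into memory and resolves references to them. The dynamic linker’s implementation is specific to the operating system.

Additionally, the section header table describes sections such as the symbol tables, which other files and tools use to find the program’s symbol definitions and references.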
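Resolving section names takes one extra step, because each section header stores only an offset (sh_name) into the section name string table, whose index is e_shstrndx in the ELF header. Below is a minimal sketch, again assuming a little-endian ELF64 file and Python’s struct module; read_sections and Section are hypothetical names, and the sketch ignores the extended numbering that the specification allows for files with a very large number of sections:

```python
import struct
import sys
from typing import NamedTuple

ELF64_SHDR = "<IIQQQQIIQQ"  # 64 bytes per section header


class Section(NamedTuple):
    name: str
    sh_type: int
    sh_addr: int
    sh_offset: int
    sh_size: int


def read_sections(data: bytes) -> list:
    """List the sections of a little-endian ELF64 file with their names resolved."""
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("this sketch only handles little-endian ELF64 files")
    (shoff,) = struct.unpack_from("<Q", data, 40)                      # e_shoff
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 58)  # e_shentsize, e_shnum, e_shstrndx
    if shnum == 0:
        return []

    # Each raw entry: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
    # sh_link, sh_info, sh_addralign, sh_entsize
    raw = [struct.unpack_from(ELF64_SHDR, data, shoff + i * shentsize) for i in range(shnum)]
    strtab_off = raw[shstrndx][4]  # sh_offset of the section name string table

    def name_at(offset: int) -> str:
        # Names are NUL-terminated strings inside the string table
        start = strtab_off + offset
        return data[start:data.index(b"\0", start)].decode()

    return [Section(name_at(sh_name), sh_type, sh_addr, sh_offset, sh_size)
            for sh_name, sh_type, _flags, sh_addr, sh_offset, sh_size, *_rest in raw]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    for index, sec in enumerate(read_sections(image)):
        print(f"[{index:2}] {sec.name:<20} type={sec.sh_type} "
              f"addr={sec.sh_addr:#x} offset={sec.sh_offset:#x} size={sec.sh_size:#x}")
```

Run against /bin/ls, the list should contain the same names as readelf -S, beginning with the unnamed NULL entry at index 0.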

We use the readelf command with the -S option to display the section header table of a file:

$ readelf -S /bin/ls
There are 30 section headers, starting at offset 0x23768:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         00000000000002a8  000002a8
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.gnu.bu[...] NOTE             00000000000002c4  000002c4
       0000000000000024  0000000000000000   A       0     0     4
  [ 3] .note.ABI-tag     NOTE             00000000000002e8  000002e8
       0000000000000020  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000308  00000308
       00000000000000ac  0000000000000000   A       5     0     8
...

As we discussed earlier, section headers matter mainly at link time, when the linker builds the executable and records the libraries it needs to run successfully.

3.5. .text

This section holds the program’s executable machine code. When the program runs, .text is part of a LOAD segment with read and execute (R E) permissions.

3.6. .rodata and .data

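.rodata stands for read-only data. It holds initialized data that the program only reads, such as string literals and constants. The .data section, in contrast, holds initialized data that the program can modify, such as global and static variables with initial values. Uninitialized variables go into the .bss section, which takes up no space in the file. That’s why the memory reserved for the data segment is larger than its size in the ELF file: in the readelf -l output above, the last LOAD segment has a FileSiz of 0x1278 but a MemSiz of 0x2568, and the extra space is filled with zeros when the program is loaded.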
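The relationship between FileSiz and MemSiz can be made concrete. As a simplified sketch, assuming Python and a hypothetical helper named segment_memory_image, we build the in-memory contents of one LOAD segment by copying p_filesz bytes from the file and filling the remaining p_memsz - p_filesz bytes with zeros. A real loader maps whole pages with mmap instead of copying bytes like this:

```python
def segment_memory_image(file_bytes: bytes, offset: int, filesz: int, memsz: int) -> bytearray:
    """Approximate the memory contents of one LOAD segment.

    segment_memory_image is a hypothetical helper: the first filesz bytes come
    from the file, and the remaining memsz - filesz bytes (the part that holds
    .bss) are zero-filled.
    """
    if memsz < filesz:
        raise ValueError("p_memsz must not be smaller than p_filesz")
    chunk = file_bytes[offset:offset + filesz]
    if len(chunk) != filesz:
        raise ValueError("segment extends past the end of the file")
    return bytearray(chunk) + bytes(memsz - filesz)


# FileSiz and MemSiz of the RW LOAD segment from the readelf -l output above
filesz, memsz = 0x1278, 0x2568
print(f"zero-filled bytes: {memsz - filesz:#x} ({memsz - filesz} bytes)")
```

For that segment, 0x2568 - 0x1278 = 0x12f0, so 4,848 bytes exist only in memory and are zero-filled when the program starts.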

4. Conclusion

In this article, we learned about the ELF file and its structure. We also looked at using readelf to check different parts of an ELF file.