How to Compile and Run Assembly Code in Linux

1. Overview

An assembly language is a low-level programming language whose instructions correspond closely to the machine instructions of a processor. In other words, assembly languages are human-readable representations of machine code. We use assemblers to convert assembly code to machine code.

Each processor family has its own assembly language with its own instruction set. For example, the x86 assembly language is the assembly language for the x86 family of processors, such as those made by Intel and AMD. Besides the processor architecture, the executable file format may differ between operating systems. Therefore, there might be several assemblers for the same architecture.

In this tutorial, we’ll discuss how to compile and run assembly code in Linux. We’ll first discuss the two common x86 assembly language syntaxes, AT&T and Intel. Then, we’ll examine three popular assemblers: the GNU Assembler, the Netwide Assembler, and the flat assembler.

2. The AT&T and Intel Syntaxes

x86 assembly language has two prevalent syntaxes, AT&T and Intel. Assemblers generally support only one of them, but some assemblers work with both.

The dominant syntax in the Linux world is the AT&T syntax, which is natural since Unix was developed at AT&T Bell Labs. For example, GCC, the default compiler on most Linux distributions, uses the AT&T syntax by default. In the Windows world, however, the Intel syntax dominates.

There are several differences between the two syntaxes. For example, if we want to assign a value to a register using the AT&T syntax, we use the mov instruction:

mov $1, %rax

We assign the value 1 to the rax register in this example. The destination register is specified after the source in the AT&T syntax. We prefix immediate values with a dollar sign, as in $1. Similarly, we prefix registers with a percent sign, as in %rax.

To assign a value to a register using the Intel syntax, we use the mov instruction differently:

mov rax, 1

The order of the operands in the Intel syntax is the opposite of that in the AT&T syntax: the destination register is specified before the source. Additionally, values and registers carry no prefix signs.

3. Using as With the AT&T Syntax

In this section, we’ll explore how to use the GNU Assembler, as, to build assembly code written in the AT&T syntax. The GNU Assembler uses the AT&T syntax by default. However, it also supports the Intel syntax, as we’ll see in the next section.

3.1. Example Assembly Code

We’ll use the following assembly code, hello_baeldung_att.asm:

$ cat hello_baeldung_att.asm
.global _start

.section .data
message: .ascii "Hello Baeldung\n"

.section .text
_start:
    mov $1, %rax
    mov $1, %rdi
    mov $message, %rsi
    mov $15, %rdx
    syscall
        
    mov $60, %rax
    mov $0, %rdi
    syscall

This assembly code prints Hello Baeldung and exits with an exit status of 0.

3.2. Dissection of the Code

Let’s break down the code to analyze it briefly:

.global _start

The .global directive makes the _start symbol visible to the linker, ld. By default, the linker uses _start as the entry point of the program, much like the main() function in a C program. Therefore, _start is exported in the symbol table of the object file, hello_baeldung_att.o, when we assemble hello_baeldung_att.asm.

Then, we have the data section:

.section .data
message: .ascii "Hello Baeldung\n"

The .section .data directive marks the beginning of the data section. Initialized variables and constants are declared in the data section. We declare a label, message, that refers to the string “Hello Baeldung\n”. The .ascii directive stores the characters of the string as bytes, without a terminating null character.

Then, the text section starts:

.section .text

The .section .text directive shows the beginning of the text section. The actual program code is in this section.

We print message at the beginning of _start:

_start:
    mov $1, %rax
    mov $1, %rdi
    mov $message, %rsi
    mov $15, %rdx
    syscall

We need to set up the registers accordingly to write to the terminal. Each register has a specific role in a system call. We need to put the system call number into the rax register. 1 is the corresponding system call number for sys_write() in the x86-64 architecture. The rdi register contains the file descriptor, 1, which is the standard output. We store the address of the message to be printed in the rsi register. Lastly, the rdx register holds the length of the message in bytes, which is 15 in our case, including the newline character.

Once we set up the registers, we use the syscall instruction to trigger the sys_write() system call.
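The length passed in rdx has to match the number of bytes in the message, including the trailing newline. As a quick sanity check, the following shell sketch counts those bytes. The variable name msg_len is only illustrative:

```shell
# Count the bytes of the message, including the trailing newline.
# The $(( ... )) wrapper strips any padding that wc may add.
msg_len=$(( $(printf 'Hello Baeldung\n' | wc -c) ))
echo "$msg_len"
```

The result should match the value we load into rdx.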

Finally, we exit from the program gracefully:

mov $60, %rax
mov $0, %rdi
syscall

The system call number for sys_exit() is 60. So, we write 60 to the rax register. We need to write the exit status of the program to the rdi register, which is 0 in our case. Lastly, we call the syscall instruction to trigger the sys_exit() system call.
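The numbers 1 and 60 come from the kernel’s x86-64 system call table. On systems with the kernel headers installed, we can look them up as __NR_write and __NR_exit. The sketch below is one way to do so. Both header locations are assumptions that vary by distribution, and syscall_defs is just an illustrative variable name:

```shell
# Look up the x86-64 system call numbers in the kernel headers.
# Both paths are assumptions: the first is typical for Debian-based
# distros, the second for RPM-based ones.
syscall_defs=""
for header in /usr/include/x86_64-linux-gnu/asm/unistd_64.h \
              /usr/include/asm/unistd_64.h; do
    if [ -r "$header" ]; then
        syscall_defs=$(grep -E '^#define __NR_(write|exit) ' "$header")
        break
    fi
done
echo "$syscall_defs"
```

If neither header is present, the script prints an empty line, and we need to consult the system call table of our kernel version instead.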

3.3. Building and Running

Let’s now build hello_baeldung_att.asm using as:

$ as hello_baeldung_att.asm -o hello_baeldung_att.o

The GNU Assembler, as, takes the assembly code in hello_baeldung_att.asm as input and generates the object file, hello_baeldung_att.o, which we specify using the -o option. Then, we create the executable, hello_baeldung_att, using the ld command:

$ ld -s -o hello_baeldung_att hello_baeldung_att.o

The linker takes the object file as input and generates the executable, hello_baeldung_att. The -o option of ld specifies the name of the executable, and the -s option strips all symbol information from the executable.

Having built the executable, let’s run it:

$ ./hello_baeldung_att
Hello Baeldung

The output is as expected.
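To confirm the exit status as well as the output, we can inspect the shell’s $? value right after the run. The sketch below repeats the build steps in a single script so that it’s self-contained. It assumes an x86-64 Linux machine with as and ld on the PATH, and the hello_check file names and the workdir, output, and status variables are only illustrative:

```shell
# Assumes x86-64 Linux with GNU as and ld on the PATH.
workdir=$(mktemp -d)

# Write the same AT&T-syntax program to a scratch file.
cat > "$workdir/hello_check.asm" <<'EOF'
.global _start

.section .data
message: .ascii "Hello Baeldung\n"

.section .text
_start:
    mov $1, %rax
    mov $1, %rdi
    mov $message, %rsi
    mov $15, %rdx
    syscall

    mov $60, %rax
    mov $0, %rdi
    syscall
EOF

# Assemble and link exactly as before.
as "$workdir/hello_check.asm" -o "$workdir/hello_check.o"
ld -s -o "$workdir/hello_check" "$workdir/hello_check.o"

# Capture the output, then read the exit status from $?.
output=$("$workdir/hello_check")
status=$?
echo "$output"
echo "exit status: $status"
```

An exit status of 0 here corresponds to the value we placed in rdi before the sys_exit() call.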

4. Using as With the Intel Syntax

In this section, we’ll see that it’s possible to use the GNU Assembler for building assembly code in the Intel syntax.

4.1. Example Assembly Code

We’ll use the following assembly code, hello_baeldung_intel.asm:

$ cat hello_baeldung_intel.asm
.global _start

.section .data
message: .ascii "Hello Baeldung\n"

.section .text
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, offset message
    mov rdx, 15
    syscall
 
    mov rax, 60
    mov rdi, 0
    syscall

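The content of hello_baeldung_intel.asm is similar to the content of hello_baeldung_att.asm apart from using the Intel syntax instead of the AT&T syntax. Notably, we use the offset keyword to load the address of message into rsi rather than the content stored at that address.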

4.2. Building and Running

Let’s now build hello_baeldung_intel.asm using as:

$ as -msyntax=intel -mnaked-reg hello_baeldung_intel.asm -o hello_baeldung_intel.o

We use two additional options in this case while generating the object file, hello_baeldung_intel.o. In particular, we use the -msyntax=intel option to specify that the assembly code is in the Intel syntax. This option’s default value is att. Additionally, we specify that registers don’t require a % prefix by using the -mnaked-reg option.

The linking step is the same as before:

$ ld -s -o hello_baeldung_intel hello_baeldung_intel.o

Having built the executable, let’s run it:

$ ./hello_baeldung_intel
Hello Baeldung

The output is as expected.
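Since both syntaxes describe the same instructions, we’d expect as to emit identical machine code for them. The following sketch checks this for the two mov forms from Section 2 by assembling each from standard input and comparing the raw bytes of the resulting .text sections. It assumes as, objcopy, and cmp are available on an x86-64 Linux machine, and the att and intel file names are only illustrative:

```shell
# Assumes x86-64 Linux with GNU binutils (as, objcopy) and cmp.
workdir=$(mktemp -d)

# AT&T syntax, the default for as (input is read from stdin).
printf 'mov $1, %%rax\n' | as -o "$workdir/att.o"

# Intel syntax, using the same options as in the build step above.
printf 'mov rax, 1\n' | as -msyntax=intel -mnaked-reg -o "$workdir/intel.o"

# Extract the raw bytes of each .text section.
objcopy -O binary -j .text "$workdir/att.o" "$workdir/att.bin"
objcopy -O binary -j .text "$workdir/intel.o" "$workdir/intel.bin"

# Compare the two byte sequences.
if cmp -s "$workdir/att.bin" "$workdir/intel.bin"; then
    same_code=yes
else
    same_code=no
fi
echo "identical machine code: $same_code"
```

As an alternative to the command-line options, as also accepts the .intel_syntax noprefix directive at the top of a source file.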

5. Using nasm

The Netwide Assembler, nasm, is another alternative for building assembly code in Linux. It can be installed using the nasm.x86_64 package. However, on some RPM-based distros, we may need to enable the PowerTools repository first.

The Netwide Assembler supports the Intel syntax.

5.1. Example Assembly Code

We’ll use the following assembly code, hello_baeldung_nasm.asm:

$ cat hello_baeldung_nasm.asm
global _start
section .data
message: db "Hello Baeldung", 0xa

section .text
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, message
    mov rdx, 15
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

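It’s similar to hello_baeldung_intel.asm apart from a few minor differences. The global and section directives have no leading dot, the message is defined with db and the newline is written as the byte 0xa, and mov rsi, message loads the address of message without the offset keyword.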

5.2. Building and Running

Let’s now build hello_baeldung_nasm.asm using nasm:

$ nasm -f elf64 hello_baeldung_nasm.asm -o hello_baeldung_nasm.o

The usage of nasm is similar to that of as. However, there’s an additional -f option, which specifies the format of the output object file, hello_baeldung_nasm.o. We specify the output format as 64-bit ELF (Executable and Linkable Format) using -f elf64. ELF is the standard format for executables, object files, and shared libraries in Linux.

The linking step is the same as before:

$ ld -s -o hello_baeldung_nasm hello_baeldung_nasm.o

Having built the executable, let’s run it:

$ ./hello_baeldung_nasm
Hello Baeldung

The output is as expected.

6. Using fasm

The flat assembler, fasm, is another option for building assembly code in Linux. It can be downloaded from https://flatassembler.net/download.php. It supports the Intel syntax.

6.1. Example Assembly Code

We’ll use the following assembly code, hello_baeldung_fasm.asm:

$ cat hello_baeldung_fasm.asm
format elf64 executable
entry _start

message: db "Hello Baeldung", 0xa

_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, message
    mov rdx, 15
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

The instructions are the same as in the previous assembly code in the Intel syntax. However, fasm has its own set of directives. For example, the entry directive sets the entry point in the executable, which is _start as in the previous examples. The format elf64 executable directive, on the other hand, is for creating a 64-bit ELF executable.

6.2. Building and Running

Let’s now build hello_baeldung_fasm.asm using fasm:

$ fasm ./hello_baeldung_fasm.asm hello_baeldung_fasm
flat assembler  version 1.73.32  (16384 kilobytes memory, x64)
2 passes, 181 bytes.

The first argument of fasm is the file containing the assembly code, and the second argument is the output executable file. They’re hello_baeldung_fasm.asm and hello_baeldung_fasm, respectively, in our case. fasm generates the executable file directly without any intermediate object files.

Let’s run hello_baeldung_fasm:

$ ./hello_baeldung_fasm
Hello Baeldung

The output is as expected.

7. Conclusion

In this article, we discussed how to compile and run assembly code in Linux. First, we learned the AT&T and Intel syntaxes. Then, we saw how to use the GNU Assembler, the Netwide Assembler, and the flat assembler to compile and run assembly code. Notably, the GNU Assembler supports both the AT&T and Intel syntaxes, while the other two support only the Intel syntax.
