1. Overview
Clang is the C/C++/Objective-C compiler frontend of the LLVM project. (LLVM originally stood for Low-Level Virtual Machine, but the project no longer treats the name as an acronym.) It’s an alternative to existing compilers like GCC and aims to offer fast compilation, low memory use, and clear, expressive error messages.
In this tutorial, we’ll explore the LLVM design architecture, learn how to install Clang and LLVM from Debian and LLVM repositories, test various versions of Clang, and verify the executables were all compiled with the correct compiler.
All commands and examples in this guide have been tested on Debian 12 (Bookworm).
2. LLVM Overview
LLVM is a compiler framework that implements a three-phase design consisting of a frontend, an optimizer, and a backend.
Classic compiler designs tend to couple these parts tightly, so supporting a new source language means writing a whole compiler from scratch. The three-phase design separates the phases instead, which makes it easier to support many source languages and target architectures.
For example, supporting N source languages on M targets with the classic design would require N*M compilers. With LLVM, N frontends and M backends suffice, because they all meet at a shared IR and optimizer. To add a new language, we only write a frontend and reuse the existing optimizer and backends.
2.1. Frontend
An LLVM frontend parses and validates the input code, then generates LLVM Intermediate Representation (IR) from it.
Let’s have a look at how the LLVM frontend works. For instance, consider this simple C function:
unsigned add_numbers(unsigned a, unsigned b) {
    return a + b;
}
The LLVM frontend, in this case Clang, translates the C code to LLVM IR (.ll file):
define i32 @add_numbers(i32 %a, i32 %b) {
entry:
  %tmp1 = add i32 %a, %b
  ret i32 %tmp1
}
LLVM IR looks like assembly language, but it’s typed and largely independent of both the source language and the target machine. It’s designed to be simple for a frontend to generate while carrying enough detail for the optimizer to work with.
2.2. Optimizer
The LLVM optimizer applies a series of optimization passes to the LLVM IR generated by the frontend.
As an example, this LLVM IR code has some unused code:
define i32 @multiply_numbers(i32 %a, i32 %b) {
  %result = mul i32 %a, %b
  %unused_var = add i32 %a, %b
  ret i32 %result
}
During the Dead Code Elimination (DCE) transformation pass, the optimizer removes the %unused_var line, because its result is never used:
define i32 @multiply_numbers(i32 %a, i32 %b) {
  %result = mul i32 %a, %b
  ret i32 %result
}
The DCE pass is one of numerous optimizations performed by the LLVM optimizer.
2.3. Backend
The backend phase is responsible for generating target-specific machine code or executables.
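In LLVM, it includes several steps:
First, the LLVM code generator lowers the optimized IR to instructions for the selected target, which involves steps such as instruction selection, register allocation, and instruction scheduling. Much of this machinery is a target-independent framework that each target plugs its own description into. Next, the result is emitted as target-specific assembly or an object file. Finally, a linker, such as the LLVM linker (LLD), resolves all references in the object files, pulls in the functions and data they need from other libraries, and produces an executable or shared library. Note that Clang on Debian typically invokes the system linker rather than LLD unless we ask for it with the -fuse-ld=lld option.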
Both the LLVM backend and LLVM optimizer are part of the LLVM Core libraries.
3. Installing Clang From Debian Repository
Clang is available in the official Debian repository, so we can install it using the apt command:
$ sudo apt install clang
After the installation is finished, we can verify it by compiling C/C++ code or simply checking its version:
$ clang --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Installing the clang package also pulls in the dependencies it needs to compile C/C++ code, including the LLVM Core libraries:
$ whereis clang
clang: /usr/bin/clang /usr/lib/clang /usr/include/clang /usr/share/man/man1/clang.1.gz
$ ldd /usr/bin/clang
linux-vdso.so.1 (0x00007ffdd9e9e000)
libclang-cpp.so.14 => /lib/x86_64-linux-gnu/libclang-cpp.so.14 (0x00007f2f51e00000)
libLLVM-14.so.1 => /lib/x86_64-linux-gnu/libLLVM-14.so.1 (0x00007f2f4b400000)
...
We used the whereis command to locate the clang binary. Then, we ran the ldd command to print out all the shared libraries that clang requires to be able to run.
In the output above, we can see that clang requires the libLLVM-14.so.1 shared library.
Notably, clang and libLLVM share the same major version, 14. LLVM and its sub-projects are developed and released together, so the packages for a given release are built against matching versions of the other components.
4. Installing Clang From LLVM Repository
While we can install Clang from the Debian repository, that package usually lags behind upstream. On Debian 12, the packaged Clang version is 14, whereas the stable branch in the LLVM repository was already at version 17 at the time of writing.
The LLVM repository has three branches: stable (17), qualification (18), and development (19).
Let’s install the latest version from the stable branch:
$ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
$ clang-17 --version
Debian clang version 17.0.6 (++20231208085813+6009708b4367-1~exp1~20231208085906.81)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
At this point, we have two versions of Clang on our system: versions 14 and 17.
If we need a specific version instead, for example version 16, we can download the script and pass the version number to it:
$ wget https://apt.llvm.org/llvm.sh
$ chmod +x llvm.sh
$ sudo ./llvm.sh 16
$ clang-16 --version
Debian clang version 16.0.6 (15~deb12u1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
We now have three versions of Clang on our system: versions 14, 16, and 17.
We used the wget command to download the installation script from the LLVM website. Then, we made the script executable with chmod. Finally, we ran the script with the argument 16, which refers to LLVM version 16.
Although it’s possible to install or upgrade individual Clang components, the LLVM project moves quickly, so it’s usually safer to let the script that LLVM provides install LLVM and its components together and avoid potential compatibility issues.
5. Testing
When we have more than one version of Clang installed on our system, we can specify which version we want to use to compile our program. Additionally, we can verify which version of Clang was used to compile a given executable.
Let’s create a program, compile it with GCC and different versions of Clang, and verify the compiler used for compilation.
5.1. Create a Sample Program
Here, we create a simple Hello World program in C:
$ mkdir baeldung-clang && cd baeldung-clang
$ cat > hello.c << EOF
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}
EOF
We created a directory using the mkdir command and then entered it. Next, we used the cat command with a here-document: the shell feeds cat every line up to the EOF delimiter, and the > redirection writes that text to hello.c, creating the file or overwriting it if it already exists.
5.2. Compile the Program
Let’s compile the program with GCC, Clang 14, Clang 16, and Clang 17 and name the executables accordingly:
$ gcc -o hello-gcc hello.c
$ clang -o hello-clang-14 hello.c
$ clang-16 -o hello-clang-16 hello.c
$ clang-17 -o hello-clang-17 hello.c
$ ls -ogh
total 68K
-rw-r--r-- 1 80 Dec 21 09:36 hello.c
-rwxr-xr-x 1 16K Dec 21 09:37 hello-clang-14
-rwxr-xr-x 1 16K Dec 21 09:40 hello-clang-16
-rwxr-xr-x 1 16K Dec 21 09:40 hello-clang-17
-rwxr-xr-x 1 16K Dec 21 09:38 hello-gcc
We’ve successfully compiled the program with four different compilers.
5.3. Run the Program
We should be able to run all four programs without errors:
$ ./hello-gcc
Hello, World!
$ ./hello-clang-14
Hello, World!
$ ./hello-clang-16
Hello, World!
$ ./hello-clang-17
Hello, World!
Even though we built the code with different compilers, the source code is still the same, so the output is the same.
5.4. Verify the Compiler Used for Compilation
Finally, let’s verify the compiler used to compile the program:
$ readelf -p .comment ./hello-gcc
String dump of section '.comment':
[ 0] GCC: (Debian 12.2.0-14) 12.2.0
$ readelf -p .comment ./hello-clang-14 | grep "clang version"
[ 1f] Debian clang version 14.0.6
$ readelf -p .comment ./hello-clang-16 | grep "clang version"
[ 1f] Debian clang version 16.0.6 (15~deb12u1)
$ readelf -p .comment ./hello-clang-17 | grep "clang version"
[ 1f] Debian clang version 17.0.6 (++20231208085813+6009708b4367-1~exp1~20231208085906.81)
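Using the readelf command, we’ve confirmed that each program was compiled with the expected compiler. For the Clang builds, we filtered the output with grep because the Clang string starts at offset 1f rather than 0: the .comment section of those executables also holds a GCC string, most likely from the system’s precompiled C runtime startup objects that get linked into every program.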
6. Conclusion
In this article, we explored the LLVM design architecture, which implements the three-phase design approach. This approach gives LLVM flexibility by keeping all the components modular, speeding up its development process.
Then, we learned how to install Clang from Debian and LLVM repositories, where we installed various versions of Clang.
Finally, we put those compilers to work by compiling a sample program and verified that the executables were all compiled with the correct compiler.