1. Overview

Processes in user space request services from the kernel through system calls. However, every system call carries overhead, so calling them frequently can be expensive. Therefore, reducing the number of system calls can sometimes improve performance.

In this tutorial, we’ll discuss compound system calls in Linux, which can group multiple system calls into one. We’ll learn about the io_uring library. Finally, we’ll see an example using the io_uring library.

2. What Is a Compound System Call?

System calls in Linux provide the interface between the kernel and processes in user space. We generally call wrapper functions from user-space libraries such as glibc, and they dispatch the call to the kernel using the corresponding system call. For example, we can use the unlinkat() function in glibc with the AT_REMOVEDIR flag to remove an empty directory in the file system. This function invokes the corresponding unlinkat() system call.
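As a minimal sketch of the wrapper relationship described above, the following functions remove an empty directory first through the glibc wrapper and then by issuing the system call directly with syscall(). The function names and the make_scratch_dir() helper are hypothetical and only serve the illustration:

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* AT_FDCWD, AT_REMOVEDIR */
#include <stdlib.h>      /* mkdtemp */
#include <sys/syscall.h> /* SYS_unlinkat */
#include <unistd.h>      /* unlinkat, syscall */

/* Remove an empty directory through the glibc wrapper. Returns 0 or -1. */
int remove_dir_with_wrapper(const char *path)
{
    return unlinkat(AT_FDCWD, path, AT_REMOVEDIR);
}

/* Remove an empty directory by issuing the system call directly. */
int remove_dir_with_raw_syscall(const char *path)
{
    return (int)syscall(SYS_unlinkat, AT_FDCWD, path, AT_REMOVEDIR);
}

/* Hypothetical helper: turn a "...XXXXXX" template into a fresh directory. */
int make_scratch_dir(char *path_template)
{
    return mkdtemp(path_template) ? 0 : -1;
}
```

Both variants enter the kernel once per call and report failure by returning -1 and setting errno.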

However, system calls are expensive for several reasons, such as context switching between user and kernel space, synchronization overhead, and copying of data. A compound system call can reduce this overhead by grouping multiple system calls into a single one.
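The idea of grouping can be illustrated with a much older POSIX call, writev(), which hands several buffers to the kernel in one system call instead of one write() per buffer. This is only an analogy and not io_uring itself. The function names and the fixed limit of eight buffers are assumptions made for this sketch:

```c
#include <string.h>
#include <sys/uio.h>   /* writev, struct iovec */
#include <unistd.h>    /* write */

/* One system call per buffer: n kernel entries. Returns bytes written or -1. */
ssize_t write_each(int fd, const char *const *parts, int n)
{
    ssize_t total = 0;
    for (int i = 0; i < n; i++) {
        ssize_t written = write(fd, parts[i], strlen(parts[i]));
        if (written < 0)
            return -1;
        total += written;
    }
    return total;
}

/* All buffers in one system call: a single kernel entry. */
ssize_t write_batched(int fd, const char *const *parts, int n)
{
    struct iovec iov[8];               /* illustrative fixed upper bound */
    if (n > 8)
        return -1;
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = (void *)parts[i];
        iov[i].iov_len = strlen(parts[i]);
    }
    return writev(fd, iov, n);
}
```

Both functions produce the same bytes on the descriptor, but write_batched() crosses the user/kernel boundary once regardless of the number of buffers.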

3. The io_uring Library

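The io_uring interface of the Linux kernel, which we access through the liburing library, enables us to submit multiple operations to the kernel with a single system call. It was designed as an asynchronous I/O API that overcomes the limitations of system calls like select() and poll(), and of I/O APIs like epoll and aio. In the rest of this tutorial, we refer to the interface and its helper library together as the io_uring library.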

The name of the io_uring library stems from the two ring buffers it uses for communication between the kernel space and user space. These ring buffers are the submission and completion queues. The submission queue is for submitting requests to the kernel. We add the system calls that we want to be handled within a compound system call as submission queue entries (SQEs).

The completion queue, on the other hand, reports the completion of requests. It consists of completion queue entries (CQEs). Once the kernel processes a request from the submission queue, it posts the corresponding result as a CQE.
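The producer and consumer roles of such a ring buffer can be sketched in plain C. The toy_ring type below is hypothetical and does not reproduce the kernel's memory layout. It also omits the memory barriers that a ring shared between the kernel and user space needs. It only shows how head and tail indices with a power-of-two mask implement a fixed-size queue:

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                  /* must be a power of two */
#define RING_MASK (RING_ENTRIES - 1u)

/* Illustrative ring: the producer advances tail, the consumer advances head. */
struct toy_ring {
    uint32_t head;
    uint32_t tail;
    int32_t slots[RING_ENTRIES];
};

/* Producer side: returns false when the ring is full. */
bool toy_ring_push(struct toy_ring *ring, int32_t value)
{
    if (ring->tail - ring->head == RING_ENTRIES)
        return false;
    ring->slots[ring->tail & RING_MASK] = value;
    ring->tail++;
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool toy_ring_pop(struct toy_ring *ring, int32_t *value)
{
    if (ring->head == ring->tail)
        return false;
    *value = ring->slots[ring->head & RING_MASK];
    ring->head++;
    return true;
}
```

In this picture, the application is the producer of the submission queue and the consumer of the completion queue, while the kernel plays the opposite roles.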

The io_uring library sets up the two ring buffers using one io_uring_setup() system call and maps them into user space using two mmap() system calls. Similarly, it uses two munmap() system calls to tear down the mappings at the end. Additionally, it uses the io_uring_enter() system call to inform the kernel about the addition of SQEs to the submission queue. Despite these extra system calls, the io_uring library can yield a performance gain when the number of requests added to the submission queue is high or when we use asynchronous I/O.
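To make the setup step more concrete, here is a hedged sketch that calls io_uring_setup() directly, without liburing, and reports the queue sizes the kernel chose. The probe_ring_sizes() name is hypothetical. The sketch assumes kernel headers that ship linux/io_uring.h, and it returns a negative error code on systems where io_uring is unavailable or blocked:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>  /* struct io_uring_params */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425  /* assumption: the number used on common architectures */
#endif

/*
 * Ask the kernel for a ring with `requested` submission entries and report
 * the sizes it actually allocated. Returns 0 on success or a negative errno
 * value (for example -ENOSYS or -EPERM where io_uring is unavailable).
 */
int probe_ring_sizes(unsigned requested, unsigned *sq_entries, unsigned *cq_entries)
{
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));

    int ring_fd = (int)syscall(__NR_io_uring_setup, requested, &params);
    if (ring_fd < 0)
        return -errno;

    *sq_entries = params.sq_entries;
    *cq_entries = params.cq_entries;
    close(ring_fd);  /* nothing was mapped, so closing the descriptor is enough */
    return 0;
}
```

With no setup flags, the kernel rounds the requested size up to a power of two and makes the completion queue twice as large, so a request for 5 entries yields 8 submission and 16 completion entries.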

4. An Example

We’ll use the following C program, compound_sys_call.c, to analyze compound system calls:

$ cat compound_sys_call.c
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>
#include <fcntl.h>
#include <liburing.h>
#include <sys/syscall.h>

#define QUEUE_DEPTH 5

int main(int ac, char **av)
{
    DIR *dirp;
    int fd;
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;

    const char *dir_path = "/tmp/test_directory";
    dirp = opendir(dir_path); 
    io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

    for (int i = 0; i < QUEUE_DEPTH; i++) {
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_unlinkat(sqe, dirfd(dirp), dir_path, AT_REMOVEDIR);
    }

    io_uring_submit(&ring);

    for (int i = 0; i < QUEUE_DEPTH; i++) {
        io_uring_wait_cqe(&ring, &cqe);
        printf("Result : %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);

    closedir(dirp);

    return 0;
}

This program tries to remove an existing directory by requesting the unlinkat() operation five times. However, we don’t issue each request with a separate system call, but submit all of them together using the io_uring library. Since the directory can be removed only once, we expect one call, typically the first, to succeed and the other calls to fail.

In the next subsections, we’ll break down the code to understand the usage of the io_uring library. The liburing-dev package, which is available on Ubuntu 22.04, must already be installed. The version of the library we use is 2.1.

4.1. Initializing the Library

We need to initialize the io_uring library using the io_uring_queue_init() function:

io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

The first parameter of io_uring_queue_init() specifies the number of entries in the submission queue. Its value is QUEUE_DEPTH in our case, which is 5 because of #define QUEUE_DEPTH 5. The second parameter is a pointer to the io_uring structure, which the call fills in. Finally, the last parameter specifies any flags we want to pass. We don’t pass any flags in our case since the corresponding argument is 0.

4.2. Preparing the Submission Queue

Next, we prepare the submission queue in a for loop:

for (int i = 0; i < QUEUE_DEPTH; i++) {
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_unlinkat(sqe, dirfd(dirp), dir_path, AT_REMOVEDIR);
}

The io_uring_get_sqe() library call returns the next available SQE from the submission queue. We pass the pointer to the io_uring structure, ring, as an argument to the function. It returns a pointer to the next SQE on success.

Then, we call the io_uring_prep_unlinkat() function to prepare an unlinkat request. The first argument is the SQE returned by io_uring_get_sqe().

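The second argument is a directory file descriptor. If the path in the third argument is relative, it’s interpreted relative to this directory. Here, we pass the descriptor that the dirfd() function returns for the directory stream, dirp. However, since dir_path is an absolute path, the descriptor is ignored in our case.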

The third argument, dir_path, is the path to the directory we want to remove. Finally, the fourth argument, the AT_REMOVEDIR flag, specifies the deletion of the directory.

We invoke the same library calls in the for loop QUEUE_DEPTH times since we want to remove the same directory QUEUE_DEPTH times within the compound system call.

4.3. Submitting the Compound System Call

Next, we submit the SQEs to the kernel:

io_uring_submit(&ring);

Calling io_uring_submit() passes all the prepared SQEs to the kernel with a single io_uring_enter() system call. On success, it returns the number of SQEs submitted.

4.4. Retrieving the Completions

Next, we wait for the completion of the calls in a for loop:

for (int i = 0; i < QUEUE_DEPTH; i++) {
    io_uring_wait_cqe(&ring, &cqe);
    printf("Result : %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);
}

We use the io_uring_wait_cqe() library call to wait for a completion. Its first argument is the pointer to the io_uring structure, ring, as before. The second argument is the address of a CQE pointer, which the call sets to point to the next completed entry. A CQE is posted for every request that finishes, whether it succeeds or fails. We print the result of the call using the res field of the CQE, i.e., printf("Result : %d\n", cqe->res).

After reading a CQE, we must mark it as consumed using the io_uring_cqe_seen() library call. This frees its slot in the completion queue and lets us retrieve the results of the remaining calls.

We execute the same library calls in the for loop QUEUE_DEPTH times to get the result of each call.

4.5. Releasing the Resources

Finally, we release all the resources acquired and initialized by io_uring_queue_init() using the io_uring_queue_exit() library function:

io_uring_queue_exit(&ring);

We pass the pointer to the io_uring structure, ring, as the argument to this function.

4.6. Building and Running the Example

Let’s now build the program using gcc:

$ gcc -o compound_sys_call compound_sys_call.c -luring
$ ls compound_sys_call
compound_sys_call

Building the executable is successful. The name of the executable is compound_sys_call, which we specify using the -o option of gcc. We need to link the executable with the liburing.so library by passing -luring to gcc.

Having built the executable, let’s run it after creating the /tmp/test_directory directory using mkdir:

$ mkdir /tmp/test_directory
$ ./compound_sys_call
Result : 0
Result : -2
Result : -2
Result : -2
Result : -2

The result of the first call is 0, which means that the removal of /tmp/test_directory is successful. However, the results of the remaining four calls are -2 since the directory doesn’t exist anymore. A regular unlinkat() call would return -1 in this case and set errno to 2 (ENOENT – No such file or directory). With io_uring, however, errno isn’t used. Instead, the res field of the CQE holds the negative of the error code. Therefore, the result we obtain is -2 for the failure cases.
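Since the result is a negated error code, it can be turned into a readable message with strerror(). The describe_result() helper below is a hypothetical sketch of that conversion and isn't part of liburing:

```c
#include <stdio.h>
#include <string.h>

/* Format a CQE-style result: non-negative means success, negative is -errno. */
void describe_result(int res, char *out, size_t out_size)
{
    if (res >= 0)
        snprintf(out, out_size, "success (%d)", res);
    else
        snprintf(out, out_size, "error %d: %s", -res, strerror(-res));
}
```

In the example program, we could pass cqe->res to such a helper instead of printing the raw number.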

5. Conclusion

In this article, we discussed compound system calls in Linux. We learned that it’s possible to group multiple system calls into a single compound system call using the io_uring library. Additionally, we saw an example that demonstrates the usage of this library.