1. Overview

In this article, we’ll look at strace, a tracing tool for Linux. We’ll start with a short introduction and then walk through its most common uses.

2. strace

strace is a diagnostic tool in Linux. It intercepts and records the syscalls made by a command, as well as any signals the process receives. We can then use this information to debug or diagnose a program. It’s especially useful when the source code of the command isn’t readily available.

3. Installation

On Debian-based distributions such as Ubuntu, we can install strace as root using apt-get:

# apt-get install -y strace

On RHEL-based distributions such as CentOS, we’ll use yum instead:

# yum install -y strace

4. Basic Usage

In its simplest form, we can invoke strace followed by a command we wish to trace:

$ strace pwd
execve("/usr/bin/pwd", ["pwd"], 0x7fffcb3aa770 /* 9 vars */) = 0
brk(NULL)                               = 0x5631e77f6000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffdb256c5e0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...(Subsequent output truncated)

In the example above, strace runs the command pwd and intercepts every syscall it makes. It prints each syscall and signal to standard error as it occurs, so the trace appears alongside the command’s own output.

Firstly, each line in the output represents one syscall made by the command. In this example, the first line shows that the execve syscall is called at the start of the command. execve is the syscall that executes the program referred to by its first argument.

Then, we can see that strace also displays the exact arguments of the syscall. In particular, the syscall executes the binary at the path /usr/bin/pwd and passes “pwd” as the first element of its argument list. Additionally, the return value of the syscall is displayed after the equals sign. In this case, the syscall returns 0, indicating success.

Similarly, syscalls that result in an error have their return value, error name, and a description displayed:

access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
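
As a rough sketch of how these lines can be read mechanically, the snippet below splits each trace line into the syscall name and its return value. The sample lines are copied from the output above. The function name summarize_trace is hypothetical, and the pattern assumes strace’s default one-syscall-per-line format:

```shell
# Print "<syscall> <return value>" for each strace output line.
# Assumes the default format: name(args) = result [ERRNO (description)]
summarize_trace() {
    sed -n 's/^\([a-z_0-9]*\)(.*) *= *\(-\{0,1\}[0-9a-fx]*\).*$/\1 \2/p'
}

# Sample lines taken from the strace pwd output shown earlier.
sample='execve("/usr/bin/pwd", ["pwd"], 0x7fffcb3aa770 /* 9 vars */) = 0
brk(NULL)                               = 0x5631e77f6000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3'

printf '%s\n' "$sample" | summarize_trace
# execve 0
# brk 0x5631e77f6000
# access -1
# openat 3
```

A return value of -1 marks a failed syscall, so filtering this output for -1 gives a quick list of failures.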

5. Attaching strace to Running Process

To attach strace to a running process, we can use the flag -p followed by the PID.

Let’s start a sleep process and print its PID:

$ sh -c 'echo $$; exec sleep 60'
50

Then, on another terminal, we can attach to the process with strace using the flag -p:

$ strace -p 50
strace: Process 50 attached
restart_syscall(<... resuming interrupted clock_nanosleep ...>) = 0
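
The same steps can be scripted in a single shell, as in the sketch below. It assumes that strace and the coreutils timeout command are installed and that attaching is permitted (on some systems this needs root or a relaxed ptrace policy). The -o flag, which writes the trace to a file, and the variable names are not part of the example above:

```shell
# Start a background process and remember its PID.
sleep 3 &
pid=$!

# Temporary file that receives the trace (hypothetical name).
trace_file=$(mktemp)

# Attach for at most 1 second; strace detaches when timeout stops it.
# Skipped entirely if strace is not installed.
if command -v strace >/dev/null 2>&1; then
    timeout 1 strace -p "$pid" -o "$trace_file" || true
fi

# Clean up the background process and the trace file.
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
rm -f "$trace_file"
```

Capturing the PID with $! avoids having to copy it between terminals.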

6. Altering Environment Variables List

With strace, we can alter the list of environment variables inherited by the process we are tracing. To pass an additional environment variable to the process, we can use the flag -E:

$ strace -E var1=val1 pwd

In the example above, the environment variable var1 will be set to val1. Then, this environment variable will be passed to the process pwd.

Similarly, we can use the same flag to prevent the process from inheriting an environment variable:

$ strace -E var1 pwd

When we do not specify the value, the environment variable will not be inherited by the process.

7. Running Command as a Specific User

To run and trace a program as another user, we can use the flag -u followed by the username. This option requires running strace as root.

For example, to run a command as the user “baeldung”:

# strace -u baeldung whoami

In the example above, strace runs the command whoami as user baeldung.

8. Obtaining Timing Information

We can also obtain some timing and duration information using strace. For example, we can display the timestamp of each syscall using the flag -t:

$ strace -t whoami
06:02:38 execve("/usr/bin/whoami", ["whoami"], 0x7ffdc4811038 /* 12 vars */) = 0
-TRUNCATED-

Additionally, we can obtain timestamps with microsecond resolution using the flag -tt:

$ strace -tt whoami
06:07:10.899089 execve("/usr/bin/whoami", ["whoami"], 0x7fff396d2898 /* 12 vars */) = 0
-TRUNCATED-

Finally, we can display the duration of each syscall using the flag -T:

$ strace -T whoami
execve("/usr/bin/whoami", ["whoami"], 0x7fff0493d078 /* 12 vars */) = 0 <0.000274>
-TRUNCATED-

With the flag -T, strace appends the duration of each syscall, in seconds, after the return value. In the example above, the command spent 0.000274 seconds in the syscall execve.
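
To find slow syscalls in a longer trace, we can extract these durations with a small filter. The following is a minimal sketch: the function name durations is hypothetical, and the input is the sample line from the -T output above:

```shell
# Print "<seconds> <syscall>" for each line of strace -T output.
# Relies on the duration appearing as <seconds> at the end of the line.
durations() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
        secs = substr($0, RSTART + 1, RLENGTH - 2)   # strip the angle brackets
        name = $0
        sub(/\(.*/, "", name)                        # keep text before the first "("
        print secs, name
    }'
}

printf '%s\n' 'execve("/usr/bin/whoami", ["whoami"], 0x7fff0493d078 /* 12 vars */) = 0 <0.000274>' | durations
# 0.000274 execve
```

Piping the result through sort -rn lists the slowest syscalls first.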

9. Reporting Statistics

As a powerful diagnostic tool, strace is capable of calculating and reporting some statistics on the program it has traced.

9.1. Summary of a Command Run

To get a summary of the command, we can use the flag -c:

$ strace -c whoami
baeldung
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 18.62    0.000264          26        10           close
 15.16    0.000215          16        13           mmap
 12.27    0.000174          29         6           openat
-TRUNCATED-
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.001418                    68         4 total

With the flag -c, strace displays a summary of the command run instead of the individual syscalls and signals.

As we can see from the output, the summary groups the result by syscall.

Starting from the left, the first column displays the share of the total traced time spent in this syscall, as a percentage. The second column gives the same measurement in seconds. The usecs/call column shows the average number of microseconds spent on each call. Finally, the fourth and fifth columns report the total number of calls and errors, respectively.
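
To make the relationship between the columns concrete, the sketch below recomputes usecs/call from the seconds and calls columns. It assumes the value is the truncated quotient of the two, which matches the sample rows above. The function name avg_usecs is hypothetical, and the rows are copied from the summary:

```shell
# Recompute usecs/call as (seconds converted to microseconds) / calls.
# Assumes the seconds column always has six decimal places.
avg_usecs() {
    awk '$1 ~ /^[0-9.]+$/ && NF >= 5 && $NF != "total" {
        usecs = $2
        sub(/\./, "", usecs)          # "0.000264" -> "0000264" microseconds
        print $NF, int(usecs / $4)    # syscall name, truncated average
    }'
}

# Rows taken from the strace -c whoami summary shown earlier.
rows=' 18.62    0.000264          26        10           close
 15.16    0.000215          16        13           mmap
 12.27    0.000174          29         6           openat'

printf '%s\n' "$rows" | avg_usecs
# close 26
# mmap 16
# openat 29
```

For example, close took 264 microseconds over 10 calls, which gives the 26 shown in the usecs/call column.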

On a side note, the flag -c displays only the summary of a command run. To display regular output as well as the summary, we can use the flag -C instead:

$ strace -C whoami

9.2. Sorting the Result by Columns

We can sort the summary result by different columns using the flag -S. For example, we can sort the result by the number of errors in descending order:

$ strace -c -S errors whoami
baeldung
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  2.37    0.000055          27         2         2 connect
  1.68    0.000039          39         1         1 access
  3.49    0.000081          40         2         1 arch_prctl
-TRUNCATED-
  8.30    0.000193         193         1           execve
  1.46    0.000034          34         1           geteuid
  8.56    0.000199          33         6           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.002324                    68         4 total

10. strace Expression

strace supports a rich set of expressions that alter several aspects of its behavior. For instance, using an expression, we can filter the output by syscall name and return status. Additionally, we can format the output to reduce unwanted text. Finally, expressions also support tampering with syscalls through fault and delay injection.

10.1. General Syntax

Generally, the flag -e precedes the expression. Then, the expression is specified as a key-value pair:

-e qualifier=[!]value[,value]

Before the value, we can specify an exclamation mark to negate the value. Additionally, we can separate multiple values with a comma.
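
As a small illustration of this syntax, the helper below joins a qualifier and its values into one expression string. The function name make_expr is hypothetical and only assembles text; it doesn’t check that the qualifier or values are valid. Note that interactive shells such as bash may treat the exclamation mark as history expansion, so a negated value is best quoted or escaped on the command line:

```shell
# Build "qualifier=value1,value2,..." from a qualifier and a list of values.
# The body runs in a subshell so the IFS change doesn't leak to the caller.
make_expr() (
    qualifier=$1
    shift
    IFS=,
    printf '%s=%s\n' "$qualifier" "$*"
)

make_expr trace openat close   # trace=openat,close
make_expr trace '!fstat'       # trace=!fstat
```

The result can then be passed along as strace -e "$(make_expr trace openat close)" followed by the command to trace.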

10.2. Qualifiers

The qualifier must be one of trace, status, signal, quiet, abbrev, verbose, raw, read, write, fault, and inject.

These qualifiers can be loosely grouped by functionality:

  • filtering (trace, status, signal, quiet)
  • output formatting (abbrev, verbose, raw)
  • syscall tampering (fault, inject)
  • file descriptor data dumping (read, write)

10.3. Values

The meaning of the value depends on the qualifier. For example, trace takes syscall names, while signal takes signal names. As noted above, multiple values for a single qualifier are separated by commas.

11. Filtering With Expression

11.1. Filtering Output by Syscall Name

Using the filtering qualifiers, we can reduce the output of strace. For example, we can output only fstat syscalls:

$ strace -e trace=fstat whoami 
fstat(3, {st_mode=S_IFREG|0644, st_size=9394, ...}) = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=2029224, ...}) = 0
-TRUNCATED-
fstat(3, {st_mode=S_IFREG|0644, st_size=494, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
baeldung
+++ exited with 0 +++

Similarly, we can display every syscall except for fstat using negation:

$ strace -e trace=!fstat whoami

11.2. Filtering Output by Return Status

Using the status qualifier, we can filter the syscalls by the return status. For example, we can display only syscalls that aren’t successful:

$ strace -e status=!successful whoami

The valid values for the status qualifier are successful, failed, unfinished, unavailable, and detached.

Additionally, we can combine multiple statuses with a comma. For instance, we can display syscalls that exit with the status unfinished or unavailable:

$ strace -e status=unfinished,unavailable whoami

11.3. Filtering Output by Signal

Other than syscalls, strace also records any signals received by the process. By default, these recorded signals are displayed on the output along with the syscalls. We can filter these signals using the qualifier signal. For instance, we can trace only the signal SIGBUS:

$ strace -e signal=SIGBUS whoami

The valid values for the signal qualifier are the standard Linux signal names.

11.4. Suppressing Additional Informational Message

In addition to syscalls and signals, strace also displays some informational messages. For example, strace prints a message whenever a process exits:

+++ exited with 0 +++

To suppress these messages, we can use the qualifier quiet:

$ strace -e quiet=exit whoami

The messages we can suppress with this qualifier include attached, exit, path-resolution, personality, thread-execve, and superseded.

12. Formatting the Output

12.1. Dereferencing Syscall Arguments

Using the qualifier verbose, we can make strace display the syscall arguments in their dereferenced form. Displaying the arguments in their dereferenced form is more helpful than the pointer value. This is why all the syscall arguments are dereferenced by default. In other words, the default value for the verbose qualifier is all.

To see the qualifier in action, we can first disable the verbose expression for all the syscalls:

$ strace -e verbose=none whoami
execve(0x7fff4e3efdc0, 0x7fff4e3f1100, 0x7fff4e3f1110) = 0
brk(0)                                  = 0x55970f3be000
arch_prctl(0x3001, 0x7fff2630af10)      = -1 EINVAL (Invalid argument)
-TRUNCATED-

By setting the expression verbose=none, strace displays the argument pointers instead of the structures they point to.

12.2. Abbreviating Syscall

The dereferenced arguments of some syscalls can be very long, cluttering the output. Therefore, strace applies the expression abbrev=all by default. For example, the record of the fstat syscall is abbreviated if we don’t override the default expression for abbrev:

$ strace whoami
-TRUNCATED-
fstat(3, {st_mode=S_IFREG|0644, st_size=971, ...}) = 0
-TRUNCATED-

As we can see, the rest of the argument values are abbreviated with ellipses.

Let’s now disable the abbrev expression for all the syscalls:

$ strace -e abbrev=none whoami
-TRUNCATED-
fstat(1, {st_dev=makedev(0, 0x76), st_ino=4, st_mode=S_IFCHR|0620, st_nlink=1, st_uid=0, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(0x88, 0x1), st_atime=1614501905 /* 2021-02-28T08:45:05.353968000+0000 */, st_atime_nsec=353968000, st_mtime=1614501905 /* 2021-02-28T08:45:05.353968000+0000 */, st_mtime_nsec=353968000, st_ctime=1614343760 /* 2021-02-26T12:49:20.354968000+0000 */, st_ctime_nsec=354968000}) = 0
-TRUNCATED-

By disabling the abbrev expression for all the syscalls, strace displays the argument structure in its entirety.

12.3. Displaying Undecoded Arguments

With the raw qualifier, strace prints the arguments of the selected syscalls undecoded, in hexadecimal, instead of their decoded representation.

By default, all the arguments are decoded into a human-readable form:

$ strace whoami
execve("/usr/bin/whoami", ["whoami"], 0x7ffd9bc11880 /* 12 vars */) = 0
brk(NULL)                               = 0x5574c7e68000
-TRUNCATED-

Let’s set the expression raw=execve:

$ strace -e raw=execve whoami
execve(0x7ffec1f3d6c0, 0x7ffec1f3ea00, 0x7ffec1f3ea10) = 0
brk(NULL)                               = 0x5574c7e68000
-TRUNCATED-

As we can observe, the arguments of the execve syscall are now displayed as raw hexadecimal addresses instead of the decoded strings.

13. Syscall Tampering

One of the most powerful features of strace expression is its ability to alter the syscall behavior using inject and fault qualifiers. This feature is very similar to the mocking of methods in unit tests. For example, we can mock a syscall such that it always returns an error whenever it has been invoked by the command. The ability to tamper with syscalls is useful for experimenting with the program under different conditions.

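Generally, the inject expression takes the following form, where --inject= is the long-option equivalent of the -e inject= spelling used in the examples below: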

--inject=syscall_set[:error=errno|:retval=value][:signal=sig][:syscall=syscall][:delay_enter=delay][:delay_exit=delay][:when=expr]

Note that error and retval are mutually exclusive. In other words, if we’re injecting an error for a set of syscalls, we cannot inject a return value for the same set of syscalls.

As we can infer from the general expression, the inject qualifier is very flexible. In particular, we can insert a delay when the syscall is entered or exited using delay_enter and delay_exit. Additionally, we can control when the injection happens through the when subexpression.

On the other hand, the qualifier fault is a specific case of the qualifier inject. Concretely, the fault qualifier can only be used for injecting fault. Therefore, we’ll only look at the inject qualifier in this article.

13.1. Injecting Fault Into Syscalls

Let’s inject faults into the fstat syscall. In particular, we’ll make every fstat invocation fail with the error EPERM:

$ strace -e inject=fstat:error=EPERM whoami
execve("/usr/bin/whoami", ["whoami"], 0x7ffc481220a0 /* 12 vars */) = 0
-TRUNCATED-
fstat(3, 0x7ffd9337e640)                = -1 EPERM (Operation not permitted) (INJECTED)
-TRUNCATED-

Notice how the fstat syscall always returns -1 with the error EPERM. Additionally, strace annotates the result with the text (INJECTED) to make it clear that the error is injected.

13.2. Controlling When Faults Are Injected

Instead of injecting the fault on every invocation, we can further control when the injection occurs using the when subexpression. For example, we can inject the error EPERM on the 2nd invocation of fstat only:

$ strace -e inject=fstat:error=EPERM:when=2 whoami

In the example above, strace only injects the fault on the 2nd invocation of fstat. If there’s a 3rd and 4th invocation, no faults will be injected. To inject the fault on 2nd invocation and onwards, we can use the plus symbol:

$ strace -e inject=fstat:error=EPERM:when=2+ whoami

Finally, we can also make the injection follow a specific step size. For instance, we can inject the fault on the 2nd invocation and on every second invocation after that. Concretely, we would like to inject the fault on the 2nd, 4th, 6th invocations, and so on.

To do that, we’ll specify a step size of 2 after the plus symbol:

$ strace -e inject=fstat:error=EPERM:when=2+2 whoami
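
To check our reading of a when subexpression before running strace, we can list the invocation numbers it selects. The sketch below is a plain simulation, not part of strace. The function name when_hits is hypothetical, and it handles only the three forms shown above (N, N+, and N+step):

```shell
# List which invocations, from 1 up to a maximum, a when= value selects.
# Usage: when_hits <when-value> <max-invocation>
when_hits() {
    spec=$1
    max=$2
    case $spec in
        *+*) first=${spec%%+*}; step=${spec#*+}; step=${step:-1} ;;  # N+ or N+step
        *)   first=$spec; step=0 ;;                                  # N only
    esac
    i=$first
    out=
    while [ "$i" -le "$max" ]; do
        out="$out${out:+ }$i"
        [ "$step" -eq 0 ] && break      # a plain N selects a single invocation
        i=$((i + step))
    done
    printf '%s\n' "$out"
}

when_hits 2 8     # 2
when_hits 2+ 5    # 2 3 4 5
when_hits 2+2 8   # 2 4 6 8
```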

13.3. Introducing Delays in Syscalls

Using the delay_enter option of the inject qualifier, we can inject a delay before the syscall is invoked. Similarly, we can inject a delay after the syscall returns using the delay_exit option. Both options accept a time value in microseconds.

For example, we can introduce a 2-second (2000000 microseconds) delay right before the command invokes the fstat syscall:

$ strace -e inject=fstat:delay_enter=2000000 whoami

On the other hand, to inject the delay after the syscall, we’ll use the expression delay_exit:

$ strace -e inject=fstat:delay_exit=2000000 whoami

The important distinction between the 2 examples is the sequence of delay injection. In the first example, the 2 seconds delay is introduced before the fstat is called. On the contrary, the 2nd example injects the 2 seconds delay after the fstat syscall returns.
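
Since the delay is given in microseconds, it can be easier to compute the value than to count zeros. The helper below is a minimal sketch that builds the expression from a whole number of seconds. The function name delay_expr is hypothetical:

```shell
# Build an inject expression that delays a syscall on entry.
# Usage: delay_expr <syscall> <whole seconds>
delay_expr() {
    syscall=$1
    seconds=$2
    # 1 second = 1000000 microseconds
    printf 'inject=%s:delay_enter=%d\n' "$syscall" "$((seconds * 1000000))"
}

delay_expr fstat 2   # inject=fstat:delay_enter=2000000
```

We could then run strace -e "$(delay_expr fstat 2)" whoami to get the same behavior as the first example.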

14. File Descriptor Data Dumping

14.1. Dumping File Descriptors’ Data on Every Input Activity

Using the qualifier read, we can get a full hexadecimal and ASCII dump of the data read from the specified file descriptors. Let’s dump the data whenever there’s input activity on file descriptor 3:

$ strace -e read=3 whoami

The output will now contain blocks of hexadecimal data whenever there’s a read syscall on file descriptor 3.

14.2. Dumping File Descriptors’ Data on Every Output Activity

We can display the data on a file descriptor whenever there’s a write syscall invoked on it. For example, to dump the data of file descriptor 5 on every output activity, we’ll use the expression write=5:

$ strace -e write=5 whoami

15. Conclusion

In this article, we’ve looked at strace as a diagnostic tool. We’ve started with a basic introduction to strace. Then, the article demonstrated the flags for attaching strace to a running process, modifying the environment variable, and getting the statistics.

In the subsequent section, we’ve looked thoroughly at the strace expression. We’ve started with the expression for filtering output. Then, we’ve shown the expressions for formatting the output. Besides that, we’ve also explored the expression for tampering with syscall. Finally, we’ve introduced the expression for dumping data of file descriptors.