1. Introduction
In Linux systems, analyzing the behavior and performance of processes can be helpful in gaining a deeper understanding of running programs. For this purpose, we can profile processes to get periodic updates of performance metrics like memory or CPU usage.
Process profiling offers valuable insight into how applications perform, helping us identify performance bottlenecks and optimize the resource utilization of our programs.
Profiling an application is an extensive topic. In this tutorial, we’ll explore a number of Linux tools and highlight their capabilities.
2. Summary of Processes With top
top provides real-time information on system activity and processes managed by the operating system.
We can configure top to show both a system-wide overview and details on a particular process. It presents information such as process IDs, thread counts, and the CPU and memory usage of processes.
2.1. Overview
If we want to get a general overview of processes and their details, we can use top directly:
$ top
top - 21:59:11 up 29 min, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.2 sy, 0.0 ni, 99.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 11843.1 total, 9358.9 free, 1114.4 used, 1369.8 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 10461.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1703 baeldung 20 0 3933300 350352 108468 S 1.3 2.9 0:07.01 gnome-shell
1392 root 20 0 6730536 122060 76748 S 0.3 1.0 0:15.76 Xorg
3879 baeldung 20 0 14724 4352 3556 R 0.3 0.0 0:00.10 top
1 root 20 0 168640 11776 8248 S 0.0 0.1 0:04.77 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_flushwq
...
Let’s summarize some of the data columns above:
- PID: process ID
- PR: scheduling priority of a process
- NI: nice value of a task, which adjusts its priority: negative values mean higher priority, positive values mean lower priority, and zero means no adjustment
- VIRT: total virtual memory used by the task (in KiB by default)
- RES: resident memory size, the portion of the virtual address space that's currently held in physical memory
- SHR: shared memory, the portion of resident memory that can be shared with other processes
- S: process state, such as R (running), S (sleeping), I (idle), or Z (zombie)
Other columns like %CPU or USER are more or less self-explanatory.
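top gathers these per-process figures from the /proc filesystem. As a rough cross-check, here's a minimal sketch, assuming a Linux system with /proc mounted, that reads /proc/PID/statm. Its first three fields are the total program size, the resident pages, and the resident shared pages, all counted in pages, so the sketch converts them to KiB, the unit top uses by default. The function name proc_mem is hypothetical:

```shell
# Hypothetical helper: print VIRT, RES and SHR in KiB for a PID.
# /proc/<pid>/statm starts with: size resident shared ... (all in pages).
proc_mem() {
  pid=$1
  # Page size in KiB; falls back to 4 KiB pages if getconf is missing.
  page_kib=$(( $(getconf PAGESIZE 2>/dev/null || echo 4096) / 1024 ))
  read -r size resident shared _ < "/proc/$pid/statm" || return 1
  echo "VIRT=$((size * page_kib)) RES=$((resident * page_kib)) SHR=$((shared * page_kib))"
}
```

Running proc_mem 1703 should print values close to top's VIRT, RES, and SHR columns for the same PID, although they can differ slightly because the two readings aren't taken at the same instant.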
2.2. Particular Process Specifics
To narrow down our focus, we can use the -p option with top to retrieve the details of a specific process only.
For example, let’s monitor only the gnome-shell process with PID 1703:
$ top -p 1703
top - 22:43:10 up 1:13, 2 users, load average: 0.00, 0.02, 0.00
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.2 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 11843.1 total, 9356.8 free, 1115.4 used, 1370.8 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 10460.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1703 baeldung 20 0 3933300 350620 108528 S 0.7 2.9 0:09.15 gnome-shell
If we need to monitor several specific processes together, we can repeat the -p option, or we can list the PIDs separated by commas. Either way, top accepts at most 20 PIDs.
Additionally, we can run top in batch mode with the -b or --batch option. This mode is handy for piping the output to other programs or saving it to a file. In this mode, top keeps running until it reaches the iteration limit set with the -n option or until we terminate it manually:
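Since top -p accepts at most 20 PIDs, a small helper can build the comma-separated list and cut it off at that limit. This is a minimal sketch that relies only on printf, head, and paste; the name join_pids is hypothetical:

```shell
# Hypothetical helper: join PIDs with commas, keeping at most 20,
# the limit top enforces for -p.
join_pids() {
  printf '%s\n' "$@" | head -n 20 | paste -sd, -
}
```

For example, $ top -p "$(join_pids $(pgrep -u baeldung))" would monitor up to 20 processes owned by the baeldung user.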
$ top -b -p 1703 -n 1
top - 22:53:17 up 1:23, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 11843.1 total, 9356.8 free, 1115.4 used, 1370.9 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 10460.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1703 baeldung 20 0 3933300 350620 108528 S 0.0 2.9 0:09.59 gnome-shell
As we can see above, the output is the same as before, except that top quits after printing the process information once, as we specified with the -n option.
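Batch mode becomes more useful for profiling when we combine it with a delay between iterations, set via -d, so that top produces a time series we can post-process. As a sketch, assuming the default column layout shown above (where %CPU is the ninth field), the hypothetical helper below extracts the %CPU values of one PID from batch output:

```shell
# Hypothetical helper: read top batch output on stdin and print the %CPU
# column for the given PID. Assumes the default field order
# (PID USER PR NI VIRT RES SHR S %CPU ...), so %CPU is field 9.
cpu_column() {
  awk -v pid="$1" '$1 == pid { print $9 }'
}
```

For instance, $ top -b -d 5 -n 12 -p 1703 | cpu_column 1703 would print one %CPU sample every five seconds for a minute. If the column layout has been customized, the field index needs adjusting.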
3. Process Related Information With ps
Another command-line tool we can use is ps, an important Linux utility that offers valuable insight into the currently active processes.
Unlike top, which provides continuous updates, ps takes a one-time snapshot, which makes it well suited to gathering information about particular processes on demand.
To begin with, we can use the -p option to make the tool work with a process we specify. Let’s see what the command outputs about the same process we used before:
$ ps -p 1703
PID TTY TIME CMD
1703 ? 00:00:51 gnome-shell
The default output without additional options provides very limited information. We can use the -F option to increase the verbosity, but the extra columns still aren't focused on the profiling-related data we're after.
However, for more targeted information, the -o option enables us to specify precisely what we require in terms of columns:
$ ps -o user,pid,thcount,priority,size,vsz,pcpu,pmem,cputime,etime,cmd -p 1703
USER PID THCNT PRI SIZE VSZ %CPU %MEM TIME ELAPSED CMD
baeldung 1703 8 20 317756 3933300 0.0 2.9 00:00:51 21:02:42 /usr/bin/gnome-shell
As we can see in the output above, we get exactly the data we request. The advantage here is the flexibility to print any column we want.
To explore the available format specifiers, we can use the L option of ps:
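Because ps only takes a snapshot, we can call it in a loop to approximate periodic profiling. The following sketch assumes the procps version of ps, where appending = to a column name (as in pcpu=) suppresses that column's header, so each iteration prints a single line. Note that ps computes %CPU as an average over the whole lifetime of the process rather than over the last interval, unlike top. The function name sample_ps and its arguments are hypothetical:

```shell
# Hypothetical helper: print a timestamped ps snapshot of one PID
# `count` times, `interval` seconds apart, stopping early if it exits.
sample_ps() {
  pid=$1 count=$2 interval=$3
  i=0
  while [ "$i" -lt "$count" ] && [ -d "/proc/$pid" ]; do
    # A trailing "=" suppresses the header for that column.
    printf '%s %s\n' "$(date +%T)" "$(ps -o pcpu=,pmem=,rss=,thcount= -p "$pid")"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, sample_ps 1703 12 5 would take twelve snapshots of gnome-shell, five seconds apart.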
$ ps L
%cpu %CPU
%mem %MEM
_left LLLLLLLL
_left2 L2L2L2L2
_right RRRRRRRR
_right2 R2R2R2R2
_unlimited U
_unlimited2 U2
alarm ALARM
args COMMAND
atime TIME
...
Let’s understand some of the data we can get:
- PPID: parent process ID
- THCNT: thread count
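- SIZE: a rough estimate of the swap space the process would need if it dirtied all its writable pages and were then swapped out
- VSZ: virtual memory size in KiB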
Of course, we can refer to the full documentation to get the information we’re after.
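Much of this information comes from the kernel's /proc interface, so we can also read it directly. As a minimal sketch, assuming the Linux /proc/PID/status format, the PPid, Threads, VmSize, and VmRSS lines correspond roughly to the PPID, THCNT, VSZ, and RSS columns of ps, with memory values in kB. The function name proc_status is hypothetical:

```shell
# Hypothetical helper: print PPID, thread count, VSZ and RSS for a PID
# straight from /proc/<pid>/status (memory values are in kB).
proc_status() {
  awk '
    $1 == "PPid:"    { ppid = $2 }
    $1 == "Threads:" { thr = $2 }
    $1 == "VmSize:"  { vsz = $2 }
    $1 == "VmRSS:"   { rss = $2 }
    END { printf "PPID=%s THCNT=%s VSZ=%s RSS=%s\n", ppid, thr, vsz, rss }
  ' "/proc/$1/status"
}
```

Kernel threads such as kthreadd have no Vm lines in their status file, so VSZ and RSS stay empty for them.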
4. Diving Deep Into a Particular Process Using perf
The perf tool in Linux is a powerful performance profiling tool that facilitates detailed information gathering and analysis of system, process, and program performance data.
4.1. Install perf
First, we need to ensure that perf is installed on our system. We can typically install it through our package manager.
For example, on Ubuntu, we can install perf using apt with sudo privileges:
$ sudo apt install linux-tools-common linux-tools-$(uname -r)
The linux-tools packages provide perf. Since such low-level tools are built for a specific kernel version, we use uname -r, which prints the release of the running kernel, to install the matching package.
4.2. Basic Profiling
Having installed perf on our system, we can start to profile processes using the stat subcommand of perf along with the -p option to specify a process:
$ sudo perf stat -p 1703 sleep 5
Performance counter stats for process id '1703':
1.14 msec task-clock # 0.000 CPUs utilized
3 context-switches # 2.631 K/sec
0 cpu-migrations # 0.000 /sec
0 page-faults # 0.000 /sec
668248 cycles # 0.586 GHz
275191 instructions # 0.41 insn per cycle
62127 branches # 54.478 M/sec
3970 branch-misses # 6.39% of all branches
5.003091188 seconds time elapsed
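From the output above, we can see that perf provides lower-level profiling data, such as hardware counters for cycles, instructions, and branches. Here, perf stat runs the sleep command and counts events for process 1703 for as long as sleep runs, so sleep 5 sets the sampling duration to five seconds. Notably, we might need sudo privileges, depending on the system's perf security settings.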
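Whether we need sudo depends largely on the kernel.perf_event_paranoid setting (and, for another user's process, on permission to trace it). As a sketch, assuming the upstream meanings of levels -1 to 2 from the kernel's sysctl documentation, with each level adding restrictions for users without the relevant capabilities, we can inspect the current value. Higher values come from distribution patches, and the function name perf_paranoid is hypothetical:

```shell
# Report the current perf_event_paranoid level and, for upstream levels,
# what it restricts for unprivileged users (each level adds to the last).
perf_paranoid() {
  level=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null) || {
    echo "perf_event_paranoid: unavailable"
    return 1
  }
  case $level in
    -1) msg="almost all events allowed" ;;
    0)  msg="raw and function tracepoints restricted" ;;
    1)  msg="system-wide CPU events also restricted" ;;
    2)  msg="kernel profiling also restricted" ;;
    *)  msg="distribution-specific; unprivileged use likely blocked" ;;
  esac
  echo "perf_event_paranoid=$level ($msg)"
}
```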
Now, let’s interpret some of the output that might seem unclear:
- task-clock: time the task spent running on a CPU, in milliseconds
- context-switches: number of context switches that occurred, that is, how many times the operating system switched the CPU from this task to another one
- cpu-migrations: number of times the process was moved from one CPU core to another
- page-faults: number of accesses to memory pages that weren't mapped at that moment, so the kernel had to map them and, in some cases, load them from disk
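The annotations after the # signs are derived from the raw counters. For instance, insn per cycle is instructions divided by cycles (275191 / 668248, about 0.41), and the branch-miss percentage is branch-misses divided by branches (3970 / 62127, about 6.39%). As a sketch, assuming perf stat's CSV mode (-x,) without interval or per-CPU options, so that the counter value, unit, and event name are the first three fields, the hypothetical helper below recomputes both metrics:

```shell
# Hypothetical helper: derive IPC and branch-miss rate from
# `perf stat -x,` output on stdin (fields: value,unit,event,...).
ipc_from_csv() {
  awk -F, '
    { ev = $3; sub(/:.*/, "", ev) }   # strip modifiers such as :u
    ev == "cycles"        { cycles = $1 }
    ev == "instructions"  { insns = $1 }
    ev == "branches"      { br = $1 }
    ev == "branch-misses" { miss = $1 }
    END {
      if (cycles > 0) printf "IPC: %.2f\n", insns / cycles
      if (br > 0)     printf "branch-miss rate: %.2f%%\n", 100 * miss / br
    }'
}
```

For example, we could run $ sudo perf stat -x, -o stat.csv -e cycles,instructions,branches,branch-misses -p 1703 sleep 5 and then $ ipc_from_csv < stat.csv. Event names can look different on some CPUs, in which case the matching needs adjusting.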
So, let’s see how we can preserve our findings across different sessions.
4.3. Save Profiling Data
We can use the record subcommand to capture the performance data for a particular process and save it into a file:
$ sudo perf record -g -p 1703 sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (104 samples) ]
Above, the -g option enables the call graph profiling. This feature can become especially valuable when profiling a program in which we need to get performance details at the function level.
4.4. Printing Saved Profiling Data
The profiling data is recorded into a file called perf.data by default. We can analyze the results with the report subcommand:
$ sudo perf report
Samples: 104 of event 'cycles', Event count (approx.): 12925696
Children Self Command Shared Object Symbol
+ 46.32% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99bd16cbd
+ 46.32% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99bd16700
+ 46.32% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99bd15c95
+ 39.57% 0.00% gnome-shell [unknown] [k] 0000000000000000
+ 33.49% 0.00% gnome-shell libgjs.so.0.0.0 [.] 0x00007fd99dd2bcf0
+ 33.49% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] JS_CallFunction
+ 22.25% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99c2bf67e
+ 20.25% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99bd106d3
+ 18.45% 0.37% gnome-shell [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
+ 18.08% 0.00% gnome-shell [kernel.kallsyms] [k] do_syscall_64
...
As a result, we get an interactive interface to explore the profile of our process at a deeper level.
Alternatively, we can print the results to standard output with the --stdio option:
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 104 of event 'cycles'
# Event count (approx.): 12925696
#
# Children Self Command Shared Object Symbol
# ........ ........ ........... ............................ ........................................
#
46.32% 0.00% gnome-shell libmozjs-68.so.68.6.0 [.] 0x00007fd99bd16cbd
|
---0x7fd99bd16cbd
0x7fd99bd16700
0x7fd99bd15c95
|
|--14.09%--0x7fd99bd088f0
| 0x7fd99bd1651e
| 0x7fd99bea25a2
| 0x7fd99bd16cbd
| 0x7fd99bd16700
| 0x7fd99bd15c95
| |
| |--7.12%--0x7fd99bd0cffd
| | 0x7fd99c2bfb5f
| | 0x7fd99c2bf67e
| | 0x7fd99c223798
| | 0x7fd99c384356
| | __mprotect
| | entry_SYSCALL_64_after_hwframe
| | do_syscall_64
| | __x64_sys_mprotect
| | do_mprotect_pkey
| | mprotect_fixup
| | vma_merge
| | __vma_adjust
| |
...
Here, we can see the call graph and the object symbols. If we profile a program compiled with the -fno-omit-frame-pointer option, we get more complete call chains, since perf record -g walks the stack using frame pointers by default. However, delving into this topic is beyond the scope of this article.
5. Conclusion
In this article, we learned how to profile processes in Linux using several tools such as top, ps, and perf. With these tools, we can gain effective insight into how processes behave.
Equipped with this knowledge, we’re now well-prepared to explore more advanced profiling techniques and further enhance our capabilities.