1. Introduction
The Graphics Processing Unit (GPU) is a powerful device whose applications go far beyond rendering graphics. To some extent, we can regard the GPU as an independent computation unit inside our computer. Therefore, we need special tools to monitor it.
In this tutorial, we’ll learn commands to report the NVIDIA GPU state.
2. How to Make GPU a Bit Hotter
Before we monitor the GPU, let’s devise a way to load it. Because the GPU distinguishes between graphics and compute modes of use, we need programs that exercise each of them.
Assuming that we installed the CUDA Toolkit with the nvcc compiler, we can use the NVIDIA code samples to load the GPU in compute mode. Let’s download the code from GitHub and extract it into the ~/prj/cuda folder.
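As a sketch, assuming the samples still live in the NVIDIA/cuda-samples GitHub repository and that we fetch the master branch as a ZIP archive, the steps might look like:
$ mkdir -p ~/prj/cuda && cd ~/prj/cuda
$ wget https://github.com/NVIDIA/cuda-samples/archive/refs/heads/master.zip
$ unzip master.zip
Afterward, we can move to the watershedSegmentationNPP folder: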
$ cd ~/prj/cuda/cuda-samples-master/Samples/4_CUDA_Libraries/watershedSegmentationNPP
This application implements the watershed segmentation algorithm, which is widely used in medical imaging. Then, let’s build the program with make:
$ make
As a single run of this program is short, we need to execute it in an infinite loop:
$ while true; do ./watershedSegmentationNPP > /dev/null 2>&1; done
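Since this loop occupies the terminal, we can optionally run it in the background and stop it later with kill; a small convenience sketch:
$ while true; do ./watershedSegmentationNPP > /dev/null 2>&1; done &
$ kill %1 # stop the compute load when we're done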
To add pressure in graphics mode, let’s install the FlightGear flight simulator:
$ sudo apt install flightgear
Then, we can take a trip over Keflavik airport in a Beechcraft Staggerwing.
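As a sketch, we might start the simulator from the command line with fgfs, where BIKF is Keflavik’s ICAO code; the aircraft identifier below is only a placeholder, since it depends on the aircraft sets we have installed:
$ fgfs --airport=BIKF --aircraft=<staggerwing-id>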
3. The nvidia-smi Tool
nvidia-smi is an NVIDIA tool for monitoring and controlling GPUs. To check the current state of the GPU, let’s issue:
$ nvidia-smi
First, we’re provided with the GPU details. Then comes the current state of the GPU, such as fan speed, temperature, or power draw, to name only a few. The lower panel shows the processes running on the GPU, where fgfs stands for FlightGear. Note the ‘Type’ column, which indicates whether a task uses the graphics or compute mode, denoted by G or C, respectively. Additionally, a process that runs in both modes simultaneously is marked G+C.
If we want to track the GPU state over time, let’s use one of the -l or -lms options. The former accepts a refresh interval in seconds, while the latter takes milliseconds:
$ nvidia-smi -l 2 # snapshot every two seconds
$ nvidia-smi -lms 250 # snapshot every 250 milliseconds
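Alternatively, we can wrap the plain command in the standard watch utility to obtain a similar periodic refresh:
$ watch -n 2 nvidia-smi # redraw every two seconds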
4. The nvtop Monitor
The nvtop program is the GPU equivalent of a well-known top command. We need to install it, e.g., with apt on Ubuntu:
$ sudo apt install nvtop
The command provides a nice, real-time summary of the GPU state and processes. The plot shows the GPU workload and memory utilization:
$ nvtop
We can disable the plot with the -p option. Next, we can change the refresh interval with the -d option, which uses tenths of a second as its time unit. So, to refresh every half a second, we should issue:
$ nvtop -d 5
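Of course, we can combine both switches; for instance, to drop the plot and refresh once per second:
$ nvtop -p -d 10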
Finally, note that nvtop claims support for GPUs from brands other than NVIDIA.
5. The gpustat Command
Another tool to monitor GPU is gpustat. On Ubuntu, we can install it with apt:
$ sudo apt install gpustat
Let’s start gpustat with the --watch option for real-time statistics:
$ gpustat --watch
This basic output provides us with the GPU index and GPU name first. Then come the figures: temperature, GPU workload, and memory usage. For more details, we can use additional options:
$ gpustat -FP -ucp --watch
The -F option stands for fan speed, the blue zero-valued number in the screenshot above. Next, we use -P to display power consumption in magenta. With -u, -c, and -p, we list the GPU processes. These switches mean user, command, and process ID, respectively.
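If we need machine-readable output instead of the live view, and assuming the installed gpustat version supports it, the --json switch prints the same statistics as JSON:
$ gpustat --json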
6. The nvitop Utility
Yet another GPU monitor is nvitop. Let’s install it with pip:
$ pip install --upgrade nvitop
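If we’d rather not touch the system-wide Python packages, a standard virtual environment works just as well; the path below is arbitrary:
$ python3 -m venv ~/venvs/nvitop
$ ~/venvs/nvitop/bin/pip install --upgrade nvitop
$ ~/venvs/nvitop/bin/nvitop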
Then, we’ll obtain an elegant, real-time output by issuing:
$ nvitop
The type of the GPU process, G or C, is shown immediately after its PID.
7. More Options to nvidia-smi
We can check many GPU environment features using options for nvidia-smi.
7.1. Querying Properties
Let’s use the -q option to obtain information about the GPU. When we issue it alone, we get a long list of features:
$ nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Sun Mar 24 08:52:45 2024
Driver Version : 550.54.14
CUDA Version : 12.4
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 960
# ...
FB Memory Usage
Total : 2048 MiB
Reserved : 59 MiB
Used : 341 MiB
Free : 1647 MiB
# ...
Temperature
# ...
GPU Power Readings
# ...
Max Clocks
# ...
# ...
Processes
# ...
Instead of wading through this list, we can narrow the output to specific sections by adding the -d option, which takes one or more keywords. So, let’s target the temperature and power readings:
$ nvidia-smi -q -d TEMPERATURE,POWER
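The -d option accepts other keywords, too, such as MEMORY or UTILIZATION; for instance, to focus on the memory and utilization readings:
$ nvidia-smi -q -d MEMORY,UTILIZATION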
7.2. The ‘query’ Options
We can explicitly refer to a specific property using a bunch of options with names starting with ‘query’. For example, --query-gpu= examines our GPU. So, let’s check the amount of its total and free memory:
$ nvidia-smi --query-gpu=memory.total,memory.free --format=csv
memory.total [MiB], memory.free [MiB]
2048 MiB, 1621 MiB
The memory.total and memory.free labels correspond to the tree shown by the general nvidia-smi -q command. For information about supported features, we can ask nvidia-smi for help:
$ nvidia-smi --help-query-gpu
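For scripting, it’s often convenient to drop the header and units from the CSV output; as an example with the utilization.gpu and temperature.gpu properties:
$ nvidia-smi --query-gpu=utilization.gpu,temperature.gpu --format=csv,noheader,nounits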
Another interesting query switch is --query-compute-apps= to list the active computational tasks:
$ nvidia-smi --query-compute-apps=pid,process_name --format=csv
pid, process_name
58478, ./watershedSegmentationNPP
Once again, for more details, let’s use the command’s help:
$ nvidia-smi --help-query-compute-apps
Finally, let’s add that we can combine most of the nvidia-smi options with -l or -lms to obtain a real-time view. As an example, let’s display the memory statistics every second:
$ nvidia-smi -l 1 --query-gpu=memory.total,memory.free --format=csv
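For longer tests, we can simply append these snapshots to a file with a shell redirection; the file name below is arbitrary:
$ nvidia-smi -l 1 --query-gpu=memory.total,memory.free --format=csv >> gpu-memory.csv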
8. More on Processes
All the applications presented so far display information about GPU processes. However, it’s sometimes useful to make an independent check. To do so, we can list the processes that use the NVIDIA device files in the /dev folder. So, let’s use the fuser command:
$ sudo fuser -v /dev/nvidia*
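Alternatively, the lsof command gives a similar view of which processes keep the NVIDIA device files open:
$ sudo lsof /dev/nvidia*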
9. Conclusion
In this tutorial, we learned how to examine the NVIDIA GPU. First, we checked the GPU using the NVIDIA-provided nvidia-smi command. Then, we presented a list of text-based monitoring programs: nvtop, gpustat, and nvitop.
Next, we focused on detailed queries of GPU properties with nvidia-smi. Finally, via device files, we conducted an alternative search for processes that used GPU.
We performed all tests after loading the GPU with both graphics and compute tasks.