1. Introduction
Despite the advent of containerization, virtualization remains a highly relevant technology. Because of this, understanding how a virtual environment impacts performance can be critical, especially when hosting applications in the cloud.
In this tutorial, we discuss the difference between two ways to interface with main storage in a virtualized environment. First, we define an important term in virtual environments. Next, we look at the way it relates to storage, distinguishing between two devices with similar functions but different names. Finally, we explore device naming in general.
For brevity, we stick to the term disk for storage in general, even though modern storage very rarely contains actual disks.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.
2. Paravirtualization
Paravirtualization happens when the hypervisor receives so-called hypercalls from the guest operating system (OS) of a virtual machine.
Hypercalls are similar to system calls in that they act as direct, low-level control signals. This direct communication between guest and hypervisor avoids the need to emulate hardware for the former. In fact, paravirtualization allows the guest to see the actual devices but requires modification to the guest OS or its drivers to do so.
For example, let’s use lshw to examine our network devices in a non-virtualized environment:
$ lshw -class network
*-network
description: Ethernet interface
product: NetXtreme II BCM5709 Gigabit Ethernet
vendor: Broadcom Corporation
physical id: 0
bus info: pci@0000:02:00.0
logical name: eth0
version: 20
serial: 00:04:76:06:66:10
size: 1GB/s
capacity: 1GB/s
width: 64 bits
clock: 33MHz
capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical
tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnx2 driverversion=1.7.5 duplex=full
firmware=5.0.1 NCSI 2.0.6 ip=192.168.66.6 latency=0 link=yes multicast=yes port=twisted pair speed=1GB/s
resources: irq:16 memory:f3000000-f5ffffff memory:d6100000-d610ffff(prefetchable)
The driver=bnx2 value matches the product: NetXtreme II BCM5709 Gigabit Ethernet line, which refers to an actual physical device. While emulating such specific devices can be and sometimes is done, virtualized environments without paravirtualization often prefer generic components:
$ lshw -class network
[...]
*-network:2
description: Ethernet interface
physical id: 3
logical name: eth0
serial: 00:15:5d:06:66:10
size: 10Gbit/s
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=hv_netvsc driverversion=5.10.102.1-microsoft-standard-W
duplex=full firmware=N/A ip=172.16.66.10 link=yes multicast=yes speed=10Gbit/s
Here, the scant amount of information, in contrast with our earlier example, is a strong hint. Further, the driver hv_netvsc is a known Hyper-V network driver, its version mentions microsoft, and there is no product name at all.
Of course, we can have all of this information for a component and still be in a virtual environment. Let’s see what that means when it comes to storage performance.
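On systemd-based distributions such as our Debian 11 test system, there is also a quicker, device-independent check: systemd-detect-virt prints the detected virtualization technology, or none on bare metal. Naturally, the kvm value below is just one possible result:
$ systemd-detect-virt
kvm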
3. Storage Block Device Drivers
When it comes to virtualization, drivers are critical: they dictate how the guest OS sees devices and communicates with them, which can mean a great deal for performance.
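For block devices in particular, lsblk offers a quick overview of the disks the guest sees, along with their transport and model. The output below is from a hypothetical KVM guest with one emulated SATA disk and one paravirtualized disk; depending on the util-linux version, the transport column of the latter may stay empty or read virtio:
$ lsblk -d -o NAME,TRAN,MODEL,SIZE
NAME TRAN MODEL          SIZE
sda  sata QEMU HARDDISK   20G
vda                       20G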
3.1. /dev/sda and /dev/hda
Of course, we already know what /dev/sda and /dev/hda are: **the first [d]isks in a Linux system** (a being the first letter in the alphabet). Yet, we may not be able to tell whether they are virtual or not just from their block device name.
In sda, s stands for SCSI, while the h in hda simply references old IDE [h]ard disks. Currently, both /dev/sd* and /dev/hd* are catch-all devices, so they fit the generic approach that hypervisors usually follow.
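To confirm which kernel driver actually backs such a device, we can follow its sysfs links. Assuming a /dev/sda disk is present, a SCSI-handled disk typically resolves to the generic sd driver:
$ readlink -f /sys/block/sda/device/driver
/sys/bus/scsi/drivers/sd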
Consequently, the drivers for both are also generic, and they initiate a lot of real-world actions:
- initialize disks on boot
- move the magnetic head of a hard disk
- deal with controller feedback
Naturally, duplicating such signals from the guest to the hypervisor and then to the host introduces unnecessary overhead. Still, hypervisors do a very good job of emulating storage devices, not least because storage has traditionally been one of the slowest system components, so the extra overhead was easy to hide. Yet, modern storage is much faster, so emulated drivers may decrease its performance noticeably.
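To get a rough feel for that impact, we can measure sequential write throughput with dd. This is only a crude sketch: it assumes /tmp resides on the disk under test, oflag=direct requires filesystem support, and the throughput figure below is purely illustrative; a proper benchmark would use a dedicated tool like fio:
$ dd if=/dev/zero of=/tmp/dd-test bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.31018 s, 465 MB/s
$ rm /tmp/dd-test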
Let’s see how we can avoid that.
3.2. /dev/vda
While /dev/sd* and /dev/hd* are relatively covert, /dev/vda describes itself: [v]irtual [d]isk. Importantly, it’s not only in the name for us to read but in the code for the system to understand – vd* is [v]irtualization-aware.
In particular, /dev/vda uses drivers from the virtio family (such as virtio_blk), which only communicate the essentials to the hypervisor:
- operation: read or write
- location: where do we write or read
- (on writes) data: what do we write
Basically, this is done via the hypercalls we discussed earlier. By using them, we skip the burden of full hardware emulation for the guest OS. Often, this improves performance dramatically.
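We can verify this the same way as before. Assuming we’re inside a guest (for example, under KVM/QEMU) that exposes a /dev/vda device, the sysfs driver link now points to virtio_blk:
$ readlink -f /sys/block/vda/device/driver
/sys/bus/virtio/drivers/virtio_blk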
Still, how do the names sd* and vd* even come about?
4. Device Rules
In Linux, the /lib/udev/rules.d/ directory contains rule files with specific instructions for every (expected) device in a system:
$ ls /lib/udev/rules.d/
50-firmware.rules 60-libgphoto2-6.rules 60-serial.rules 71-seat.rules 80-net-setup-link.rules
50-udev-default.rules 60-libopenni2-0.rules 64-btrfs.rules 73-seat-late.rules 85-hdparm.rules
55-dm.rules 60-libsane1.rules 64-xorg-xkb.rules 73-special-net-names.rules 90-libinput-fuzz-override.rules
60-autosuspend.rules 60-persistent-alsa.rules 65-libwacom.rules 75-net-description.rules 95-dm-notify.rules
60-block.rules 60-persistent-input.rules 70-joystick.rules 75-probe_mtd.rules 96-e2scrub.rules
60-cdrom_id.rules 60-persistent-storage-dm.rules 70-mouse.rules 78-sound-card.rules 99-libsane1.rules
60-drm.rules 60-persistent-storage.rules 70-power-switch.rules 80-debian-compat.rules 99-systemd.rules
60-evdev.rules 60-persistent-storage-tape.rules 70-touchpad.rules 80-drivers.rules
60-fido-id.rules 60-persistent-v4l.rules 70-uaccess.rules 80-ifupdown.rules
60-input-id.rules 60-sensor.rules 71-ipp-usb.rules 80-libinput-device-groups.rules
As we can see from their filenames, each rule file is responsible for a different area.
The kernel detects devices and exposes their parameters and interface; udev then matches them against these rules and performs various activities based on the results (we simulate this matching right after the list):
- initial system setup
- device naming
- device initialization
- environment preparation
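As mentioned, we can simulate this matching for a single device with udevadm test. The command below assumes a /dev/sda disk exists, usually needs root, and is fairly verbose, so we filter for two of the resulting properties; the values shown are only illustrative:
$ sudo udevadm test /sys/block/sda 2>&1 | grep -E '^(ID_BUS|DEVLINKS)='
ID_BUS=ata
DEVLINKS=/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 /dev/disk/by-path/pci-0000:00:01.1-ata-1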
Since names are one of the main device attributes, knowing where they come from can shed light on key device features. For example, we can check the rules for persistent storage devices:
$ cat /lib/udev/rules.d/60-persistent-storage.rules
[...]
# virtio-blk
KERNEL=="vd*[!0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}"
KERNEL=="vd*[0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}-part%n"
[...]
# legacy virtio-pci by-path links (deprecated)
KERNEL=="vd*[!0-9]", ENV{ID_PATH}=="pci-*", SYMLINK+="disk/by-path/virtio-$env{ID_PATH}"
KERNEL=="vd*[0-9]", ENV{ID_PATH}=="pci-*", SYMLINK+="disk/by-path/virtio-$env{ID_PATH}-part%n"
[...]
# ATA
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{vendor}=="ATA", IMPORT{program}="ata_id --export $devnode"
# ATAPI devices (SPC-3 or later)
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{type}=="5", ATTRS{scsi_level}=="[6-9]*", IMPORT{program}="ata_id --export $devnode"
# Run ata_id on non-removable USB Mass Storage (SATA/PATA disks in enclosures)
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", ATTR{removable}=="0", SUBSYSTEMS=="usb", IMPORT{program}="ata_id --export $devnode"
# Fall back usb_id for USB devices
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="usb", IMPORT{builtin}="usb_id"
# SCSI devices
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="scsi"
KERNEL=="cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="cciss"
KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
# PMEM devices
KERNEL=="pmem*", ENV{DEVTYPE}=="disk", ATTRS{uuid}=="?*", SYMLINK+="disk/by-id/pmem-$attr{uuid}"
# FireWire
KERNEL=="sd*[!0-9]|sr*", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}"
KERNEL=="sd*[0-9]", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}-part%n"
[...]
Here, we can see that virtio handles /dev/vd* devices. Meanwhile, /dev/sd* covers many different buses when it comes to persistent storage (ATA, USB, SCSI, and more), so the name alone does not tell us much about the device itself.
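The symlinks these SYMLINK+= rules add end up under /dev/disk/. For instance, on a hypothetical KVM guest with one virtio disk that carries a serial number and one emulated CD-ROM drive, /dev/disk/by-id/ might contain entries like these (names purely illustrative):
$ ls /dev/disk/by-id/
ata-QEMU_DVD-ROM_QM00003  virtio-0123456789abcdef  virtio-0123456789abcdef-part1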
5. Summary
In this article, we discussed the differences between /dev/sda and /dev/vda.
In conclusion, while both relate to storage, /dev/vd* devices usually perform better than /dev/sd* or /dev/hd*.