1. Introduction

Despite the advent of containerization, virtualization continues to be a very relevant technology. Because of this, understanding how a virtual environment impacts performance can be critical, especially when hosting applications in the cloud.

In this tutorial, we discuss the difference between two ways a virtualized environment can expose main storage to its guests. First, we define an important term in virtual environments. Next, we look at how it relates to storage, distinguishing between two devices with similar functions but different names. Finally, we explore device naming in general.

For brevity, we stick to the term disk for storage in general, despite the fact that modern storage very rarely contains disks.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.

2. Paravirtualization

Paravirtualization happens when the hypervisor receives so-called hypercalls from the guest operating system (OS) of a virtual machine.

Hypercalls are similar to system calls: both are direct, low-level control signals. This communication between guest and hypervisor removes the need to emulate hardware for the former. In fact, paravirtualization allows the guest to see the actual devices but requires modifications to the guest OS or its drivers to do so.
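
Whether any of this applies depends on whether we're running inside a virtual machine at all. As a quick first check, assuming a systemd-based distribution like our Debian 11 setup, systemd-detect-virt prints the detected hypervisor type, while lscpu flags the hypervisor vendor; on a KVM guest, for instance, the output might look like this:

$ systemd-detect-virt
kvm
$ lscpu | grep -i hypervisor
Hypervisor vendor:               KVM

On bare metal, systemd-detect-virt prints none, and lscpu shows no hypervisor line at all.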

As an example of how devices appear, let's use lshw to examine our network devices in a non-virtualized environment:

$ lshw -class network
  *-network
     description: Ethernet interface
     product: NetXtreme II BCM5709 Gigabit Ethernet
     vendor: Broadcom Corporation
     physical id: 0
     bus info: pci@0000:02:00.0
     logical name: eth0
     version: 20
     serial: 00:04:76:06:66:10
     size: 1Gbit/s
     capacity: 1Gbit/s
     width: 64 bits
     clock: 33MHz
     capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical
       tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
     configuration: autonegotiation=on broadcast=yes driver=bnx2 driverversion=1.7.5 duplex=full
        firmware=5.0.1 NCSI 2.0.6 ip=192.168.66.6 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
     resources: irq:16 memory:f3000000-f5ffffff memory:d6100000-d610ffff(prefetchable)

The driver=bnx2 line matches the product: NetXtreme II BCM5709 Gigabit Ethernet entry, which in turn corresponds to an actual physical device. While such devices can be, and sometimes are, emulated, virtualized environments often prefer generic, virtualization-aware components instead:

$ lshw -class network
[...]
  *-network:2
       description: Ethernet interface
       physical id: 3
       logical name: eth0
       serial: 00:15:5d:06:66:10
       size: 10Gbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=hv_netvsc driverversion=5.10.102.1-microsoft-standard-W
       duplex=full firmware=N/A ip=172.16.66.10 link=yes multicast=yes speed=10Gbit/s

Here, the sparse amount of information, in contrast with our earlier example, is a strong hint. Further, the driver hv_netvsc is a known Hyper-V network driver, its version mentions microsoft, and there is no product name at all.
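
In fact, we don't even need the full lshw output to get at the driver. As a shorter alternative, assuming the ethtool package is installed, its -i switch shows the driver behind a given interface (interface names and values differ per system, but here we'd expect details matching the lshw output above):

$ ethtool -i eth0
driver: hv_netvsc
version: 5.10.102.1-microsoft-standard-W
firmware-version: N/A
[...]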

Of course, we can have all this information for a component and still be in a virtual environment. Let's see what that means when it comes to storage performance.

3. Storage Block Device Drivers

Indeed, when it comes to virtualization, drivers are critical. They dictate how the guest OS sees devices and communicates with them. This can mean a great deal for performance.

3.1. /dev/sda and /dev/hda

Of course, we already know what **/dev/sda and /dev/hda are: the first [d]isks in a Linux system** (a being the first letter in the alphabet). Yet, we may not be able to tell whether they are virtual or not just from their block device name.

In sda, s stands for SCSI, while the h in hda simply references old IDE [h]ard disks. Currently, both /dev/sd* and /dev/hd* are catch-all devices, so they fit the generic approach that hypervisors usually follow.
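
To get a quick overview of the block devices on a given system, we can turn to lsblk. As a minimal sketch, assuming lsblk from util-linux is available, we list only the top-level devices along with their transport and model:

$ lsblk --nodeps --output NAME,TYPE,TRAN,MODEL

On a physical or fully emulated machine, we'd typically see sda with a transport like sata and a model name, while a paravirtualized guest usually shows vda with little or no such detail, underlining how generic these device names are.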

Consequently, the drivers for both are also generic, and they initiate a lot of real-world actions:

  • initialize disks on boot
  • move the magnetic head of a hard disk
  • deal with controller feedback

Naturally, duplicating such signals from the guest to the hypervisor and then to the host introduces unnecessary overhead. Still, hypervisors do a very good job of emulating storage devices, in particular because storage has traditionally been one of the slowest system components. Yet, modern storage is much faster, so emulated drivers may decrease its performance noticeably.
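
To get a feel for that overhead, we can time raw sequential reads from a disk. This is only a rough sketch, assuming hdparm is installed and run as root; /dev/sda here is a placeholder for whichever disk we want to test, and the resulting numbers depend entirely on the environment:

$ hdparm -t /dev/sda

The -t switch times buffered reads straight from the device, so comparing the figure for an emulated sd* disk with that of a virtio-backed vd* disk on the same host gives a rough idea of the driver overhead.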

Let’s see how we can avoid that.

3.2. /dev/vda

While the names /dev/sd* and /dev/hd* reveal little about virtualization, /dev/vda describes itself: [v]irtual [d]isk. Importantly, it's not only in the name for us to read but also in the code for the system to understand: vd* is [v]irtualization-aware.

In particular, /dev/vda uses drivers like virtio, which only communicate the essentials to the hypervisor:

  • operation: read or write
  • location: where we read or write
  • data (on writes): what we write

Basically, this is done via the hypercalls we discussed earlier. When using them, we skip the overhead of emulating hardware for the guest OS. Often, this improves performance dramatically.
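
To confirm that a guest really uses virtio for its disks, we can check which driver is bound to the device in sysfs. This is a quick sketch that assumes a virtio-based guest with a /dev/vda disk:

$ basename "$(readlink /sys/block/vda/device/driver)"
virtio_blk

On a guest with emulated storage, the same check for /dev/sda would instead point to a generic SCSI or ATA driver.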

Still, how do the names sd* and vd* even come about?

4. Device Rules

In Linux, the /lib/udev/rules.d/ directory contains rule files with specific instructions for every (expected) device in a system:

$ ls /lib/udev/rules.d/
50-firmware.rules      60-libgphoto2-6.rules             60-serial.rules        71-seat.rules                    80-net-setup-link.rules
50-udev-default.rules  60-libopenni2-0.rules             64-btrfs.rules         73-seat-late.rules               85-hdparm.rules
55-dm.rules            60-libsane1.rules                 64-xorg-xkb.rules      73-special-net-names.rules       90-libinput-fuzz-override.rules
60-autosuspend.rules   60-persistent-alsa.rules          65-libwacom.rules      75-net-description.rules         95-dm-notify.rules
60-block.rules         60-persistent-input.rules         70-joystick.rules      75-probe_mtd.rules               96-e2scrub.rules
60-cdrom_id.rules      60-persistent-storage-dm.rules    70-mouse.rules         78-sound-card.rules              99-libsane1.rules
60-drm.rules           60-persistent-storage.rules       70-power-switch.rules  80-debian-compat.rules           99-systemd.rules
60-evdev.rules         60-persistent-storage-tape.rules  70-touchpad.rules      80-drivers.rules
60-fido-id.rules       60-persistent-v4l.rules           70-uaccess.rules       80-ifupdown.rules
60-input-id.rules      60-sensor.rules                   71-ipp-usb.rules       80-libinput-device-groups.rules

As we can see from their filenames, each rule file is responsible for a different area.

The kernel detects devices by their parameters and interfaces, while udev matches them against these rules and performs various activities based on the results:

  • initial system setup
  • device naming
  • device initialization
  • environment preparation
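
To see what udev derives from these rules for a concrete device, we can query its property database. This is a small sketch, assuming a /dev/sda disk is present; the exact values naturally differ per system:

$ udevadm info --query=property --name=/dev/sda | grep -E 'DEVTYPE|ID_BUS'
DEVTYPE=disk
ID_BUS=ata

Properties like ID_BUS are what rules further down the chain reuse, for instance, when building disk/by-id symlinks.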

Since names are one of the main device attributes, knowing where they come from and why can shed light on key features. For example, we can check the rules for persistent storage devices:

$ cat /lib/udev/rules.d/60-persistent-storage.rules
[...]
# virtio-blk
KERNEL=="vd*[!0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}"
KERNEL=="vd*[0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}-part%n"
[...]
# legacy virtio-pci by-path links (deprecated)
KERNEL=="vd*[!0-9]", ENV{ID_PATH}=="pci-*", SYMLINK+="disk/by-path/virtio-$env{ID_PATH}"
KERNEL=="vd*[0-9]", ENV{ID_PATH}=="pci-*", SYMLINK+="disk/by-path/virtio-$env{ID_PATH}-part%n"
[...]
# ATA
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{vendor}=="ATA",
  IMPORT{program}="ata_id --export $devnode"

# ATAPI devices (SPC-3 or later)
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{type}=="5",
  ATTRS{scsi_level}=="[6-9]*", IMPORT{program}="ata_id --export $devnode"

# Run ata_id on non-removable USB Mass Storage (SATA/PATA disks in enclosures)
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", ATTR{removable}=="0", SUBSYSTEMS=="usb",
  IMPORT{program}="ata_id --export $devnode"

# Fall back usb_id for USB devices
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="usb", IMPORT{builtin}="usb_id"

# SCSI devices
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export
  --whitelisted -d $devnode", ENV{ID_BUS}="scsi"
KERNEL=="cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id
  --export --whitelisted -d $devnode", ENV{ID_BUS}="cciss"
KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*",
  SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*",
  SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

# PMEM devices
KERNEL=="pmem*", ENV{DEVTYPE}=="disk", ATTRS{uuid}=="?*", SYMLINK+="disk/by-id/pmem-$attr{uuid}"

# FireWire
KERNEL=="sd*[!0-9]|sr*", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}"
KERNEL=="sd*[0-9]", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}-part%n"
[...]

Here, we can see that virtio handles /dev/vd* devices. Meanwhile, /dev/sd* is very universal when it comes to persistent storage, so it does not tell us much about the device itself.
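
The SYMLINK+= assignments in these rules are also what populate /dev/disk/by-id/. On a virtio guest whose disk exposes a serial number, we can confirm the result directly (a quick sketch; the exact entries vary):

$ ls /dev/disk/by-id/ | grep '^virtio'
virtio-[...]

Each such entry is simply a symbolic link back to the corresponding /dev/vd* device.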

5. Summary

In this article, we discussed the differences between /dev/sda and /dev/vda.

In conclusion, while both relate to storage, /dev/vd* devices usually offer better performance than /dev/sd* or /dev/hd* devices.