1. Introduction
“Container” is a buzzword in the industry and is frequently cited in lists of hot new topics. Essentially, containers make it easier and faster for an organization to build an application and deploy it anywhere. Images are the backbone of containers, so in this article, let’s look under the hood at how they are stored on the host machine.
Without any further ado, let’s get into the nitty-gritty of it.
2. Docker Images
A Docker image is a cut-down version of the operating system files and dependencies required to run an application or service.
From a system administrator’s point of view, we can think of images as VM templates, much like a halted VM. Similarly, a Docker image is analogous to a stopped container; thus, images are referred to as build-time constructs.
Containers are meant to be lightweight and fast to deploy anywhere, so images are generally kept very small by stripping away all non-essential parts. For instance, Docker images don’t have their own kernel; instead, containers share the kernel of the host machine. Images are not shipped with four or five shells; they come with a single shell or none at all. Consequently, an image is sometimes described as containing just enough operating system.
The official Ubuntu 22.04 LTS OS image has a file size of 3.6GB, but the container image of the same release, which eliminates all extraneous components, is only 77.8MB. That’s almost a 98% reduction in size, with corresponding savings in hardware resource utilization:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 22.04 216c552ea5ba 10 days ago 77.8MB
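The size reduction quoted above can be checked with a quick calculation. This is only a sketch: size_reduction_pct is a hypothetical helper name, and 3.6GB is treated as 3600MB (decimal units), which is an assumption about how the sizes were measured.

```shell
# Percentage saved when going from the full OS image to the container image.
# Both sizes are given in MB; 3.6GB is assumed to mean 3600MB.
size_reduction_pct() {
  awk -v full="$1" -v slim="$2" 'BEGIN { printf "%.1f\n", (1 - slim / full) * 100 }'
}

size_reduction_pct 3600 77.8   # prints 97.8
```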
2.1. Image Layers
Generally, we pull images from the Docker Hub registry, although other registries exist. We use the docker pull command to get an image from Docker Hub:
$ docker pull node:latest
latest: Pulling from library/node
f606d8928ed3: Pull complete
47db815c6a45: Pull complete
bf4849400000: Pull complete
a572f7a256d3: Pull complete
8f7d05258955: Pull complete
3a459f9ab1c6: Pull complete
c37bcb1df089: Pull complete
bf0ef0f2bfc7: Pull complete
9c17ea02add5: Pull complete
Digest: sha256:9d8a6466c6385e05f62f8ccf173e80209efb0ff4438f321f09ddf552b05af3ba
Status: Downloaded newer image for node:latest
docker.io/library/node:latest
As we can see, the image is downloaded as multiple layers from the Docker Hub blob store. The lines marked “Pull complete” denote the layers. Here, our image has nine read-only layers stacked on top of each other to form a single, cohesive image object. Again, we use the docker images command to list the images available in our local repository:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
node latest 35ff1df466e8 9 days ago 991MB
ubuntu 22.04 216c552ea5ba 10 days ago 77.8MB
Yet another way to see the layers of an image is by using the docker inspect command. Now let’s inspect the node:latest image and get the SHA256 hashes of all layers:
$ docker inspect 35ff1df466e8
[
{
"Id": "sha256:35ff1df466e834b2408d56faca095d16dc4002cbd3e4c46c15c72e2aaf18afaf",
"RepoTags": [
"node:latest"
],
...
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:8e079fee21864e07daa88efcf74f23ad5ade697c06417d0c04a45dfe580ab7f3",
"sha256:955c9335e041ebf1840e5d9779a217a5957561086148c7da89bdd4000acd62c4",
"sha256:186da837555d4da0f772d025f29940370be7a464c2b92871a166941cde9fca3d",
"sha256:288cf3a46e320aa79274f52d3ce609be1a9f67bab6d34305231ddc7f40c6a261",
"sha256:75ba0293749684938f4de0f9c9c9deb7d200399c1fb129f3d0147e5772effa67",
"sha256:ff5b3ba76c67f918a022a8be0dc412978d03a7e876ca094291ddb6bda9ba8e16",
"sha256:7b882706e16e206df8c8d9fb69869b2b83912a37b78c8cd463608a693df37c36",
"sha256:40eaad54c8b1838dcd3ae5e10fd7bc700e2176cad488328a47316d0d25167b6e",
"sha256:73ebbf1d19781242c9e3d9106ff8bbd7899bc06efab691e622ab2e8a4466d5d7"
]
},
...
]
Here, the first entry in the Layers section is the base layer; as we add content during the build, new layers are formed on top of it. The base layer for the node:latest image is 8e079fee21. In the following section, let’s examine the specifics of Docker image storage in more detail.
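When only the layer hashes are needed, the --format option of docker image inspect can print them without the rest of the JSON. The sketch below wraps that in two helper functions; the names image_layers and count_layers are hypothetical, while --format and the .RootFS.Layers field come from the inspect output shown above.

```shell
# Print the layer hashes (diffIDs) of an image, one per line, base layer first.
image_layers() {
  docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# Count how many layers an image has.
count_layers() {
  image_layers "$1" | grep -c '^sha256:'
}
```

For example, count_layers node:latest should report 9 for the image pulled earlier, assuming the same image version is still present locally.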
3. Image Storage
Generally, Docker uses storage drivers to store the image layers efficiently. A storage driver manages the storage and administration of images and containers on our Docker host. Further, storage drivers also know the specifics of how these layers interact with one another.
Now let’s see where and what is stored in these image layers.
Let’s first get the storage driver information from the host machine using the docker info command. Depending on the version, Docker Engine supports a variety of storage drivers, including overlay2, fuse-overlayfs, btrfs, zfs, aufs, overlay, devicemapper, and vfs:
$ docker info | grep -i "Storage Driver"
Storage Driver: overlay2
Next, let’s obtain information on the Docker root directory, which houses the majority of Docker’s data:
$ docker info | grep "Root Dir"
Docker Root Dir: /var/lib/docker
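Instead of filtering the human-readable output with grep, the same two values can be read through the --format option of docker info, which is handy in scripts. This is a minimal sketch; the function names are hypothetical, and .Driver and .DockerRootDir are the template fields behind the two lines shown above.

```shell
# Name of the active storage driver, for example overlay2.
docker_storage_driver() {
  docker info --format '{{.Driver}}'
}

# Docker root directory, for example /var/lib/docker.
docker_root_dir() {
  docker info --format '{{.DockerRootDir}}'
}
```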
3.1. Image Internals – Deep Dive
Identifying Docker’s files and their contents in the root directory is not straightforward, but we can decipher them with the following six steps:
- Get the Image ID from the Docker host local repository using the docker images command
- Extract the layerID or diffID information using the docker inspect command under the RootFS section
- Calculate the chainID using the current layer’s diffID and the previous layer’s chainID. The calculation method is below:
  If the layer is the lowest layer without any parent layer, then chainID = diffID
  Otherwise, chainID(n) = sha256sum [ chainID(n-1) + " " + diffID(n) ]
- Using chainID, navigate to /var/lib/docker/image/overlay2/layerdb/sha256/ to obtain the cacheID. The cacheID helps to get the actual contents that are indexed to the respective layers
- With the help of cacheID, let’s go to the storage driver path [ /var/lib/docker/overlay2/ ] to get the actual layer content
- Navigate to the diff directory to get all the files and directories of that layer
Here, let’s get the image layers using the docker inspect command with the node:latest image ID. The first entry represents the base layer of the image, and the remaining entries list the subsequent layers in order:
$ docker inspect 35ff1df466e8
[
...
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:8e079fee21864e07daa88efcf74f23ad5ade697c06417d0c04a45dfe580ab7f3",
"sha256:955c9335e041ebf1840e5d9779a217a5957561086148c7da89bdd4000acd62c4",
"sha256:186da837555d4da0f772d025f29940370be7a464c2b92871a166941cde9fca3d",
"sha256:288cf3a46e320aa79274f52d3ce609be1a9f67bab6d34305231ddc7f40c6a261",
"sha256:75ba0293749684938f4de0f9c9c9deb7d200399c1fb129f3d0147e5772effa67",
"sha256:ff5b3ba76c67f918a022a8be0dc412978d03a7e876ca094291ddb6bda9ba8e16",
"sha256:7b882706e16e206df8c8d9fb69869b2b83912a37b78c8cd463608a693df37c36",
"sha256:40eaad54c8b1838dcd3ae5e10fd7bc700e2176cad488328a47316d0d25167b6e",
"sha256:73ebbf1d19781242c9e3d9106ff8bbd7899bc06efab691e622ab2e8a4466d5d7"
]
},
...
]
Using the chainID formula, let’s calculate the chainID for layer 2, layer 3, and so on; the chainID for layer 1 is the same as its diffID or layerID. For layer 2, the previous chainID is therefore simply the diffID of layer 1:
### Layer-2: chainID Calculation
$ echo -n "sha256:8e079fee21864e07daa88efcf74f23ad5ade697c06417d0c04a45dfe580ab7f3 sha256:955c9335e041ebf1840e5d9779a217a5957561086148c7da89bdd4000acd62c4" | sha256sum
b00657a91aea31613d9a8764759a8784f35a4c7ab55299bc4a9fa88d989d5c15
### Layer-3: chainID Calculation (the first value is the layer-2 chainID, prefixed with sha256:)
$ echo -n "sha256:b00657a91aea31613d9a8764759a8784f35a4c7ab55299bc4a9fa88d989d5c15 sha256:186da837555d4da0f772d025f29940370be7a464c2b92871a166941cde9fca3d" | sha256sum
...
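Repeating this by hand for every layer gets tedious, so the calculation can be put in a loop. The following is a sketch under the assumptions already used above: diffIDs arrive one per line with the base layer first, each value keeps its sha256: prefix, and the two inputs are joined by a single space. The function name chain_ids is hypothetical.

```shell
# Read diffIDs on stdin (one per line, base layer first) and print the
# chainID of every layer in the same order.
chain_ids() (
  chain=""
  while IFS= read -r diff; do
    [ -n "$diff" ] || continue          # skip blank lines
    if [ -z "$chain" ]; then
      chain="$diff"                     # base layer: chainID = diffID
    else
      # chainID(n) = sha256 of "chainID(n-1) diffID(n)"
      chain="sha256:$(printf '%s %s' "$chain" "$diff" | sha256sum | cut -d' ' -f1)"
    fi
    printf '%s\n' "$chain"
  done
)
```

Fed with the nine diffIDs of node:latest listed above, the second output line should match the b00657a91aea... value calculated by hand for layer 2.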
Likewise, with the help of the chainID, browse through the /var/lib/docker/image/overlay2/layerdb/sha256 directory to get the actual content index, known as the cacheID. The directory also holds the layer’s parent information and its size:
$ pwd
/var/lib/docker/image/overlay2/layerdb/sha256
$ tree b00657a91aea31613d9a8764759a8784f35a4c7ab55299bc4a9fa88d989d5c15
b00657a91aea31613d9a8764759a8784f35a4c7ab55299bc4a9fa88d989d5c15
├── cache-id
├── diff
├── parent
├── size
└── tar-split.json.gz
0 directories, 5 files
$ cd /var/lib/docker/image/overlay2/layerdb/sha256/b00657a91aea31613d9a8764759a8784f35a4c7ab55299bc4a9fa88d989d5c15
$ cat cache-id
6112cbe71a05105ba0907929415b48c5b07ade33b15d0b57abb83596dcaaaac0
$ cat parent
sha256:8e079fee21864e07daa88efcf74f23ad5ade697c06417d0c04a45dfe580ab7f3
$ cat size
10696859
Lastly, locate the directory named after the retrieved cacheID under the /var/lib/docker/overlay2 path. Then navigate to the diff directory inside it to access the layer’s actual contents:
$ pwd
/var/lib/docker/overlay2
$ ls -l /var/lib/docker/overlay2/6112cbe71a05105ba0907929415b48c5b07ade33b15d0b57abb83596dcaaaac0
total 16
-rw------- 1 root root 0 Oct 14 09:07 committed
drwxr-xr-x 6 root root 4096 Oct 14 09:07 diff
-rw-r--r-- 1 root root 26 Oct 14 09:07 link
-rw-r--r-- 1 root root 28 Oct 14 09:07 lower
drwx------ 2 root root 4096 Oct 14 09:07 work
$ cd 6112cbe71a05105ba0907929415b48c5b07ade33b15d0b57abb83596dcaaaac0
$ ls
committed diff link lower work
$ tree
.
├── committed
├── diff
│ ├── etc
│ │ ├── ca-certificates.conf
│ │ ├── ethertypes
...
... output truncated ...
...
160 directories, 802 files
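The last two lookups, from chainID to cacheID and from cacheID to the diff directory, can be chained in a small helper. This is a sketch that assumes the overlay2 layout shown above; layer_diff_dir is a hypothetical name, the Docker root directory defaults to /var/lib/docker, and reading these paths on a real host normally requires root privileges.

```shell
# Print the directory holding a layer's files, given the layer's chainID.
# Usage: layer_diff_dir <chainID> [docker-root-dir]
layer_diff_dir() (
  root="${2:-/var/lib/docker}"
  chain="${1#sha256:}"                  # directory names omit the sha256: prefix
  cache_id="$(cat "$root/image/overlay2/layerdb/sha256/$chain/cache-id")" || exit 1
  printf '%s/overlay2/%s/diff\n' "$root" "$cache_id"
)
```

For the layer-2 chainID calculated earlier, this should print the path of the 6112cbe71a05... diff directory explored above.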
4. Conclusion
In summary, we examined the fundamentals of Docker images and their layers. Further, we also learned about a few key concepts, such as diffID, chainID, and cacheID, to identify the layer files and their real contents.