1. Introduction
Containers are just a concept for isolating a part of the operating system (OS) as a separate environment. Docker takes the idea of containers and turns them into a manageable unit with both metadata and data associations. These units are Docker images: custom static files with all necessary components to get the container up and running around a given application.
In this tutorial, we explore what comprises a Docker image, as well as how we can generate, explore, and compare images via dive and container-diff. First, we go through the structure of a Docker container image and build a custom one. Next, we turn to dive and its functionality with practical examples. Finally, we go over container-diff for image analysis and comparison.
We tested the code on Debian 12 (Bookworm) with GNU Bash 5.2.15 and Docker 20.10.24. Unless otherwise specified, it should work in most POSIX-compliant environments.
2. Docker Container Image
Container images comprise all the information required to run an isolated environment for an application:
- code
- libraries
- dependencies
- runtime
Organizations like the Open Container Initiative (OCI) and the Cloud Native Computing Foundation (CNCF) attempt to specify the format and features for containers as open standards.
Since Docker stands behind OCI, we mainly concern ourselves with the image-spec specification, which defines the format for Docker images. Critically, this data isn’t filesystem-agnostic when it comes to Docker.
2.1. Layered Structure
In fact, Docker defines sets of files in read-only filesystem layers and stores them within a Docker image. This happens via three elements:
- manifest (JSON): high-level manifest that points to more specific manifests, describing the image and each layer
- configuration (JSON): metadata, root filesystem differences, and history of image build
- layer set: actual data
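To see these elements for ourselves, we can export an image with docker save and look at the resulting archive. The following is only a minimal sketch: the function name list_archive_top_level and the archive name debian.tar are our own choices, and the exact archive layout depends on the Docker version.

```shell
#!/usr/bin/env bash

# Print the unique top-level entries of an image archive created by `docker save`.
# The function name is hypothetical and only serves this illustration.
list_archive_top_level() {
  tar --list --file "$1" | cut -d/ -f1 | sort -u
}

# Usage (requires Docker and a local debian:latest image):
#   docker save --output debian.tar debian:latest
#   list_archive_top_level debian.tar
```

We'd typically expect a manifest.json entry next to the configuration and layer data, although file and directory names differ between Docker releases.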
Notably, the first layer is usually a minimal base called the parent. Often, we use a ready-made open-source image such as debian, ubuntu, alpine, and others already available on sites like Docker Hub.
Similar to virtual machine snapshots or the journal mechanism of native Linux filesystems, each following layer represents a modification to that base layer. For example, we might want to run several steps:
- get parent (base) image as start of new image
- create a new path in an image
- copy host data to image
- refresh the package index in the image
- perform installation
- set the main executable
Let’s convert the steps to a Dockerfile:
$ cat Dockerfile
FROM debian:latest
RUN mkdir --parents /home/baeldung/
COPY file /home/baeldung/file
RUN apt-get update
RUN apt-get install -y vim
CMD ["vim", "--version"]
Here, we can see all six steps. Notably, each instruction that changes the filesystem produces a separate read-only layer, while metadata-only instructions such as CMD just end up in the image configuration and history. When the time comes to start the container, an overlay filesystem merges all of these layers into a single view and adds a thin writable layer on top, so the image layers themselves never change.
2.2. Optimization
As an example, every RUN command operates within a separate environment and generates a new layer.
Because of this, we can optimize a Dockerfile for speed and size of the final image:
$ cat Dockerfile
FROM debian:latest
RUN mkdir --parents /home/baeldung/ && \
apt-get update && \
apt-get install -y vim
COPY file /home/baeldung/file
CMD ["vim", "--version"]
Now, we generate only three layers and decrease the execution time since the context of the single RUN command is the same for all shell operations.
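As a rough check of the layer count, we can count the instructions in a Dockerfile that usually add a filesystem layer. This is just a heuristic sketch under our own assumptions: count_layer_instructions is a hypothetical helper, it only looks at RUN, COPY, and ADD at the start of a line, and it ignores the layers that the base image itself brings along.

```shell
#!/usr/bin/env bash

# Count Dockerfile instructions that usually add a filesystem layer.
# Continuation lines of a multi-line RUN don't start with an instruction,
# so they aren't counted separately.
count_layer_instructions() {
  grep -ciE '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$1"
}

# Usage:
#   count_layer_instructions Dockerfile
```

For the first Dockerfile in this section, the helper should report 4, while the optimized version yields 2. Together with the single base layer of debian:latest, this matches the three layers above.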
2.3. Building Image
Let’s build the image as repox:tax:
$ docker build . --tag repox:tax
[+] Building 11.2s (10/10) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 181B 0.0s
=> [internal] load metadata for docker.io/library/debian:latest 1.9s
=> [1/5] FROM docker.io/library/debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90 3.5s
=> => resolve docker.io/library/debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90 0.0s
=> => sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9 1.85kB / 1.85kB 0.0s
=> => sha256:f46a268570dff2bdf3b243362802f0e60f511fc396f134952cb1458bd2b2f40c 529B / 529B 0.0s
=> => sha256:e3cbd207d8e55effc51a4738ed80bd81141d0f50c91bd83f9b18d404c129a8a1 1.46kB / 1.46kB 0.0s
=> => sha256:6a299ae9cfd996c1149a699d36cdaa76fa332c8e9d66d6678fa9a231d9ead04 49.58MB / 49.58MB 1.2s
=> => extracting sha256:6a299ae9cfd996c1149a699d36cdaa76fa332c8e9d66d6678fa9a231d9ead04c 2.2s
=> [internal] load build context 0.0s
=> => transferring context: 25B 0.0s
=> [2/5] RUN mkdir -p /home/baeldung/ 0.4s
=> [3/5] RUN apt-get update 2.2s
=> [4/5] RUN apt-get install -y vim 2.7s
=> [5/5] COPY file /home/baeldung/file 0.0s
=> exporting to image 0.3s
=> => exporting layers 0.3s
=> => writing image sha256:666d0107200d83915f9674066600f1b70d9ef61bcd7badde107b4901e351e3e5 0.0s
=> => naming to docker.io/library/repox:tax 0.0s
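The build succeeded and we can see each step. Notably, this particular output lists five numbered steps with every RUN and COPY instruction on its own, so it reflects an unoptimized variant of the Dockerfile. Many of the steps spawn a new temporary container to be modified.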
2.4. Basic Parent Comparison
At this point, we can check the images we have:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
repox tax 666d0107200d 1 minute ago 178MB
Normally, one of the main aims when creating base or parent images is to keep the footprint minimal, while ensuring full functionality.
So, let’s pull the parent image debian:latest separately:
$ docker pull debian:latest
latest: Pulling from library/debian
6adebae9cfd9: Already exists
Digest: sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9
Status: Downloaded newer image for debian:latest
docker.io/library/debian:latest
Next, we compare the sizes:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
repox tax 666d0107200d 2 minutes ago 178MB
debian latest e3cbd207d8e5 1 minute ago 117MB
As it turns out, our new image is around 60MB bigger than the original due to the repository information, the new file, and the Vim installation.
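Instead of reading the sizes off the table, we can compute the difference from the exact byte counts. The sketch below assumes decimal megabytes, the same unit that docker image list shows, and uses a hypothetical helper called size_delta_mb.

```shell
#!/usr/bin/env bash

# Print the difference between two image sizes in decimal megabytes
# (1 MB = 1,000,000 bytes), rounded down.
size_delta_mb() {
  local base_bytes=$1 image_bytes=$2
  echo $(( (image_bytes - base_bytes) / 1000000 ))
}

# Usage with Docker (requires both images locally):
#   base=$(docker image inspect --format '{{.Size}}' debian:latest)
#   ours=$(docker image inspect --format '{{.Size}}' repox:tax)
#   size_delta_mb "$base" "$ours"
```

For instance, with the 116551795 bytes of the base image and an illustrative 178000000 bytes for the custom one, the helper prints 61.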
2.5. Basic Inspection
By using docker inspect, we can get a general overview of an image via its identifier:
$ docker inspect e3cbd207d8e5
[
{
"Id": "sha256:e3cbd207d8e55effc51a4738ed80bd81141d0f50c91bd83f9b18d404c129a8a1",
"RepoTags": [
"debian:latest"
],
"RepoDigests": [
"debian@sha256:79becb70a6247d277b59c09ca340bbe0349af6aacb5afa90ec349528b53ce2c9"
],
"Parent": "",
"Comment": "",
"Created": "2024-01-31T01:31:24.460285844Z",
"Container": "664651423fa834f57c458239fdae8b80dc09eda542e7e1362aef7a1fb50a2fec",
"ContainerConfig": {
[...]
},
"DockerVersion": "20.10.23",
"Author": "",
"Config": {
[...]
},
"Architecture": "amd64",
"Os": "linux",
"Size": 116551795,
"VirtualSize": 116551795,
"GraphDriver": {
"Data": {
"MergedDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/merged",
"UpperDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/diff",
"WorkDir": "/var/snap/docker/common/var-lib-docker/overlay2/86c2dfa7b060b535169f3bbef84a058ffa000213ced4cccf366ed50ef571be0c/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:1dae5147cd293b16e7b8c93f778dbf7ceff5c81c2b2704d3e5a98d331cdbe0ab"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
Along with the layers, we see size, metadata, paths, and more.
3. dive
Because of the specifics around Docker and OCI images, we might want to explore a given image more thoroughly and see what it contains. Although docker inspect can be helpful in this regard, there are lower-level tools for the purpose such as dive.
In terms of inspection capabilities, the latter is comparable to skopeo. However, dive provides a piece of information that other tools don’t: image optimization potential.
3.1. Install
When working on Debian-based distributions, we can get the dive DEB file for installation via dpkg:
$ DIVE_VERSION=$(curl --silent 'https://api.github.com/repos/wagoodman/dive/releases/latest' | perl -n0we 'print $1 if /"tag_name": "v(.*?)"/;')
$ curl --location --output dive.deb "https://github.com/wagoodman/dive/releases/latest/download/dive_${DIVE_VERSION}_linux_amd64.deb"
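Here, we use curl to get the latest dive.deb after querying the GitHub API releases endpoint and extracting the exact version with a Perl one-liner. After that, we can install the package via dpkg --install dive.deb, usually as root. We can run similar commands to get the latest RPM file.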
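If Perl isn't available, a sed expression can extract the version as well. This is only a sketch: release_version is a name we made up, and the pattern assumes that the API response is pretty-printed with the tag_name field on its own line.

```shell
#!/usr/bin/env bash

# Read GitHub release JSON on stdin and print the first tag_name value
# without its leading "v".
release_version() {
  sed -n 's/.*"tag_name": *"v\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   DIVE_VERSION=$(curl --silent 'https://api.github.com/repos/wagoodman/dive/releases/latest' | release_version)
#   echo "$DIVE_VERSION"
```

In case the variable ends up empty, the request probably failed or the response format changed, so it's worth checking before we start the download.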
For a universal installation method, we can also use snap:
$ snap install dive
Importantly, if we choose this method, we might need to use --classic for writing report files back to the filesystem.
Further, we could also clone the git repository and install it from the sources.
Finally, there’s a public dive container image, so we can just use docker itself:
$ alias dive='docker run --rm --tty --interactive --volume=/var/run/docker.sock:/var/run/docker.sock wagoodman/dive'
Effectively, this method of deployment creates a temporary container from the wagoodman/dive image and maps the Docker socket as a --volume from and to /var/run/docker.sock. Then, we can use the new alias to build or directly inspect any image.
3.2. Basic Usage and Navigation
First, let’s again list the current images we have:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
repox tax 666d0107200d 2 minutes ago 178MB
debian latest e3cbd207d8e5 1 minute ago 117MB
Next, we can fire up dive with the debian base image identifier:
$ dive e3cbd207d8e5
Image Source: docker://e3cbd207d8e5
Fetching image... (this can take a while for large images)
Analyzing image...
Building cache...
[...]
At this point, we should see a basic tmux-like terminal user interface (TUI) with several panes:
│ Current Layer Contents ├────────────────────────
┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ├── bin → usr/bin
Cmp Size Command ├── boot
116 MB FROM e8391b19d63a54a ├── dev
├── etc
│ ├── .pwd.lock
│ ├── adduser.conf
│ ├── alternatives
[...]
│ Layer Details ├───────────────────────────────── │ │ ├── rmt.8.gz → /usr/share/man/man8/rmt-tar
│ │ ├── which → /usr/bin/which.debianutils
Tags: (unavailable) │ │ ├── which.1.gz → /usr/share/man/man1/which
Id: e8391b19d63a54a23f580bd888d66669d876dd261e │ │ ├── which.de1.gz → /usr/share/man/de/man1/
71f257008afb3c4d213d1d │ │ ├── which.es1.gz → /usr/share/man/es/man1/
Digest: sha256:1dae5147cd293b16e7b8c93f778dbf7ceff │ │ ├── which.fr1.gz → /usr/share/man/fr/man1/
5c81c2b2704d3e5a98d331cdbe0ab │ │ ├── which.it1.gz → /usr/share/man/it/man1/
Command: │ │ ├── which.ja1.gz → /usr/share/man/ja/man1/
#(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528 │ │ ├── which.pl1.gz → /usr/share/man/pl/man1/
a142872692072ab1cd1ad275b67d1f in / │ │ └── which.sl1.gz → /usr/share/man/sl/man1/
│ ├── apt
│ │ ├── apt.conf.d
[...]
│ Image Details ├───────────────────────────────── │ │ │ ├── docker-gzip-indexes
│ │ │ └── docker-no-languages
Image name: e3cbd207d8e5 │ │ ├── auth.conf.d
Total Image size: 116 MB │ │ ├── keyrings
Potential wasted space: 0 B │ │ ├── preferences.d
Image efficiency score: 100 % │ │ ├── sources.list.d
│ │ │ └── debian.sources
Count Total Space Path │ │ └── trusted.gpg.d
│ │ ├── debian-archive-bookworm-automatic.
[...]
^C Quit | Tab Switch view | ^F Filter | ^L Show layer changes | ^A Show aggregated changes | x |
In this view, there’s data on the number and sources of Layers, Layer details for the current selection, and general Image Details.
We can use the arrow keys to navigate within a view (vertical split). To switch views, we hit Tab. The Space key expands and collapses directories in the Current Layer Contents view.
3.3. Non-interactive Output
If we have write access to the host filesystem, we can also leverage the --json data export feature:
$ dive --json debian_image_data.json e3cbd207d8e5
$ cat debian_image_data.json
{
  "layer": [
    {
      "index": 0,
      "id": "e8391b19d63a54a23f580bd888d66669d876dd261e71f257008afb3c4d213d1d",
      "digestId": "sha256:1dae5147cd293b16e7b8c93f778dbf7ceff5c81c2b2704d3e5a98d331cdbe0ab",
      "sizeBytes": 116542893,
      "command": "#(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528a142872692072ab1cd1ad275b67d1f in / "
    }
  ],
  "image": {
    "sizeBytes": 116542893,
    "inefficientBytes": 0,
    "efficiencyScore": 1,
    "fileReference": []
  }
}
Although this report is much more concise than the docker inspect output, it still contains general information such as the layer sizes, the build commands, and the efficiency score.
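Since the report is plain JSON, we can pull single values out of it in scripts. The sketch below relies on sed alone and assumes the report keeps one field per line as above. The helper name dive_efficiency is hypothetical, and a dedicated JSON parser would be the more robust choice where one is installed.

```shell
#!/usr/bin/env bash

# Print the efficiencyScore value (a ratio between 0 and 1)
# from a report written by `dive --json`.
dive_efficiency() {
  sed -n 's/.*"efficiencyScore": *\([0-9.]*\).*/\1/p' "$1"
}

# Usage:
#   dive --json debian_image_data.json e3cbd207d8e5
#   dive_efficiency debian_image_data.json
```

For the debian report above, this prints 1.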
Notably, we can also get a summary screen without interaction via the CI=true environment variable value:
$ CI=true dive e3cbd207d8e5
Using default CI config
Image Source: docker://e3cbd207d8e5
Fetching image... (this can take a while for large images)
Analyzing image...
efficiency: 100.0000 %
wastedBytes: 0 bytes (0 B)
userWastedPercent: NaN %
Inefficient Files:
Count Wasted Space File Path
None
Results:
PASS: highestUserWastedPercent
SKIP: highestWastedBytes: rule disabled
PASS: lowestEfficiency
Result:PASS [Total:3] [Passed:2] [Failed:0] [Warn:0] [Skipped:1]
In this case, we see a 100.0000 % efficiency, meaning dive finds no wasted space, i.e., no files that are duplicated, overwritten, or deleted across layers. This is exactly as expected for a parent image with a single layer.
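The three rules in the Results section can be tuned through a .dive-ci file in the working directory. Below is a minimal sketch that writes such a file. The thresholds are arbitrary example values and write_dive_ci_rules is a hypothetical helper, so we should adapt both before relying on them in a pipeline.

```shell
#!/usr/bin/env bash

# Write an example .dive-ci rules file into the given directory
# (defaults to the current one). All thresholds are illustrative.
write_dive_ci_rules() {
  cat > "${1:-.}/.dive-ci" <<'EOF'
rules:
  # fail when the efficiency ratio (0-1) drops below this value
  lowestEfficiency: 0.95
  # fail when the wasted space reaches this size
  highestWastedBytes: 20MB
  # fail when this ratio (0-1) of the space added on top of the base is wasted
  highestUserWastedPercent: 0.10
EOF
}

# Usage (requires dive and the image from this tutorial):
#   write_dive_ci_rules .
#   CI=true dive repox:tax
```

With a rules file in place, a failed rule should result in a non-zero exit status, which is what makes this mode useful in automated builds.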
3.4. Custom Image Analysis
So, let’s open our custom-built image and check the basic details in comparison to its base:
$ dive docker://666d0107200d
Looking at the Layers pane, we can immediately notice that, unlike its parent, the custom image has more than one layer:
┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cmp Size Command
116 MB FROM b1d04cf5ba033d6
0 B RUN /bin/sh -c mkdir -p /home/baeldung/ # buildkit
19 MB RUN /bin/sh -c apt-get update # buildkit
42 MB RUN /bin/sh -c apt-get install -y vim # buildkit
0 B COPY file /home/baeldung/file # buildkit
Again, each layer represents a filesystem change. This is why we don’t see the metadata change from CMD ["vim", "--version"].
Naturally, we also see a decrease in the Image efficiency score:
│ Image Details ├──────────────────────────────────────
Image name: 666d0107200d
Total Image size: 178 MB
Potential wasted space: 1.9 MB
Image efficiency score: 99 %
Count Total Space Path
2 1.5 MB /var/cache/debconf/templates.dat
2 158 kB /var/lib/dpkg/status-old
2 158 kB /var/lib/dpkg/status
2 11 kB /var/lib/apt/extended_states
2 9.2 kB /etc/ld.so.cache
2 9.0 kB /var/log/apt/eipp.log.xz
2 8.8 kB /var/cache/debconf/config.dat
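As the Count column shows, each of these files exists in two layers: most likely once in the base image and once as a modified copy from the package index update or installation. Most of them are just package manager state and cache data.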
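A common way to trim the package-related layers is to clean up the downloaded package lists within the same RUN instruction that creates them. The following sketch writes such a Dockerfile. The helper name write_slim_dockerfile is our own, the --no-install-recommends option is an additional assumption about what we need from the vim package, and the entries above may well remain, since those files already exist in the base layer.

```shell
#!/usr/bin/env bash

# Write a Dockerfile variant that removes the APT package lists in the
# same layer that downloads them (target directory defaults to ".").
write_slim_dockerfile() {
  cat > "${1:-.}/Dockerfile" <<'EOF'
FROM debian:latest
RUN mkdir --parents /home/baeldung/ && \
    apt-get update && \
    apt-get install -y --no-install-recommends vim && \
    rm -rf /var/lib/apt/lists/*
COPY file /home/baeldung/file
CMD ["vim", "--version"]
EOF
}

# Usage (requires Docker and the file to copy; the tag is only an example):
#   write_slim_dockerfile .
#   docker build . --tag repox:slim
```

After rebuilding, we can open the new image in dive again and compare the layer sizes with the ones above.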
3.5. Key Shortcuts
To work with the interactive interface of dive, we can also use several hotkeys:
- exit: Ctrl+C, Q
- file filter: Ctrl+F
- aggregate layer view: Ctrl+A
- current layer view: Ctrl+L
- toggle showing [A]dded, [R]emoved, [M]odified, [U]nmodified files: Ctrl+A, Ctrl+R, Ctrl+M, Ctrl+U
This way, we can navigate the data more conveniently.
3.6. Configuration
Finally, although we usually won’t need it, dive can also be configured via several files:
- $XDG_CONFIG_HOME/dive/*.y[a]ml
- $XDG_CONFIG_DIRS/dive/*.y[a]ml
- $HOME/.config/dive/*.y[a]ml
- $HOME/.dive.y[a]ml
Let’s see an annotated example configuration:
$ cat $HOME/.dive.yaml
# "docker" or "podman"
container-engine: docker
# analyze despite errors
ignore-errors: false
log:
  enabled: true
  path: ./dive.log
  level: info
# key changes
keybinding:
  # Global bindings
  quit: ctrl+c
  toggle-view: tab
  filter-files: ctrl+f, ctrl+slash
  # Layer view specific bindings
  compare-all: ctrl+a
  compare-layer: ctrl+l
  # File view specific bindings
  toggle-collapse-dir: space
  toggle-collapse-all-dir: ctrl+space
  toggle-added-files: ctrl+a
  toggle-removed-files: ctrl+r
  toggle-modified-files: ctrl+m
  toggle-unmodified-files: ctrl+u
  toggle-filetree-attributes: ctrl+b
  page-up: pgup
  page-down: pgdn
diff:
  # change default files shown for diff
  hide:
    - added
    - removed
    - modified
    - unmodified
filetree:
  # default collapse state
  collapse-dir: false
  # proportion between vertical view widths
  pane-width: 0.5
  # file attribute toggling
  show-attributes: true
layer:
  # show aggregate layer changes by default
  show-aggregated-changes: false
Thus, we can also change the keyboard shortcuts along with the interface behavior.
4. container-diff
When creating different container images, there are times when we might also want to compare them. For example, root filesystem, layer, and other changes sometimes play a role in deployments.
For this purpose, the container-diff tool can be invaluable.
4.1. Install
Before using container-diff, we install it via an official channel:
$ curl --location --remote-name 'https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64' && \
install container-diff-linux-amd64 /usr/local/bin/container-diff
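Writing to /usr/local/bin usually requires root privileges. If install isn’t available, we can often just copy the binary to a directory in $PATH or run it directly after making it executable via chmod +x. In any case, we should have access to the main executable.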
4.2. Differences
With container-diff, we can perform comparisons and analysis for many categories, each selected via the --type option:
- history
- metadata
- layer
- file (filesystem)
- size, sizelayer
- apt, aptlayer
- rpm, rpmlayer
- node
- pip
To perform a difference check, we use the diff subcommand and supply both images as local daemon repository:tag strings. Notably, we first specify the base image, so we get an idea of what was added or changed.
For example, let’s perform a basic filesystem comparison between the images we already have:
$ container-diff diff --type file daemon://debian:latest daemon://repox:tax
-----File-----
These entries have been added to debian:latest:
FILE SIZE
/etc/alternatives/editor 18B
/etc/alternatives/editor.1.gz
[...]
These entries have been deleted from debian:latest: None
These entries have been changed between debian:latest and repox:tax:
FILE SIZE1 SIZE2
/var/lib/dpkg/status 75K 79.2K
/var/lib/dpkg/status-old 75K 79.2K
/var/lib/apt/extended_states 5K 5.3K
/etc/ld.so.cache 4.4K 4.5K
/var/log/apt/eipp.log.xz 4.3K 4.5K
/var/lib/dpkg/diversions 98B 268B
/var/lib/dpkg/diversions-old 29B 187B
The output can become quite long due to the repository updates and archive operations.
Of course, we can use the --type option multiple times and specify other categories as well.
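When we compare several categories regularly, a small helper can build the repeated options for us. This is a sketch with a hypothetical function name, type_args, and it assumes the --type=value spelling of the option.

```shell
#!/usr/bin/env bash

# Print one --type=<category> argument per line for each given category.
type_args() {
  local category
  for category in "$@"; do
    printf -- '--type=%s\n' "$category"
  done
}

# Usage (requires container-diff and the images from this tutorial):
#   mapfile -t args < <(type_args file size history)
#   container-diff diff "${args[@]}" daemon://debian:latest daemon://repox:tax
```

Collecting the arguments in an array keeps each option a separate word, even if we extend the helper later on.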
4.3. Analysis
On the other hand, we can get a complete analysis for a single container image via the analyze subcommand:
$ container-diff analyze --type history daemon://repox:tax
-----History-----
Analysis for repox:tax:
-/bin/sh -c #(nop) ADD file:6d9e71f0d3afb0b288cf2c06425795d528a142872692072ab1cd1ad275b67d1f in /
-/bin/sh -c #(nop) CMD ["bash"]
-RUN /bin/sh -c mkdir -p /home/baeldung/ # buildkit
-RUN /bin/sh -c apt-get update # buildkit
-RUN /bin/sh -c apt-get install -y vim # buildkit
-COPY file /home/baeldung/file # buildkit
-CMD ["vim" "--version"]
In this case, we check the history of the repox:tax image to reconstruct the build instructions.
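To get closer to an actual Dockerfile, we can strip the decorations that the history output adds around each instruction. The filter below is a rough sketch based only on the output format shown above. The name history_to_instructions is our own, and the result still lacks the original FROM line, since the base image shows up as its own ADD and CMD entries.

```shell
#!/usr/bin/env bash

# Convert container-diff history lines on stdin into Dockerfile-like
# instructions: keep only entries with a single leading dash, then drop
# the dash, the "/bin/sh -c" wrappers, and the "# buildkit" suffix.
history_to_instructions() {
  grep '^-[^-]' | sed \
    -e 's/^-//' \
    -e 's|^/bin/sh -c #(nop) *||' \
    -e 's|^RUN /bin/sh -c |RUN |' \
    -e 's| *# buildkit$||'
}

# Usage:
#   container-diff analyze --type history daemon://repox:tax | history_to_instructions
```

The CMD entries keep the spelling from the history output, so we may still need to adjust them by hand before reusing the result.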
5. Summary
In this article, we talked about the structure of Docker images and two major utilities for container image analysis and comparison: dive and container-diff.
In conclusion, working with containers inevitably requires knowledge of the image structure and ways to check and compare contents for analysis, optimization, and security.