1. Introduction
Kubernetes is a distributed container orchestration framework. To use it, we write definitions for resources such as pods and storage and apply them to a cluster of nodes, at which point Kubernetes decides where each workload runs and where its storage comes from. However, it’s sometimes hard to see the resulting distribution, especially of storage, on a given node.
In this tutorial, we explore steps to see Kubernetes storage usage for a particular cluster node. First, we briefly refresh our knowledge about Kubernetes deployments and their storage needs. After that, we perform a general storage usage check on the node of interest. Next, we check container images and their sizes. Then, we turn to the container runtime and its storage needs. Finally, we deal with pod storage usage discovery.
We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. Unless otherwise specified, it should work in most POSIX-compliant environments.
2. Kubernetes Storage
When it comes to storage used by Kubernetes, we should consider a number of sources:
- platform installation
- configuration
- images
- persistent volume (PV) and persistent volume claim (PVC) resources
While storage within Kubernetes is mainly provisioned via PV and PVC resources, other parts of the framework also take up space.
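For the last item on the list, a quick cluster-wide overview is available directly from kubectl, since the CAPACITY column of the PV and PVC listings shows the declared sizes:
$ kubectl get pv
$ kubectl get pvc --all-namespaces
Of course, this only covers explicitly provisioned storage, not images or the platform installation itself.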
Let’s check the current Kubernetes cluster to have an overview of its state and elements:
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default pod/pod0 1/1 Running 1 1h
kube-system pod/coredns-5dd5756b68-69h86 1/1 Running 0 24h
kube-system pod/etcd-xost 1/1 Running 1 24h
kube-system pod/kube-apiserver-xost 1/1 Running 0 24h
kube-system pod/kube-controller-manager-xost 1/1 Running 0 24h
kube-system pod/kube-proxy-55ht8 1/1 Running 0 24h
kube-system pod/kube-scheduler-xost 1/1 Running 6 24h
kube-system pod/storage-provisioner 1/1 Running 0 24h
kubernetes-dashboard pod/dashboard-metrics-scraper-7fd5cb4ddc-7sqqq 1/1 Running 0 24h
kubernetes-dashboard pod/kubernetes-dashboard-8694d4445c-hnt9w 1/1 Running 0 24h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 24h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 24h
kubernetes-dashboard service/dashboard-metrics-scraper ClusterIP 10.106.175.119 <none> 8000/TCP 24h
kubernetes-dashboard service/kubernetes-dashboard ClusterIP 10.105.26.231 <none> 80/TCP 24h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 24h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 1/1 1 1 24h
kubernetes-dashboard deployment.apps/dashboard-metrics-scraper 1/1 1 1 24h
kubernetes-dashboard deployment.apps/kubernetes-dashboard 1/1 1 1 24h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-5dd5756b68 1 1 1 24h
kubernetes-dashboard replicaset.apps/dashboard-metrics-scraper-7fd5cb4ddc 1 1 1 24h
kubernetes-dashboard replicaset.apps/kubernetes-dashboard-8694d4445c 1 1 1 24h
Apart from the dashboard and pod0, this is a fairly empty deployment.
On a lower level, the container runtime might use other means such as an overlay filesystem to expose storage:
$ df -h
Filesystem Size Used Avail Use% Mounted on
[...]
overlay 49G 25G 22G 54% /var/lib/docker/overlay2/349a603c4759f3f92c0e506f7668ebd06661d8a3b56fcbd1f06b071f49c81883/merged
overlay 49G 25G 22G 54% /var/lib/docker/overlay2/a4fdbc126518168bad7f2977f16eb666f7f33f47b67d9d75d2f74b96c3474eb9/merged
shm 64M 0 64M 0% /var/lib/docker/containers/55b6adb8afcd66659f32d1899993c2311f7d110fb430ce4ff8b9714f81c536a4/mounts/shm
[...]
overlay 49G 25G 22G 54% /var/lib/docker/overlay2/733c46aa655aee3b736cafe66629e32e0d9f4156671608243117b95a32c716a9/merged
Because of such abstractions, it can sometimes be hard to establish the exact storage resources that are currently available on a node and how much of them are in use.
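Still, as a first sanity check, we can see which filesystem backs the container runtime data and how full it is by pointing df at the Docker data root (assuming the default /var/lib/docker location):
$ df -h /var/lib/docker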
So, let’s take a step-by-step methodical approach to discover what part of the Kubernetes ecosystem takes up the most storage.
3. General Storage Usage Check
As with any storage allocation issue, we begin by checking the overall usage.
To do this, we first install ncdu:
$ apt install ncdu
After that, we run it for the / filesystem root on the Kubernetes node of interest:
$ ncdu /
Thus, the tool performs a scan. This may take considerable time, depending on the storage medium speed, size, and current load.
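In case the node is busy, we might prefer to scan once and browse the results later. ncdu supports exporting a scan to a file and loading it back, with -x keeping the scan on a single filesystem (the output path here is just an example):
$ ncdu -x -o /tmp/root-scan.ncdu /
$ ncdu -f /tmp/root-scan.ncdu
For the remainder of this section, we stay with the interactive view.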
After that, we get results in the form of a navigable list, sorted by size in descending order:
ncdu 1.18 ~ Use the arrow keys to navigate, press ? for help
--- / ---------------------------------------------------------------
. 66.6 GiB [#############################] /var
6.0 GiB [###### ] /usr
1.6 GiB [# ] /root
1.0 GiB [ ] /home
385.1 MiB [ ] /opt
98.5 MiB [ ] /boot
15.3 MiB [ ] /etc
5.6 MiB [ ] /run
2.1 MiB [ ] /mnt
1.0 MiB [ ] /dev
57.0 KiB [ ] /tmp
e 16.0 KiB [ ] /lost+found
8.0 KiB [ ] /media
8.0 KiB [ ] /srv
8.0 KiB [ ] /nfs
e 4.0 KiB [ ] /test
. 0.0 B [ ] /proc
0.0 B [ ] /sys
Total disk usage: 78.7 GiB Apparent size: 128.1 TiB Items: 666016
Here, we can use several keys to navigate:
- Up Arrow and Down Arrow: move the focus up and down
- Right Arrow or Return: enter the directory in focus
- Left Arrow or Backspace: go up to the parent directory
For instance, if we go to /var/lib/, we can see the minikube installation directory, which takes up around 300MB mainly due to its binaries.
Since dockerd is the container runtime for this particular Kubernetes deployment, we can also check /var/lib/docker/:
ncdu 1.18 ~ Use the arrow keys to navigate, press ? for help
--- /var/lib/docker -------------------------------------------------
/..
6.6 GiB [#############################] /overlay2
666.0 MiB [### ] /volumes
7.0 MiB [ ] /image
3.3 MiB [ ] /containers
1.6 MiB [ ] /buildkit
96.0 KiB [ ] /network
16.0 KiB [ ] /plugins
8.0 KiB [ ] /tmp
e 4.0 KiB [ ] /trust
e 4.0 KiB [ ] /swarm
e 4.0 KiB [ ] /runtimes
4.0 KiB [ ] nuke-graph-directory.sh
4.0 KiB [ ] engine-id
Total disk usage: 7.2 GiB Apparent size: 4.4 GiB Items: 66609
As expected, the overlay2 and volumes directories take up the most space, since Docker keeps container layers under overlay2 and named volumes under volumes. Yet, we can’t be sure which exact containers are the main culprits by just looking at the raw filesystem.
In particular, Kubernetes workloads might account for only part of the Docker usage.
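If we prefer a non-interactive, script-friendly check of the same area, du together with sort yields similar data, with -x staying on a single filesystem so the overlay mounts aren’t traversed again:
$ du -xh --max-depth=1 /var/lib/docker | sort -rh | head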
4. Container Runtime Images
Kubernetes orchestrates containers, and container images are one of their relatively hidden storage costs: although often stripped down and minimal, they still take up space.
Let’s see how to get container image sizes with Docker:
$ docker image list --all
REPOSITORY TAG IMAGE ID CREATED SIZE
minapi-minapi latest 9b8f05946672 6 days ago 88.2MB
debian latest c9786667d5fe 3 weeks ago 117MB
debian bullseye 52d643040b9a 4 weeks ago 124MB
python latest ae66048b7429 8 weeks ago 1.02GB
[...]
Even though these are mostly base images, we can already see that the SIZE column can reach a gigabyte and beyond.
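To put the image sizes in context, Docker can also summarize its overall disk usage by category:
$ docker system df
Adding --verbose further breaks the numbers down per image, container, and volume.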
Notably, Kubernetes images usually have the k8s or kubernetes string in their name:
registry.k8s.io/kube-apiserver v1.28.3 537434729123 5 months ago 126MB
registry.k8s.io/kube-controller-manager v1.28.3 10baa1ca1706 5 months ago 122MB
registry.k8s.io/kube-scheduler v1.28.3 6d1b4fd1b182 5 months ago 60.1MB
registry.k8s.io/kube-proxy v1.28.3 bfc896cf80fb 5 months ago 73.1MB
registry.k8s.io/etcd 3.5.9-0 73deb9a3f702 10 months ago 294MB
registry.k8s.io/coredns/coredns v1.10.1 ead0a4a53df8 13 months ago 53.6MB
registry.k8s.io/pause 3.9 e6f181688397 17 months ago 744kB
kubernetesui/dashboard <none> 07655ddf2eeb 18 months ago 246MB
kubernetesui/metrics-scraper <none> 115053965e86 22 months ago 43.8MB
gcr.io/k8s-minikube/storage-provisioner v5 6e38f40d628d 2 years ago 31.5MB
This way, we can distinguish most Kubernetes-related images from those that belong to standalone Docker containers.
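So, to isolate these Kubernetes-related entries and their sizes in a script, a simple filter over the image list is enough:
$ docker image list --all | grep -Ei 'k8s|kubernetes'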
5. Container Runtime Storage Usage
Since containers usually make up a big part of the storage usage for a Kubernetes deployment, we list the mapping between containers and directories.
When it comes to Docker, we can use one compound command to map containers to directories:
$ docker inspect --format=$'{{.Name}}\n >>> {{.GraphDriver.Data.MergedDir}}\n' $(docker ps --all --quiet)
Let’s break this command down:
- inspect shows selected data (the container Name and the merged directory path of its graph driver) according to the --format template
- ps lists --all containers --quiet-ly, i.e., only their identifiers
- $() is a command substitution that interpolates the container identifiers into the inspect command
For instance, there are already some familiar mappings from the df overlay listing we saw earlier:
[...]
/k8s_POD_kube-apiserver-xost_kube-system_b11cd851d3b912861b5862cb512d0521_0
>>> /var/lib/docker/overlay2/a4fdbc126518168bad7f2977f16eb666f7f33f47b67d9d75d2f74b96c3474eb9/merged
/k8s_kubernetes-dashboard_kubernetes-dashboard-8694d4445c-hnt9w_kubernetes-dashboard_cb1dc601-ecfe-42d3-b590-ca79877ae036_0
>>> /var/lib/docker/overlay2/733c46aa655aee3b736cafe66629e32e0d9f4156671608243117b95a32c716a9/merged
[...]
Thus, we can focus on specific containers, especially those with the k8s_ prefix.
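In addition, if we’re mainly interested in how much each container adds on top of its image, the runtime can report that directly: with --size, the SIZE column shows the writable layer, followed by the virtual size (including image layers) in parentheses:
$ docker ps --all --size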
6. Kubernetes Pods Storage Usage
Going higher up the chain, we can discover which Kubernetes pod is associated with a given (large) directory:
$ kubectl get pods --all-namespaces --output=jsonpath='
{range .items[*]}{@.metadata.name}
{" >>> volumes: "}{@.spec.volumes}
{" >>> volumeMounts: "}{@..volumeMounts}
{"\n"}{end}'
This kubectl command uses the get subcommand to extract the name and volumes data for all pods in --all-namespaces.
To do so, it uses a special jsonpath that goes through several steps:
- {range .items[*]}: iterate over all items (pods)
- {@.metadata.name}: the item name
- {" >>> volumes: "}: visual formatting (the line break before it is literal in the command)
- {@.spec.volumes}: the item volumes specification
- {" >>> volumeMounts: "}: visual formatting
- {@..volumeMounts}: the item volumeMounts, wherever they appear
- {"\n"}: visual formatting
- {end}: terminate the range
Thus, we acquire all volume information, including directories.
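In particular, volumes of type hostPath map directly to node directories that we can then measure with tools like du. As a sketch in the same style as the command above, a narrower jsonpath extracts just those paths:
$ kubectl get pods --all-namespaces --output=jsonpath='{range .items[*]}{@.metadata.namespace}{"/"}{@.metadata.name}{" >>> hostPath: "}{@..hostPath.path}{"\n"}{end}'
Pods without hostPath volumes simply show an empty path list.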
To get further information about a given PV or PVC, we can use the describe subcommand and the respective resource NAME. Since a name can’t be combined with --all-namespaces, we specify the namespace explicitly for a PVC, while a PV is cluster-scoped and needs no namespace:
$ kubectl describe [pv|pvc] <NAME> [--namespace <NAMESPACE>]
Of course, we can also get a more script-friendly version via get:
$ kubectl get [pv|pvc] <NAME> [--namespace <NAMESPACE>] --output=json
This way, fields like Capacity, StorageClass, and Access Modes can give us an idea of how much space a volume provides and what kind of storage backs it.
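For a compact per-claim overview, we can also build a custom-columns query; the column names here are our own choice:
$ kubectl get pvc --all-namespaces --output=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CAPACITY:.status.capacity.storage,CLASS:.spec.storageClassName'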
7. Summary
In this article, we talked about Kubernetes node storage allocation and discovery.
In conclusion, the storage usage around a Kubernetes deployment can vary significantly, so knowing how to analyze and limit it can be critical.