Understanding livenessProbe and readinessProbe in Kubernetes

1. Overview

Kubernetes provides three widely used probe mechanisms:

startupProbe
livenessProbe
readinessProbe

Each of these probes keeps track of the health and availability of containers within a pod.

In this tutorial, we’ll learn about the startupProbe, livenessProbe, and readinessProbe mechanisms by configuring them for different scenarios.

2. Basics

When a container starts, Kubernetes runs the startupProbe to detect when the application inside it has finished starting. After that, the framework continuously monitors the health of the containers inside the pod via the livenessProbe and the readinessProbe.

To begin with, let’s understand the function of each probe:

startupProbe ensures that the pod and its containers become available after the initial start
livenessProbe serves as a diagnostic check to confirm if the containers are alive and stable
readinessProbe ensures that the containers are healthy to serve incoming traffic

If the startupProbe fails, the container never reached a started state within the allowed attempts, so Kubernetes kills it and applies the pod’s restart policy. At this point, the container is often considered misconfigured. Until the startupProbe succeeds, the other two probes are held off.

Effectively, the startupProbe and livenessProbe are usually similar, with the former being less strict. When the livenessProbe fails, Kubernetes considers the container unhealthy and restarts it as a recovery measure.

On the other hand, if the readinessProbe fails, Kubernetes doesn’t restart anything. Instead, it stops routing Service traffic to the pod. Once a subsequent probe succeeds, Kubernetes identifies the pod as ready again and resumes the incoming traffic flow.

Now that we have a basic understanding of these mechanisms, we’ll learn more about them in the subsequent sections by exploring different configuration scenarios.

3. HTTP Get Probe

We can configure startupProbe, livenessProbe, and readinessProbe for a pod using an HTTP Get Probe. With this approach, Kubernetes performs a periodic HTTP GET request at a specific endpoint and port of the container to check its health:

readinessProbe:
  httpGet:
    path: <RP_endpoint>
    port: <RP_port_number>
livenessProbe:
  httpGet:
    path: <LP_endpoint>
    port: <LP_port_number>
startupProbe:
  httpGet:
    path: <SP_endpoint>
    port: <SP_port_number>

At the most basic level, we only need a path and port. As already mentioned, SP_endpoint and LP_endpoint can be equivalent.

Effectively, if Kubernetes receives a response with an HTTP status code greater than or equal to 200 and less than 400 for the probe request, it considers the probe successful. However, if it gets any other status code in the response, such as 4xx or 5xx, or no response at all before the timeout, it treats the container as unhealthy or unable to start up.
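To make the status-code rule concrete, here’s a minimal Python sketch of how the result of an HTTP GET probe could be classified. It isn’t the kubelet’s implementation: the helper names is_success_status and http_get_probe are hypothetical, and the success range of 200 up to, but not including, 400 follows the rule stated in the Kubernetes documentation:

```python
import http.client


def is_success_status(status: int) -> bool:
    """Documented rule: 200 <= status < 400 counts as a successful probe."""
    return 200 <= status < 400


def http_get_probe(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """Hypothetical helper: send one GET request and classify the response.

    Connection errors and timeouts are treated as a failed probe.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return is_success_status(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    for status in (200, 204, 301, 404, 503):
        print(status, "success" if is_success_status(status) else "failure")
```

For example, http_get_probe("127.0.0.1", 80) should return True when a web server such as nginx is reachable on that address and answers the request for / with a success code.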

Let’s understand this by configuring the probes for a pod running an nginx server in Kubernetes:

$ cat nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 15
      startupProbe:
        httpGet:
          path: /
          port: 80
        failureThreshold: 30
        periodSeconds: 10

As before, the path and port identify the endpoint to probe, while the remaining fields are optional tuning knobs. In general, several parameters dictate the behavior of probes:

initialDelaySeconds (default 0) helps add a delay to the first probe after the container starts
periodSeconds (default 10) decides the subsequent period for checking
timeoutSeconds (default 1) sets the number of seconds after which the probe is considered timed out
successThreshold (default 1) is the number of consecutive successful probes required after a failure before the system infers overall success; it must be 1 for the livenessProbe and startupProbe
failureThreshold (default 3) is the number of failed probes in a row that indicate an overall failure
terminationGracePeriodSeconds (default 30) is the period to wait between the shutdown trigger and the forced stop of a failed container
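As a rough illustration of how these parameters combine, the following Python sketch estimates how long a container can keep failing a probe before the failure threshold is reached. The ProbeTiming class and its failure_budget_seconds method are hypothetical names, and the formula is a simplification that ignores probe timeouts and scheduling jitter:

```python
from dataclasses import dataclass


@dataclass
class ProbeTiming:
    # Field defaults mirror the Kubernetes defaults (0, 10, and 3).
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3

    def failure_budget_seconds(self) -> int:
        """Approximate time until failure_threshold consecutive failures accumulate."""
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


if __name__ == "__main__":
    # startupProbe values from nginx-pod.yaml
    startup = ProbeTiming(period_seconds=10, failure_threshold=30)
    print(startup.failure_budget_seconds())  # 300
```

With the startupProbe values from nginx-pod.yaml (periodSeconds of 10 and failureThreshold of 30), the application gets roughly 300 seconds to finish starting before Kubernetes gives up on the container.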

Now, let’s use the configuration from the nginx-pod.yaml file to create the nginx pod:

$ kubectl apply -f nginx-pod.yaml
pod/nginx created

Finally, let’s list the pods and verify that the nginx pod is up and in a Running state:

$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
nginx       1/1     Running   0          3m43s

We can observe that the RESTARTS count is zero. This field tracks how many times Kubernetes has restarted the pod’s containers, for example, after repeated livenessProbe failures.

4. Command Probe

Now, let’s see how to use shell commands to configure the startup, liveness, and readiness probes:

readinessProbe:
  exec:
    command:
      - "shell-cmd"
livenessProbe:
  exec:
    command: 
      - "shell-cmd"
startupProbe:
  exec:
    command:
      - "shell-cmd"

With a command probe, Kubernetes executes a command inside the container to infer its health and availability. If the exit status of the command is zero, it treats the container as healthy. A non-zero exit status indicates that the container is unhealthy, so Kubernetes performs the recovery action for that probe type, such as a restart.
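The exit-status rule can be sketched in a few lines of Python. This is only an illustration that runs a command on the local machine rather than inside a container, and the helper name exec_probe is hypothetical:

```python
import subprocess
import sys


def exec_probe(command: list[str], timeout: float = 1.0) -> bool:
    """Hypothetical helper: run a command and treat exit status 0 as healthy."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a timeout is treated as a failed probe.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Use the current Python interpreter as a portable stand-in for a shell command.
    print(exec_probe([sys.executable, "-c", "raise SystemExit(0)"], timeout=30))  # True
    print(exec_probe([sys.executable, "-c", "raise SystemExit(3)"], timeout=30))  # False
```

The timeout argument plays the role of timeoutSeconds: a command that doesn’t finish in time counts as a failure, just like a non-zero exit status.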

Let’s apply our understanding by configuring the probes for a pod that has an Ubuntu image running inside a container:

$ cat ubuntu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-ubuntu
  labels:
    app: ubuntu
spec:
  containers:
  - image: ubuntu
    command:
      - "sleep"
      - "604800"
    imagePullPolicy: IfNotPresent
    name: test-ubuntu
    livenessProbe:
      exec:
        command: 
          - "ps"
          - "-C"
          - "sleep"
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 2
    readinessProbe:
      exec:
        command:
          - "ls"
          - "/bin"
          - "/boot"
          - "/usr"
      initialDelaySeconds: 10
      periodSeconds: 15
      successThreshold: 1
      failureThreshold: 2
    startupProbe:
      exec:
        command:
          - "ls"
          - "/boot"
      initialDelaySeconds: 30
      periodSeconds: 20
      successThreshold: 1
      failureThreshold: 100

Notably, we used the ps command in the livenessProbe to check whether the sleep command responsible for keeping the container alive is still running. Additionally, we employed the ls command in the readinessProbe and startupProbe to ensure critical directories are present.

Again, the startupProbe differs from the other two probes mainly in its more lenient requirements:

expecting just the /boot directory to be present
higher initialDelaySeconds (30)
higher periodSeconds (20)
the same low successThreshold (1)
very high failureThreshold (100)

For the livenessProbe and readinessProbe, we want at least two consecutive failed attempts before Kubernetes marks the container as unhealthy, so we’ve set failureThreshold to 2. On the other hand, a single successful attempt is sufficient to declare it healthy, so we’ve set successThreshold to 1.
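To see how the two thresholds interact, we can replay a sequence of probe outcomes in a small Python sketch. The function probe_state is a hypothetical simplification: it assumes the container starts out healthy and only models the counting of consecutive results, not the actions Kubernetes takes afterward:

```python
def probe_state(results, success_threshold=1, failure_threshold=3, initial=True):
    """Replay probe outcomes (True = success) and return the final health state.

    The state flips to unhealthy after failure_threshold consecutive failures
    and back to healthy after success_threshold consecutive successes.
    """
    healthy = initial
    streak_value, streak_length = None, 0
    for outcome in results:
        # Track the length of the current run of identical outcomes.
        if outcome == streak_value:
            streak_length += 1
        else:
            streak_value, streak_length = outcome, 1
        if outcome and streak_length >= success_threshold:
            healthy = True
        elif not outcome and streak_length >= failure_threshold:
            healthy = False
    return healthy


if __name__ == "__main__":
    print(probe_state([True, False], failure_threshold=2))         # True
    print(probe_state([True, False, False], failure_threshold=2))  # False
    print(probe_state([False, False, True], failure_threshold=2))  # True
```

With failureThreshold set to 2, a single failed probe leaves the state unchanged, two failures in a row flip it to unhealthy, and one later success is enough to flip it back.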

Now, let’s go ahead and create the pod using the configuration from the ubuntu-pod.yaml file:

$ kubectl apply -f ubuntu-pod.yaml 
pod/test-ubuntu created

Finally, let’s verify the status of the test-ubuntu pod along with the RESTARTS property:

$ kubectl get pod test-ubuntu
NAME          READY   STATUS    RESTARTS   AGE
test-ubuntu   1/1     Running   0          24s

It looks like all is correct, as the STATUS field shows as Running, and there are no restarts.

5. TCP Socket Probe

For a container running a TCP-based network service, we can also choose to set up a TCP socket probe for all checks:

readinessProbe:
  tcpSocket:
    port: <port_number>
livenessProbe:
  tcpSocket:
    port: <port_number>
startupProbe:
  tcpSocket:
    port: <port_number>

With this approach, Kubernetes performs the initial and periodic checks by attempting to open a TCP connection to the specified port. If the connection succeeds, it infers that the container is healthy. Otherwise, it treats the container as unhealthy and performs the recovery action for that probe type, such as a restart.
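Conceptually, a TCP socket check boils down to trying to open a connection and seeing whether that succeeds. Here’s a minimal Python sketch of the idea, with tcp_probe as a hypothetical helper name rather than anything Kubernetes exposes:

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical helper: succeed if a TCP connection can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts, and timeouts all count as failures.
        return False


if __name__ == "__main__":
    # Assumes a server may or may not be listening locally on port 6379.
    print(tcp_probe("127.0.0.1", 6379))
```

Notably, a check like this only confirms that something accepts connections on the port. It doesn’t verify that the service behind it answers requests correctly.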

Now, let’s put this into action by configuring all probes for a pod that’s running a Redis server inside a container:

$ cat redis-server-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-server
spec:
  containers:
  - name: redis
    image: redis:latest
    ports:
    - containerPort: 6379
    readinessProbe:
      exec:
        command: 
          - "redis-cli"
          - "ping"
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 5
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 5
    startupProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 30
      periodSeconds: 20
      timeoutSeconds: 10

By default, the Redis server accepts connections on port 6379, so we used the same port while configuring the livenessProbe and the startupProbe with tcpSocket. In addition, we’re using the redis-cli command to ping the Redis server as a readiness check. Furthermore, we specify the timeoutSeconds property to define the maximum duration each probe may take, whether it’s opening the connection or running the command, before Kubernetes considers it a failure.

Now, let’s use the configuration from the redis-server-pod.yaml file to create the redis-server pod:

$ kubectl apply -f redis-server-pod.yaml
pod/redis-server created

Again, we can verify that the redis-server pod is running with 0 RESTARTS:

$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
redis-server   1/1     Running   0          2m56s

As we can see, the tcpSocket probe type also works as expected for pods running in Kubernetes.

It’s important to understand that all probes are supposed to be lightweight, so that there’s no severe impact on the container when they’re executed periodically. Almost always, we should keep only the minimal, critical checks necessary to infer the pod’s health at the container level.

6. Conclusion

In this article, we learned about the startupProbe, the livenessProbe, and the readinessProbe in Kubernetes. Additionally, we explored different scenarios for configuring these probes via HTTP GET requests, shell commands, and TCP sockets.
