Kubernetes Pod Policies — restartPolicy

Learn the mechanics behind Kubernetes Pod restart policies: Always, OnFailure, and Never.

As a container orchestration tool, Kubernetes (K8S) works to keep your applications running. However, applications can fail for many reasons.

During such failures, Kubernetes tries to self-heal the application by restarting it, thanks to Pod restart policies.

We can take the term self-heal with a pinch of salt, as K8S can only attempt to restart the application; it can’t fix the underlying issue or error. But when the error is transient and a simple restart can clear it, this feature is handy.

In this article, we will get hands-on with how the restart policy works and how its values behave under different conditions.


Resource Repo

To follow the hands-on exercises, clone this git repo and change into the restartPolicy directory.

> git clone https://github.com/decisivedevops/blog-resources.git

> cd blog-resources/k8s/restartPolicy

Always

restartPolicy: Always ensures that a container is restarted whenever it exits, regardless of the exit status.

Always is the default restart policy for pods, including those created by controllers like Deployments, ReplicaSets, StatefulSets, and DaemonSets.
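You can confirm this default from the API reference itself (the exact wording may vary across cluster versions):

> kubectl explain pod.spec.restartPolicy

# prints the field's documentation, which lists the
# possible values and notes that it defaults to Always.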

Let’s test it practically.

  • Apply the deployment.yml
> kubectl apply -f deployment.yml

deployment.apps/busybox-deployment created
  • This simple busybox deployment starts a pod, runs a while loop, sleeps for 5 seconds, and exits with a non-zero exit code.
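If you are not following along with the repo, here is a minimal sketch of what deployment.yml contains (the repo version may differ slightly; exit 1 stands in for any non-zero exit code):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - /bin/sh
        - -c
        - 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 1; done'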
  • Get the pod status.
> kubectl get pod

NAME                                  READY   STATUS   RESTARTS      AGE
busybox-deployment-7c5cfcf9b8-kp5d6   0/1     Error    2 (27s ago)   40s
  • The container restarts right after the loop exits.
  • Let’s check the pod restart policy value.
> kubectl get po -l app=busybox -o yaml | grep -i restart
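The output should include the containers’ restartCount fields and, more importantly, the policy itself:

    restartPolicy: Always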

Two points to note here.

  1. We have not explicitly set the restartPolicy spec on the deployment. Always is the default value.
  2. The container exited with a non-zero status code, so you might wonder what happens if the container actually does its job and exits successfully, i.e., with exit code 0.

Let’s try this.

  • Update the while loop in deployment.yml to exit with status code 0.
- 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 0; done'
  • Re-apply the deployment and observe the pod restart count.
> kubectl apply -f deployment.yml

deployment.apps/busybox-deployment configured

> kubectl get po -l app=busybox

NAME                                  READY   STATUS      RESTARTS     AGE
busybox-deployment-7c5cfcf9b8-d5dzj   0/1     Completed   1 (7s ago)   14s

# you can see a difference in the STATUS column.
# with a non-zero exit code, it will be Error.
# with a zero exit code, it will be Completed.
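To double-check the exit code Kubernetes recorded for the last run, you can inspect the container's last state (the field layout may differ slightly between kubectl versions):

> kubectl describe po -l app=busybox | grep -A 2 "Last State"

# look for "Last State: Terminated" with "Exit Code: 0"
# (the earlier, failing version showed a non-zero Exit Code).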
  • However, it’s still restarting.

Wait, our pod completed its work successfully and exited with a clean exit status, so why is it in a restart loop?

This restart behavior is inherent to how Deployments work: they ensure that the specified number of pods is always running.

But what if we want to run a pod, complete the task, and exit? That’s where the other values of restartPolicy are helpful.

But there’s a catch with those values.

OnFailure and Never

As the names suggest,

  • restartPolicy: OnFailure ensures that a pod is restarted only when it fails, i.e., exits with a non-zero exit status.
  • restartPolicy: Never does not restart the pod once it fails or exits.
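Putting the three values side by side gives this restart matrix:

Container exit       Always    OnFailure   Never
zero (success)       restart   no          no
non-zero (failure)   restart   restart     no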

Demo time.

  • Update the deployment.yml to include restartPolicy as OnFailure.
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - /bin/sh
        - -c
        - 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 0; done'
      restartPolicy: OnFailure
  • Re-apply the deployment.
> kubectl apply -f deployment.yml

The Deployment "busybox-deployment" is invalid: spec.template.spec.restartPolicy: Unsupported value: "OnFailure": supported values: "Always"
  • A typo, maybe? It doesn’t look like it, as the error says supported values: “Always”. What happened to the other two, then?
  • Turns out, the values OnFailure and Never are only applicable when:
    – The pod is launched on its own (i.e., without a controller like a Deployment, ReplicaSet, StatefulSet, or DaemonSet), or
    – The pod is started by a Job (including a CronJob).
  • If you want to dive into the whys and hows of this, there is a long-running discussion thread on GitHub.
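For the first case, a pod launched on its own, here is a minimal sketch (a hypothetical pod.yml, not part of the repo) that the API server accepts with restartPolicy: Never:

# hypothetical standalone pod, for illustration only
apiVersion: v1
kind: Pod
metadata:
  name: busybox-pod
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox
    args: ["/bin/sh", "-c", "echo Hello Kubernetes! && exit 0"]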

To start a pod, let it complete its task, and exit without restarting, we need a Job controller.

  • job.yml defines a simple Job that starts a pod, prints a string, and exits. It has restartPolicy set to OnFailure; a sketch of it follows below.
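If you are not using the repo, job.yml looks roughly like this (a sketch; the repo version may differ slightly):

apiVersion: batch/v1
kind: Job
metadata:
  name: busybox-job
spec:
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh", "-c", "echo Hello Kubernetes! && exit 0"]
      restartPolicy: OnFailure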
  • Apply job.yml.
> kubectl apply -f job.yml

job.batch/busybox-job created
  • Check the pod status.
> kubectl get pod --field-selector=status.phase=Succeeded

NAME                READY   STATUS      RESTARTS   AGE
busybox-job-hn69d   0/1     Completed   0          1m
  • Now, it’s not restarted after completion.

You can test the configurations below to get a better idea of how restartPolicy works for Jobs.

  1. Update job.yml so the pod exits with a non-zero status code.
       command: ["/bin/sh", "-c", "echo Hello Kubernetes! && exit 1"]


> kubectl delete -f job.yml
> kubectl apply -f job.yml

# ------
# the pod should be in a restart loop, backing off between
# retries until the Job's backoffLimit (6 by default) is exhausted
> kubectl get pod

  2. Update the restartPolicy to Never and check the pod status. This time, instead of restarting the failed pod, the Job creates new pods to retry.


Conclusion

A few people pointed out on this thread that the values Never and OnFailure are helpful when we need to debug a pod that misbehaves only in a Kubernetes environment.

Also, one point (though this isn’t entirely clear) is that the official K8S documentation does not yet clearly describe how these different values behave.

Until we have some clarification on these values from the official K8S team, here are a few key takeaways.

  1. Restart Policies Define Pod Resilience: The Always restart policy, the default for Pods managed by Deployments and other controllers, ensures continuous availability by restarting containers regardless of their exit status. This policy suits applications that must be available at all times.
  2. Context-Specific Policy Usage: While Always is suitable for ongoing applications, OnFailure and Never restart policies are specifically helpful for standalone Pods (a rare case) or those managed by Jobs and CronJobs, catering to batch jobs and tasks that should not restart once completed.
  3. Deployment Limitations and Job Flexibility: Deployments enforce the Always policy to maintain service availability, highlighting a limitation for use cases requiring no restarts. Jobs offer flexibility, allowing for OnFailure and Never policies to ensure tasks run to completion as intended without unnecessary restarts.

🙏 I am grateful for your time and attention all the way through!

If this guide sparked a new idea,
a question, or desire to collaborate,
I’d love to hear from you:
🔗 E-Mail
🔗 LinkedIn
🔗 Upwork

Till we meet again, keep making waves.🌊 🚀