Gergely Risko

How do you rollback deployments in Kubernetes?

July 2024


TL;DR: Kubernetes has a built-in rollback mechanism.

There are several strategies when it comes to deploying apps into production.

In Kubernetes, rolling updates are the default strategy to update the running version of your app.

The rolling update cycles the previous Pods out and brings newer Pods in incrementally.

Let's have a look at an example:

  • You have a Service and a Deployment with three replicas on version 1.0.0. You change the image in your Deployment to version 1.1.0; here's what happens next.
  • In a rolling update, Kubernetes creates a Pod with a new version of the image.
  • Kubernetes waits for the readiness (and startup) probe.
  • When the Pod is Ready, it is running and can receive traffic. Kubernetes can remove a previous Pod.
  • The previous Pod has been removed.
  • Kubernetes is ready to start again.
  • Another Pod with the new version of the image is created.
  • Kubernetes waits for readiness.
  • The Pod is running and can receive traffic.
  • The previous Pod is removed.
  • We start again, for the last time.
  • A Pod with the new image is created.
  • Kubernetes waits for readiness.
  • The Pod is running and can receive traffic.
  • The previous Pod is removed.
  • The migration from the previous to the current version is complete.

Zero-downtime deployment is convenient when you don't want to interrupt your live traffic.

You can deploy as many times as you want, and your users will not notice the difference.

Of course, this only works if the API exposed by the new version is compatible with the previous version, because both versions serve traffic during the rollout. In our case, we only upgraded from 1.0.0 to 1.1.0, so we are fine.

However, even with techniques such as rolling updates, there's still a risk that your application will not work as expected with the new version of the image.

Rolling back a change

When you introduce a change that breaks production, you should have a plan to roll back that change.

Kubernetes and kubectl offer a simple mechanism to roll back changes to resources such as Deployments, StatefulSets and DaemonSets.

But before talking about rollbacks, you should learn a few crucial details about deployments.

You learned how Deployments are responsible for gradually rolling out new versions of your Pods without causing any downtime.

You are also familiar with Kubernetes watching over the number of replicas in your deployment.

If you asked for 5 Pods but have only 4, Kubernetes creates one more.

If you asked for 4 Pods but have 5, Kubernetes deletes one of the running Pods.
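The counting behaviour described above can be sketched as a small reconciliation function. This is a toy model in Python, not Kubernetes code: the `reconcile` function and the Pod names are invented for illustration.

```python
def reconcile(desired: int, running: list[str]) -> list[str]:
    """Return the actions needed to move from the running Pods to the desired count."""
    actions = []
    if len(running) < desired:
        # Too few Pods: create the missing ones.
        for i in range(desired - len(running)):
            actions.append(f"create pod-new-{i}")
    elif len(running) > desired:
        # Too many Pods: delete the surplus.
        for name in running[desired:]:
            actions.append(f"delete {name}")
    return actions


# Asked for 5, have 4: one Pod is created.
print(reconcile(5, ["pod-a", "pod-b", "pod-c", "pod-d"]))
# Asked for 4, have 5: one Pod is deleted.
print(reconcile(4, ["pod-a", "pod-b", "pod-c", "pod-d", "pod-e"]))
```

When the two numbers already match, the function returns an empty list and nothing happens.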

Since replicas is a field in the Deployment, you might be tempted to conclude that it is the Deployment's job to count the number of Pods and create or delete them.

Interestingly, this is not the case.

Deployments delegate counting Pods to another component: the ReplicaSet.

Every time you create a Deployment, the Deployment creates a ReplicaSet and delegates creating (and deleting) the Pods to it.

  • Let's focus on a Deployment.
  • You might be tempted to think that Deployments are in charge of creating Pods.
  • The Deployment doesn't create Pods. Instead, it creates another object called ReplicaSet.
  • The Deployment passes the spec (which includes the replicas) to the ReplicaSet.
  • The ReplicaSet is in charge of creating the Pods and watching over them.

But why isn't the Deployment creating the Pods?

Why does it have to delegate that task to someone else?

Let's consider the following scenario.

You have a Deployment with a container on version 1 and three replicas.

You change the spec for your template and upgrade your container from version 1 to version 2.

A ReplicaSet holds one type of a Pod

If there were no ReplicaSet, then during this upgrade, the Deployment would have to work with both version 1 and version 2 Pods.

In that design, the current state of the rolling update would not be explicitly represented, which would make debugging more difficult.

So Kubernetes has a rule: a ReplicaSet can only hold one type of Pod, so version 1 and version 2 Pods can't live in the same ReplicaSet.

The Deployment knows that the two Pods can't coexist in the same ReplicaSet, so it creates a second ReplicaSet to hold version 2.

Then, gradually, it decreases the number of replicas in the old ReplicaSet and increases the count in the new one until the new ReplicaSet has all the Pods.

In other words, the sole responsibility of the ReplicaSet is to count Pods.

The Deployment orchestrates the rolling update by managing ReplicaSets.

  • Deployments create ReplicaSets that create Pods.
  • Can you have two different Pods in the same ReplicaSet?
  • ReplicaSets can only contain a single type of Pod. You can't use two different Docker images. How can you deploy two versions of the app simultaneously?
  • The Deployment knows that you can't have different Pods in the same ReplicaSet. So it creates another ReplicaSet.
  • It increases the number of replicas of the current ReplicaSet to one.
  • And then it decreases the replicas count in the previous ReplicaSet.
  • The same process of increasing and decreasing Pods continues until all Pods are created on the current ReplicaSet.
  • Please notice how you have two Pod templates and two ReplicaSets.
  • Also, the traffic is hitting both the current and previous version of the app.
  • After the rolling update is completed, the previous ReplicaSet is not deleted.
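The scale-up and scale-down steps in the walkthrough can be modelled in a few lines of Python. This is a simplified sketch, not the real Deployment controller. It assumes one Pod is moved at a time, as in the walkthrough, and the `rolling_update` function is made up for illustration.

```python
def rolling_update(replicas: int) -> list[tuple[int, int]]:
    """Return the (old, new) replica counts after each step of a toy rolling update."""
    old, new = replicas, 0
    states = [(old, new)]
    while old > 0:
        new += 1  # scale the new ReplicaSet up by one and wait for the Pod to be Ready
        states.append((old, new))
        old -= 1  # only then scale the old ReplicaSet down by one
        states.append((old, new))
    return states


for old, new in rolling_update(3):
    print(f"old ReplicaSet: {old}  new ReplicaSet: {new}")
```

In this model, the total number of Pods never drops below the requested three, and the old ReplicaSet ends with a count of 0 rather than disappearing.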

But what if you don't care about rolling updates and only wish for your Pods to be recreated when deleted?

Could you create a ReplicaSet without a Deployment?

Of course, you can.

Here's an example of a ReplicaSet.

replicaset.yaml

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      name: app
  template:
    metadata:
      labels:
        name: app
    spec:
      containers:
      - name: app
        image: learnk8s/app:1.0.0

For reference, this is a Deployment that creates the ReplicaSet above:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      name: app
  template:
    metadata:
      labels:
        name: app
    spec:
      containers:
      - name: app
        image: learnk8s/app:1.0.0

Aren't those the same?

They are in this example.

However, in a Deployment, you can define properties such as how many Pods to create and destroy during a rolling update (the field is strategy).

The same property isn't available in the ReplicaSet.

How do you know which properties are available? You can consult the official API reference.

In general, the YAML for the Deployment contains the ReplicaSet plus some additional details.

You can create the ReplicaSet with:

bash

kubectl create -f replicaset.yaml

Please remember that, in practice, we don't create ReplicaSets by hand. The go-to object type for stateless workloads is the Deployment.

If you were to use a ReplicaSet directly, you would lose the ability to do rolling updates, and we always have to be ready to upgrade apps.

There's something else worth noting about the ReplicaSets and Deployments.

When you upgrade your Pods from version 1 to version 2, the Deployment creates a new ReplicaSet and increases its number of replicas while the replica count of the previous ReplicaSet goes to zero.

After the rolling update, the previous ReplicaSet is not deleted — not immediately, at least.

Instead, it is kept around with a replica count of 0.

If you try to execute another rolling update from version 2 to version 3, you might notice that you have two ReplicaSets with a count of 0 at the end of the upgrade.

Why were the previous ReplicaSets not deleted or garbage collected?

Imagine that the current version of the container introduces a regression.

You probably don't want to serve unhealthy responses to your users, so you should roll back to a previous version of your app.

If you still have an old ReplicaSet, you could scale the current replicas to zero and increment the previous ReplicaSet count.

In other words, keeping the previous ReplicaSets around is a convenient mechanism to roll back to a previously working version of your app.

By default, Kubernetes stores the last 10 ReplicaSets and lets you roll back to any of them.

However, you can change how many ReplicaSets should be retained by changing the spec.revisionHistoryLimit in your Deployment.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      name: app
  template:
    metadata:
      labels:
        name: app
    spec:
      containers:
      - name: app
        image: learnk8s/app:1.0.0

In this example, we changed this value to zero, which means that no previous ReplicaSet will be kept around and kubectl rollout undo has nothing to roll back to. This is common practice with GitOps.
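The effect of revisionHistoryLimit can be pictured with a small Python sketch. It is only a model of the retention rule described here, not Kubernetes code, and the `prune_history` function is a hypothetical helper.

```python
def prune_history(old_revisions: list[int], limit: int) -> list[int]:
    """Keep only the `limit` most recent scaled-down revisions."""
    newest_first = sorted(old_revisions, reverse=True)
    return sorted(newest_first[:limit])


# Twelve old revisions with the default limit of 10: the two oldest are dropped.
print(prune_history(list(range(1, 13)), 10))
# A limit of 0 keeps nothing, so there is no revision left to roll back to.
print(prune_history([1, 2, 3], 0))
```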

What about the previous ReplicaSets?

Could you list all the previous ReplicaSets that belong to a Deployment?

You can use the following command to inspect the history of your Deployment:

bash

kubectl rollout history deployment/app

And you can roll back to a specific revision with:

bash

kubectl rollout undo deployment/app --to-revision=2

But how does the Deployment know its ReplicaSets?

Does it store the order in which ReplicaSets are created?

The ReplicaSets have random names with IDs such as app-6ff88c4474, so you should expect the Deployment to store a reference.

Let's inspect the Deployment with:

bash

kubectl get deployment app -o yaml

Nothing looks like a list of the previous 10 ReplicaSets.

Deployments don't hold a reference to their ReplicaSets.

Instead, ReplicaSets hold a back reference to their Deployment:

bash

kubectl get replicaset app-6ff88c4474 -o yaml

You can find the back reference under ownerReferences.

Also, the ID that we previously called random is actually a hash of the Pod template (the template section in the YAML).

As a result, if you apply a template that you already used before, Kubernetes reuses the existing ReplicaSet (Deployment revision) instead of creating a new one.
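To see why a hash-based name leads to reuse, here is a minimal Python sketch. It is an illustration only: Kubernetes uses its own hashing scheme, and SHA-256 over JSON is swapped in here as a stand-in. The `replicaset_name` function and the dictionaries are assumptions made for the example.

```python
import hashlib
import json


def replicaset_name(deployment: str, template: dict) -> str:
    """Derive a stable name from the Deployment name and a hash of the Pod template."""
    # Stand-in hashing: serialise the template deterministically, then hash it.
    canonical = json.dumps(template, sort_keys=True)
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:10]
    return f"{deployment}-{digest}"


v1 = {"containers": [{"name": "app", "image": "learnk8s/app:1.0.0"}]}
v2 = {"containers": [{"name": "app", "image": "learnk8s/app:1.1.0"}]}

# The same template always maps to the same name, so the ReplicaSet can be reused.
print(replicaset_name("app", v1) == replicaset_name("app", dict(v1)))
# A different image changes the hash, so a new ReplicaSet is needed.
print(replicaset_name("app", v1) == replicaset_name("app", v2))
```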

What about the order?

How do you know which one was the last ReplicaSet used? Or the third?

Kubernetes stores the revision in the metadata.annotations field of the ReplicaSet.

You can inspect the revision with the following command:

bash

kubectl get replicaset app-6ff88c4474 -o yaml

In the case below, the revision is 3:

replicaset.yaml

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-replicaset
  annotations:
    deployment.kubernetes.io/revision: "3"
spec:
  replicas: 3
  selector:
    matchLabels:
      name: app
  template:
    metadata:
      labels:
        name: app
    spec:
      containers:
      - name: app
        image: learnk8s/app:1.0.0

So, what happens when you find a regression in the current release and decide to roll back to revision 2 like so:

bash

kubectl rollout undo deployment/app --to-revision=2

If, before the undo, you had three ReplicaSets with revisions 1, 2, and 3, you should now have 1, 3, and 4.

There's a missing entry in the history: revision 2 was promoted to 4.
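The renumbering can be reproduced with a toy history in Python. This is a sketch of the observable behaviour only. The `undo` function and the image tags are hypothetical and not part of kubectl.

```python
def undo(history: dict[int, str], to_revision: int) -> dict[int, str]:
    """Re-number the chosen revision as the newest one, mirroring the behaviour described above."""
    updated = dict(history)
    template = updated.pop(to_revision)   # the old revision number disappears
    updated[max(history) + 1] = template  # the same template becomes the latest revision
    return updated


history = {1: "app:1.0.0", 2: "app:1.1.0", 3: "app:1.2.0"}
after = undo(history, 2)
print(sorted(after))  # [1, 3, 4]
print(after[4])       # app:1.1.0
```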

There's also something else that looks useful but doesn't work quite right.

The history command displays two columns: Revision and Change-Cause.

bash

kubectl rollout history deployment/app
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>

While you're now familiar with the Revision column, you might be wondering what Change-Cause is used for — and why it's always set to <none>.

When you create a resource in Kubernetes, you can append the --record flag like so:

bash

kubectl create -f deployment.yaml --record

When you do, Kubernetes adds an annotation to the resource with the command that generated it.

In the example above, the Deployment has the following metadata section:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  annotations:
    kubernetes.io/change-cause: kubectl create --filename=deployment.yaml --record=true
spec:
  replicas: 3
  selector:
    matchLabels:
      name: app
  template:
    metadata:
      labels:
        name: app
    spec:
      containers:
      - name: app
        image: learnk8s/app:1.0.0

Now, if you try to display the history again, you might notice that the same annotation is used in the rollout history command:

bash

kubectl rollout history deployment/app
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true

If you change the container image in the YAML file and apply the new configuration with:

bash

kubectl apply -f deployment.yaml --record

You should see the following new entry in the rollout history:

bash

kubectl rollout history deployment/app
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
2         kubectl apply --filename=deployment.yaml --record=true

The --record flag can be used with any resource type, but the value is only used in Deployment, DaemonSet, and StatefulSet resources, i.e. resources that can be "rolled out" (see kubectl rollout -h).

But you should remember that the --record flag has been marked as deprecated in recent kubectl releases and may be removed in the future.

The feature provides little value for manual usage.

However, it still has some justification for automated processes as a simple form of auditing (keeping track of which commands caused which changes to a rollout).

Should you use kubectl rollout undo to fix a regression in production?

Just because Kubernetes offers an option to roll back deployments doesn't mean it is a good idea.

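Let's consider the following scenario:

You deploy a new version from version control, discover a regression, and roll back with kubectl rollout undo. The cluster now runs the previous version, while version control still describes the broken one.

If a colleague comes along, is it safe for them to deploy all the changes stored in version control?

Probably not.

The state of the cluster and the resources stored in version control have drifted apart.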
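Drift of this kind can be illustrated with a short Python comparison. The two dictionaries are invented stand-ins for the manifest in version control and the live Deployment, and `find_drift` is a hypothetical helper, not a Kubernetes or kubectl feature.

```python
def find_drift(in_git: dict, in_cluster: dict) -> dict:
    """Return the fields whose values differ between version control and the cluster."""
    keys = in_git.keys() | in_cluster.keys()
    return {
        key: (in_git.get(key), in_cluster.get(key))
        for key in sorted(keys)
        if in_git.get(key) != in_cluster.get(key)
    }


# Version control still points at the new image, but the cluster was rolled back by hand.
in_git = {"image": "learnk8s/app:1.1.0", "replicas": 3}
in_cluster = {"image": "learnk8s/app:1.0.0", "replicas": 3}

print(find_drift(in_git, in_cluster))
```

Anyone who applies the files from version control at this point would silently redeploy the image that was just rolled back.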

A better alternative is to "roll forward" by amending the deployment.yaml (e.g. with a revert in your version control) and triggering a new deployment.

If your automated deployment process takes too long in an emergency, you can still kubectl apply -f the reverted YAML file into your cluster by hand.

You should treat these kubectl rollout features as if they were deprecated and not depend on them.

We also recommend setting revisionHistoryLimit to zero, as discussed before, so that kubectl rollout commands can't introduce this drift between version control and production.
