Kyverno CVE-2023-34091: Bypassing policies using Kubernetes finalizers
In security research, it's crucial to adopt an attacker mindset and explore the unexpected. This blog post delves into an intriguing journey of stumbling across a security bug in Kyverno, a Kubernetes admission webhook server used for validating and mutating resources with customizable policies. The exploration began with a seemingly innocuous experiment to implement a finalizer using Kyverno, only to discover that it ignored resources with a deletionTimestamp set. While this could have been the end of the story, the author's curiosity led to uncovering a potentially exploitable vulnerability. Join us as we explore the intricacies of this bug and its impact on policy enforcement during resource deletion, shedding light on the importance of security research in open-source software development and implementation.
How to stumble across security bugs
- Always think like an attacker
- Do silly things with software (this ends up exercising weird combinations of features)
- When it doesn't work (because you were doing something ridiculous), figure out if there's a bug
- With an attacker mindset, consider if the bug is exploitable
Specifically, I decided to try implementing a finalizer using Kyverno (a Kubernetes admission webhook server with customizable policies for validating and mutating resources) by writing a ClusterPolicy to match resources with a deletionTimestamp. This is a silly thing to use Kyverno for, but it could have been useful for writing a proof-of-concept finalizer without writing any code. It turns out that this was impossible because Kyverno ignored resources with a deletionTimestamp set.
Now, this could have been the end of my idea to implement a finalizer with Kyverno… but I asked, "What's the impact of not performing validation on these resources?" I tried to see if this was somehow exploitable: it looked like Kyverno would sometimes not enforce validation policies, which sounded scary.
A quick refresher on how deletion works in Kubernetes
- A Kubernetes API client performs an HTTP DELETE on a resource.
- If .metadata.finalizers[] in the resource is empty or doesn't exist, the Kubernetes API will remove the resource immediately.
  - Webhooks will see one DELETE operation on the resource.
- If .metadata.finalizers[] is not empty, instead of removing the resource, the Kubernetes API will update the resource with .metadata.deletionTimestamp set to the current time.
  - Webhooks will see the same DELETE operation on the resource as above; .metadata.deletionTimestamp will not be set yet. This corresponds to the user's Kubernetes API request.
  - If changes are made to the resource during deletion, webhooks will see UPDATE operations on the resource with .metadata.deletionTimestamp set.
  - A controller implementing a finalizer will notice the added .metadata.deletionTimestamp and begin whatever finalization steps it needs to do. When it finishes, it will update the resource to remove its finalizer key from .metadata.finalizers[].
  - A well-behaved controller probably should ignore anything changed inside .spec if .metadata.deletionTimestamp is non-nil.
- When .metadata.finalizers[] is eventually (updated to) an empty list, the Kubernetes API will actually remove the resource.
  - Webhooks will see an UPDATE operation on the previously deleted resource. The only indication in the AdmissionReview that this UPDATE is on a previously deleted resource and will result in removal is that .metadata.deletionTimestamp is present and .metadata.finalizers[] is being removed. Again, this corresponds to the Kubernetes API request (likely from the controller implementing the finalizer in this case).
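To make the sequence above concrete, here is a toy Python model of it, assuming a single object and an API server that admits every request. The names ToyAPIServer and Obj are hypothetical, and this is not how kube-apiserver is implemented; the model only records which operations a webhook would see and whether .metadata.deletionTimestamp is set on each.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Obj:
    """A stripped-down object with only the fields that matter for deletion."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None
    spec: Dict[str, str] = field(default_factory=dict)


class ToyAPIServer:
    """Hypothetical model of deletion as seen by an admission webhook.

    Not the real kube-apiserver: there is no validation, no resourceVersion,
    and every request is admitted.
    """

    def __init__(self) -> None:
        self.store: Dict[str, Obj] = {}
        # (operation, name, was deletionTimestamp set on the admitted object?)
        self.webhook_log: List[Tuple[str, str, bool]] = []

    def _admit(self, operation: str, obj: Obj) -> None:
        self.webhook_log.append((operation, obj.name, obj.deletion_timestamp is not None))

    def create(self, obj: Obj) -> None:
        self._admit("CREATE", obj)
        self.store[obj.name] = deepcopy(obj)

    def delete(self, name: str) -> None:
        obj = self.store[name]
        # The DELETE is admitted before deletionTimestamp is ever set.
        self._admit("DELETE", obj)
        if not obj.finalizers:
            del self.store[name]  # no finalizers: removed immediately
        else:
            # Finalizers pending: only mark the object as being deleted.
            obj.deletion_timestamp = datetime.now(timezone.utc).isoformat()

    def update(self, name: str, finalizers: Optional[List[str]] = None,
               spec: Optional[Dict[str, str]] = None) -> None:
        new = deepcopy(self.store[name])
        if finalizers is not None:
            new.finalizers = list(finalizers)
        if spec is not None:
            new.spec.update(spec)
        # Updates made during deletion still carry the deletionTimestamp.
        self._admit("UPDATE", new)
        if new.deletion_timestamp is not None and not new.finalizers:
            del self.store[name]  # last finalizer removed: the object is gone
        else:
            self.store[name] = new


api = ToyAPIServer()
api.create(Obj("demo", finalizers=["bburky.com/hax"], spec={"type": "ClusterIP"}))
api.delete("demo")                             # only sets deletionTimestamp
api.update("demo", spec={"type": "NodePort"})  # edit while deletion is pending
api.update("demo", finalizers=[])              # finalizer removed: object deleted
print(api.webhook_log)
# [('CREATE', 'demo', False), ('DELETE', 'demo', False),
#  ('UPDATE', 'demo', True), ('UPDATE', 'demo', True)]
```

Note how the DELETE itself never carries a deletionTimestamp; only the later UPDATEs do, which is exactly the window a webhook that skips such UPDATEs leaves unguarded.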
If you want to observe this process yourself, set the --dumpPayload=true flag on Kyverno's admission controller and grep the logs for a resource as you delete it.
It seemed like Kyverno's behavior of ignoring resources with deletionTimestamp set was a bug and potentially exploitable, but I tried a few resource kinds and at first was unable to exploit anything. Some resource kinds (e.g., Pods) are mostly immutable: an update isn't allowed to edit most of the fields in .spec. Some resource kinds (e.g., Pods, but also Deployments, StatefulSets, DaemonSets, etc.) ignore changes to .spec during deletion (for example, upon deletion, a Pod begins terminating its containers and ignores all further updates). Also, deletion usually happens very quickly, leaving no opportunity to update the resource during deletion.
So… how is this bug exploitable?
Indeed, Kyverno does not enforce any policies on UPDATE operations on resources if .metadata.deletionTimestamp is non-nil.
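The gap can be sketched as a simplified admission check. This is a hypothetical Python model, not Kyverno's Go source: disallow_nodeport stands in for a "Disallow NodePort" validate rule, admit_vulnerable models the pre-v1.10.0 behavior described above, and admit_fixed evaluates the rule regardless of deletionTimestamp. The request dict is a trimmed-down AdmissionReview request, and the timestamp is an arbitrary example value.

```python
from typing import Any, Dict, Optional

Request = Dict[str, Any]  # trimmed-down AdmissionReview "request"


def disallow_nodeport(obj: Dict[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for a 'Disallow NodePort' rule: error message or None."""
    if obj.get("kind") == "Service" and obj.get("spec", {}).get("type") == "NodePort":
        return "Services of type NodePort are not allowed"
    return None


def admit_vulnerable(request: Request) -> bool:
    """Models the behavior described above (not Kyverno's real code)."""
    obj = request["object"]
    if request["operation"] == "UPDATE" and obj.get("metadata", {}).get("deletionTimestamp"):
        return True  # skipped: no policies are evaluated at all
    return disallow_nodeport(obj) is None


def admit_fixed(request: Request) -> bool:
    """Evaluates the policy on every CREATE/UPDATE, deleting or not."""
    return disallow_nodeport(request["object"]) is None


update_during_deletion: Request = {
    "operation": "UPDATE",
    "object": {
        "kind": "Service",
        "metadata": {
            "name": "demo",
            "deletionTimestamp": "2023-01-01T00:00:00Z",
            "finalizers": ["bburky.com/hax"],
        },
        "spec": {"type": "NodePort"},
    },
}

print(admit_vulnerable(update_during_deletion))  # True: the NodePort slips through
print(admit_fixed(update_during_deletion))       # False: rejected
```

The design point is that the fixed gate has no special case at all: whether an object is being deleted says nothing about whether the new state it is being updated to is acceptable.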
It's possible to add a bogus, non-existent bburky.com/hax finalizer to a resource's .metadata.finalizers[]. Because nothing implements this finalizer, deletion of the resource can be delayed indefinitely.
- Perhaps it is possible to race an existing finalizer instead, but this trick makes the exploit very reliable.
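Adding the finalizer is an ordinary metadata update. As a sketch, assuming the change is sent with kubectl patch --type=json, the following builds an RFC 6902 JSON Patch that adds the finalizer; the helper name add_finalizer_patch and the Service name demo are hypothetical.

```python
import json
from typing import Any, Dict, List


def add_finalizer_patch(obj: Dict[str, Any], finalizer: str) -> List[Dict[str, Any]]:
    """Build an RFC 6902 JSON Patch that adds `finalizer` to .metadata.finalizers."""
    existing = obj.get("metadata", {}).get("finalizers")
    if existing is None:
        # "add" on a missing member creates it, so create the whole list.
        return [{"op": "add", "path": "/metadata/finalizers", "value": [finalizer]}]
    if finalizer in existing:
        return []  # already present: nothing to patch
    # "-" appends to the end of an existing array.
    return [{"op": "add", "path": "/metadata/finalizers/-", "value": finalizer}]


service = {"kind": "Service", "metadata": {"name": "demo"}, "spec": {"type": "ClusterIP"}}
print(json.dumps(add_finalizer_patch(service, "bburky.com/hax")))
# [{"op": "add", "path": "/metadata/finalizers", "value": ["bburky.com/hax"]}]
```

The printed patch could then be passed to something like kubectl patch service demo --type=json -p '<patch>' before deleting the Service. Checking for an existing list matters because an "add" at /metadata/finalizers would replace any finalizers already there.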
Most, but not all, controllers ignore changes to .spec after deletion begins.
- Notably, Pods begin termination as soon as .metadata.deletionTimestamp is set. Even though .spec.containers[].image is mutable, UPDATE operations are ignored during deletion. This does limit the impact of this vulnerability: it is very common to use Kyverno to implement policies on Pods, and these policies are unaffected by this exploit.
- Some resources don't really have a controller at all. ConfigMaps don't have a controller of their own; instead, controllers of other resources, like Pods, use them. It turns out that you can actually mount a ConfigMap that's being deleted in a Pod's .spec.volumes, and updates to it will be reflected in the filesystem.
- CRDs have their own custom controllers, and those controllers should check for .metadata.deletionTimestamp... but not all of them do. Kubebuilder's scaffold templates do correctly check it, but controllers that manually implement Reconcile() using controller-runtime might forget this check.
- It's possible that the built-in Kubernetes controllers have bugs or inconsistent behavior. After some research, I discovered that during finalization, updates to LoadBalancer Services may be ignored and remain pending. However, NodePort Services updated during finalization will still create a listening NodePort.
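The deletionTimestamp check that well-behaved controllers perform can be sketched as a toy reconcile function. This is a hypothetical Python illustration of the pattern (real controllers are typically written in Go with controller-runtime); the finalizer name example.com/cleanup is made up, and the function records intended actions instead of calling a cluster.

```python
from typing import Any, Dict, List

MY_FINALIZER = "example.com/cleanup"  # hypothetical finalizer owned by this controller


def reconcile(obj: Dict[str, Any], actions: List[str]) -> None:
    """Toy reconcile step; records intended actions instead of calling a cluster."""
    meta = obj.setdefault("metadata", {})
    finalizers = meta.setdefault("finalizers", [])

    if meta.get("deletionTimestamp"):
        # Being deleted: finish cleanup and release our finalizer, but never act
        # on .spec, which may have been edited after deletion began.
        if MY_FINALIZER in finalizers:
            actions.append("cleanup")
            finalizers.remove(MY_FINALIZER)
        return

    # Live object: make sure we get a chance to clean up later, then act on .spec.
    if MY_FINALIZER not in finalizers:
        finalizers.append(MY_FINALIZER)
    actions.append(f"apply spec {obj.get('spec', {})}")


actions: List[str] = []
svc = {"metadata": {"name": "demo"}, "spec": {"type": "ClusterIP"}}
reconcile(svc, actions)                       # applies the ClusterIP spec
svc["metadata"]["deletionTimestamp"] = "2023-01-01T00:00:00Z"
svc["spec"]["type"] = "NodePort"              # edited during deletion
reconcile(svc, actions)                       # only cleans up; ignores the edit
print(actions)
# ["apply spec {'type': 'ClusterIP'}", 'cleanup']
```

A controller that drops the early return above would apply the NodePort edit, which is the kind of behavior that turns a skipped admission check into a real policy bypass.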
I decided this weird behavior of NodePorts best demonstrated the Kyverno bug's exploitability. Kyverno provides a "Disallow NodePort" policy among its "best practices" policies; if I could create a NodePort Service with this policy in place, I would have demonstrated a real vulnerability.
And… it actually worked:
- Create a Service with an allowed type, like ClusterIP.
- Maliciously add a non-existent finalizer to the Service and delete the Service.
- Update the Service to change its type to NodePort. This is the vulnerability in Kyverno: it should still enforce validation policies, even during deletion.
- Test access to the Service via the NodePort to demonstrate that Kubernetes NodePorts are applied even during deletion.
Applying NodePort changes during deletion is arguably a bug in Kubernetes, but it is not a security bug in Kubernetes: Kyverno is a generic policy tool and should not assume any particular behavior of resources.
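The state this exploit leaves behind is visible from the API: a NodePort Service with .metadata.deletionTimestamp set and finalizers still pending. As a rough, hypothetical heuristic (not a tool from the advisory), the function below scans the parsed output of kubectl get services --all-namespaces -o json for that combination; a hit is worth investigating, not proof of exploitation, and the sample data is made up.

```python
from typing import Any, Dict, List


def nodeports_pending_deletion(service_list: Dict[str, Any]) -> List[str]:
    """Return "namespace/name" for NodePort Services stuck mid-deletion.

    `service_list` is assumed to be the parsed JSON from
    `kubectl get services --all-namespaces -o json` (a List with "items").
    """
    hits = []
    for svc in service_list.get("items", []):
        meta = svc.get("metadata", {})
        if (
            meta.get("deletionTimestamp")   # deletion has started...
            and meta.get("finalizers")      # ...but is being held open
            and svc.get("spec", {}).get("type") == "NodePort"
        ):
            hits.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return hits


example_list = {
    "kind": "List",
    "items": [
        {"metadata": {"name": "demo", "namespace": "default",
                      "deletionTimestamp": "2023-01-01T00:00:00Z",
                      "finalizers": ["bburky.com/hax"]},
         "spec": {"type": "NodePort"}},
        {"metadata": {"name": "web", "namespace": "default"},
         "spec": {"type": "ClusterIP"}},
    ],
}
print(nodeports_pending_deletion(example_list))  # ['default/demo']
```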
Vulnerability Reporting
Following the process in Kyverno's SECURITY.md, I sent the following report and proof-of-concept exploit to kyverno-security@googlegroups.com:
validationFailureAction: Enforce policies may be bypassed by editing resources during finalization
https://gist.github.com/bburky/c137b39dd2ec48c9efd818af7507465e
The Kyverno team responded promptly and immediately started working on a fix to include in the then-upcoming v1.10.0 major update. The fix was initially made in a public GitHub PR (without any notes about its security impact) and included in an alpha release. I was notified of the fix via email and was able to test the patch before the update and security advisory were released. I reported that the fix was incomplete, and a second patch was added that fully resolved the issue. The Kyverno v1.10.0 release resolves this vulnerability and includes both patches. Kyverno also converted my proof-of-concept exploit into a unit test, which I thought was great.
A security advisory was issued after the v1.10.0 release. The advisory rates the severity as "low," which I agree with for policies on Pods, where there is no security impact. However, the impact may be much higher if you use policies on some built-in kinds (such as Services or ConfigMaps) or on Custom Resources whose controllers act on updates made during deletion.
This is a good response overall. All software has bugs, but this issue was handled well and resolved appropriately. The vulnerability was fixed within a reasonable timeframe, and I was always kept informed of the progress.
Reporting timeline (Me ➡️ Kyverno):
- 2023-04-07 ➡️ Reported security vulnerability to Kyverno (…I feel bad for sending security emails at 5 PM on a Friday, sorry)
- 2023-04-10 ⬅️ Confirmation of email receipt
- 2023-04-11 ⬅️ Confirmation of vulnerability and intent to fix in upcoming v1.10 release
- 2023-04-18 ⬅️ Notification of Kyverno release v1.10.0-alpha.2 with the patch: fix: applies policies to the UPDATEs when resource deletionTimestamp is set · PR #6878
- 2023-04-18 ➡️ Reported fix is incomplete: background checks and policy updates still skip resources with a non-nil deletionTimestamp
- 2023-05-11 ⬅️ Notification of Kyverno release v1.10.0-beta.1 which includes a second patch to address remaining issues: fix: remove deletionTimestamp checks · PR #7039
- 2023-05-11 ➡️ Confirmed that all known issues are resolved in v1.10.0-beta.1
- 2023-05-30 Kyverno release v1.10.0
- 2023-06-01 GHSA-hq4m-4948-64cc Security Advisory: CVE-2023-34091 A resource with a deletionTimestamp may allow policy circumvention
Thank you to the Kyverno team, especially Chip Zoller, for your help getting this fixed, and congratulations on releasing the 1.10.0 update with lots of new features too.
This specific issue only applied to Kyverno, but it is possible for similar vulnerabilities to exist in other admission webhooks. I did find multiple cases of various controllers and resource kinds (including some built-in to Kubernetes) with surprising behavior during deletion; any project doing similar policy enforcement should be careful not to assume any particular behavior of resources during deletion.
At Defense Unicorns, we believe in creating and supporting open-source software. In addition to releasing open-source software ourselves, we try to contribute back to the open-source community. Open source contributions usually mean new features, bug reports, or fixes, but they can also include security reviews of important software.
Kyverno is used as a Kubernetes policy-checking tool in Platform One's Big Bang to validate that software is correctly configured to meet DoD DevSecOps Reference Architecture requirements. Cluster administrators might also use Kyverno to allow developers to safely deploy mission applications, using policies to enforce restrictions on what resources may be deployed. Security research like this is important to validate that our usage of tools like Kyverno correctly implements security controls as designed.