Debugging Running Pods on Kubernetes

Exploring Kubernetes’s debugging feature, `kubectl debug`, and introducing `kubectl superdebug` — an enhanced kubectl debug supporting volume mounts.

Jonathan Merlevede

Published in datamindedbe, Oct 25, 2023

Executing commands using kubectl exec

If you run software on Kubernetes, you will, at some point, want to debug some aspect of what you deploy. A simple approach to debugging that is natural to people used to working with VMs is to connect to a running pod and hack away:

kubectl exec -it podname -c containername -- bash

This often works and is very useful. However, there are at least two Kubernetes “best practices” limiting exec’s usefulness in the real world:

Not running as root. Containers run with as few privileges as possible and may even run with randomized UIDs.
Minimal images. Images are kept as small as possible, with binaries installed into a distroless image as an extreme.

When applying these best practices, connecting to your container using kubectl exec is either impossible or drops you into a barren wasteland-like environment unsuited for debugging.

kubectl exec does not allow you to specify a user flag or capabilities to start your process with, instead copying those settings from the target container’s main command. Some Kubernetes users think this should be changed.

Debug containers

The Kubernetes-native answer to debugging running containers is to use kubectl debug. The debug command spins up a new container in a running pod. This new container can run as a different user and from any image you choose. Because the debug container runs within the same pod as the container it targets (and therefore on the same node), the isolation between both containers does not need to be absolute. The debug container can share system resources with other containers running in the same pod.

Consider wanting to inspect the CPU usage of a PostgreSQL database running in the container postcont in the pod postpod. The pod does not run as root, and the Postgres image does not have tools like top or htop installed — in other words, the kubectl exec command is of little use. You can then run the following command:

kubectl debug -it \
--container=debug-container \
--image=alpine \
--target=postcont \
postpod

You will be logged in as root (this is the default for the Alpine image) and can easily install your favorite interactive process viewer htop (apk add htop). You share the same process namespace as the postcont container and can see and even kill all the processes running there! When you exit the shell, the ephemeral container stops running too, although its entry stays in the pod until the pod itself is deleted.

Note: Specifying --target is required if you want your debug container to share the same process namespace as postcont, even if postcont is the only container running in postpod.
Note: You can disconnect from your ephemeral container / shell session without exiting (killing) it by pressing CTRL+P CTRL+Q. You can then later reconnect to it using kubectl attach.
Note: kubectl debug offers more functionality than outlined here, such as the copying of pods with a modified startup command or starting a “node” pod with access to the node’s filesystem.
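
The last two notes can be made concrete with a few commands. The following is a minimal sketch, wrapped in shell functions so the pieces are easy to reuse. The function names are made up for illustration; the flags are the ones kubectl documents for attach and debug.

```shell
# Reconnect to a debug container that is still running after you detached.
reattach_debug() {  # usage: reattach_debug POD DEBUG_CONTAINER
  kubectl attach -it "$1" -c "$2"
}

# Debug a copy of the pod rather than the pod itself; --share-processes lets
# the containers in the copy see each other's processes.
debug_pod_copy() {  # usage: debug_pod_copy POD COPY_NAME
  kubectl debug "$1" -it --image=alpine --copy-to="$2" --share-processes
}

# Start a debugging pod on a node; the node's root filesystem is mounted at /host.
debug_node() {  # usage: debug_node NODE
  kubectl debug "node/$1" -it --image=alpine
}
```

For example, reattach_debug postpod debug-container reconnects to the debug container started above, provided it is still running.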

Under the hood

The kubectl debug command above works by creating something called an ephemeral container. These containers are supposed to run temporarily in an existing pod to support actions such as troubleshooting.

The difference between “normal” containers and ephemeral containers is slim. Nothing really prevents ephemeral containers from running for a long time. I think the reason for having ephemeral containers is best understood by looking at foundational architectural choices made by Kubernetes at its inception:

Pods should be disposable and replaceable, and, supporting this,
the Pod specification is immutable.

This made a lot of sense when Kubernetes was used primarily for deploying stateless workloads, when pods themselves could be considered ephemeral. It can be restrictive in this new world where Kubernetes is used for everything. The Pod spec remains (largely) immutable, but Kubernetes models ephemeral containers as a subresource of Pod. Unlike “normal” containers, ephemeral containers cannot be added by editing the Pod spec; they are added through the separate ephemeralcontainers subresource and then show up in the pod under spec.ephemeralContainers. This subtle distinction keeps everyone happy 🥳!
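
You can see this modelling in the API object itself. Once added through the subresource, ephemeral containers are listed under spec.ephemeralContainers, separate from spec.containers, and their state is reported under status.ephemeralContainerStatuses. A small sketch for inspecting them (the function names are illustrative):

```shell
# Print the names of the ephemeral containers that were added to a pod.
list_ephemeral_containers() {  # usage: list_ephemeral_containers POD
  kubectl get pod "$1" -o jsonpath='{.spec.ephemeralContainers[*].name}'
}

# Print the reported state (running, terminated, ...) of each ephemeral container.
ephemeral_container_states() {  # usage: ephemeral_container_states POD
  kubectl get pod "$1" \
    -o jsonpath='{range .status.ephemeralContainerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'
}
```

Running list_ephemeral_containers postpod after the kubectl debug command above should print debug-container.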

Ephemeral containers are still relatively new; they have been stable since Kubernetes v1.25 (August 2022), beta since v1.23 (December 2021) and alpha since v1.16 (September 2019).

Mounting volumes

The built-in command kubectl debug can be very useful. It allows you to add an ephemeral container to a running pod, optionally sharing its process namespace with that of a running container. However, if you were expecting to use kubectl debug to inspect or modify any part of the running container’s filesystem, you’re out of luck: the filesystem of the debug container is disjoint from that of the container you connect it to.

Luckily, we can do better. The idea is simple:

Retrieve the specification of the running target container.
Patch an ephemeral container into the pod. Configure it to share the same process namespace as the target container and additionally to include the same volume mounts.

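At the time of writing, kubectl debug has no option for adding volume mounts, and there is no other kubectl command for creating ephemeral containers, so we need to craft a PATCH request to the Kubernetes API ourselves. The kubectl proxy command makes the API reachable on localhost.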
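
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the superdebug script itself: it assumes jq and curl are installed, that kubectl proxy is listening on its default address 127.0.0.1:8001, and the function names are made up.

```shell
# Step 1: read the pod JSON on stdin and build a patch that adds an ephemeral
# container with the same volumeMounts as the target container. Requires jq.
build_debug_patch() {  # usage: build_debug_patch TARGET DEBUG_NAME IMAGE < pod.json
  jq --arg target "$1" --arg name "$2" --arg image "$3" '
    (.spec.containers[] | select(.name == $target) | .volumeMounts // []) as $mounts
    | {spec: {ephemeralContainers: [{
        name: $name, image: $image, targetContainerName: $target,
        stdin: true, tty: true, volumeMounts: $mounts
      }]}}'
}

# Step 2: send the patch (read from stdin) to the pod's ephemeralcontainers
# subresource through kubectl proxy (assumed to listen on 127.0.0.1:8001).
patch_ephemeral_container() {  # usage: patch_ephemeral_container NAMESPACE POD < patch.json
  curl -sS -X PATCH \
    -H 'Content-Type: application/strategic-merge-patch+json' \
    --data-binary @- \
    "http://127.0.0.1:8001/api/v1/namespaces/$1/pods/$2/ephemeralcontainers"
}
```

With kubectl proxy running in another terminal, the pieces chain together as kubectl get pod postpod -o json | build_debug_patch postcont debug-container alpine | patch_ephemeral_container default postpod, after which kubectl attach -it postpod -c debug-container connects to the new container. Note that the API documents subPath mounts as not allowed for ephemeral containers, so mounts that use subPath would need to be filtered out first.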

This process is not exactly user-friendly, so it makes sense to wrap the procedure into a script or kubectl plugin. You can find an example implementation of such a script over here:

JonMerlevede/kubectl-superdebug (github.com): Extension of kubectl debug attaching the volume spec of your target container to your ephemeral debug pod …

Note that this approach and script can easily be extended to also copy the environment variable specification from the target container.

If you save this script as kubectl-superdebug and make it available on your path, you can run it as kubectl superdebug from anywhere as follows:

kubectl superdebug \
--container=debug-container \
--image=alpine \
--target=postcont \
postpod
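
Making the script available as a plugin might look like the sketch below. kubectl treats any executable on your PATH whose name starts with kubectl- as a plugin. The function name and the target directory are assumptions for illustration.

```shell
# Install a script as a kubectl plugin: copy it into a directory on your PATH
# and mark it executable. A file named kubectl-<name> becomes "kubectl <name>".
install_kubectl_plugin() {  # usage: install_kubectl_plugin SCRIPT BIN_DIR
  mkdir -p "$2" &&
    cp "$1" "$2/$(basename "$1")" &&
    chmod +x "$2/$(basename "$1")"
}
```

For example, install_kubectl_plugin ./kubectl-superdebug "$HOME/.local/bin" (assuming that directory is on your PATH), after which kubectl plugin list should show the new plugin.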

You may also want to extend this script to copy other aspects of the target container into your debug container, such as references to environment variables.

This completes the overview of Kubernetes-native approaches to debugging running containers and should cover most people’s needs. However, read on if you’re particularly interested or have special needs!

Non-Kubernetes native approaches

Kubernetes does not offer a way to connect to a running container as root (unless the main process is running as root) or to access a container’s root filesystem from another container. This does not mean that these things are impossible to do. Kubernetes is, after all, simply a container orchestrator sitting on top of a containerization engine. You can usually do whatever you want by removing layers of abstraction if you, for some reason, really have to. Just make sure that you have to…

If you use the Docker Engine and can access your engine directly from a node or through a privileged container running on a node, then you can run docker exec --user and execute a process as a user of your choice. Plugins such as kubectl ssh and kubectl exec-user implement this approach. Unfortunately, modern engines such as containerd and CRI-O no longer offer the --user flag functionality — which means that these plugins do not work on modern Kubernetes installations.
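
Done by hand, the Docker Engine approach comes down to two steps: find the container's ID and call docker exec with --user on the node. The sketch below is an illustration only, not how the plugins mentioned above are implemented; the function names are made up, and the second function only makes sense on a node that actually uses Docker Engine.

```shell
# Look up the runtime's ID for a container, stripping the "docker://" (or
# "containerd://") prefix that Kubernetes reports in the pod status.
container_id() {  # usage: container_id POD CONTAINER
  id=$(kubectl get pod "$1" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$2\")].containerID}")
  printf '%s\n' "${id#*://}"
}

# On the node that runs the pod, with Docker Engine as the runtime: start a
# shell in the container as a user of your choice (root if none is given).
docker_exec_as() {  # usage: docker_exec_as CONTAINER_ID [USER]
  docker exec -it --user "${2:-root}" "$1" sh
}
```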

However, even these modern engines usually just interface with Linux namespaces. You can run commands in whatever “container” you want by entering the appropriate set of Linux namespaces. The tool kpexec implements this approach. It starts a privileged pod on the same node as the target container, then determines which (Linux) namespaces to target, executes commands in those (Linux) namespaces, and finally streams their output to your terminal. As an added bonus, it can overlay a set of tools useful for debugging on top of the target container’s filesystem.

Unlike kubectl exec, kpexec can run commands with a different uid/gid and even different capabilities than the container’s main process. It is compatible with containerd and CRI-O. kpexec takes a somewhat heavyweight and brittle approach and may not be compatible with your cluster's security configuration. It can be worth considering if kubectl (super)debug fails to suit your needs.

Note that kpexec directly executes commands into namespaces using nsenter. It is compatible with the ubiquitous container runtime runc, but incompatible with runtimes such as Kata Containers.
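
To give a feel for what entering a container's namespaces means, here is a minimal sketch that uses nsenter directly. It is not kpexec's actual implementation. It assumes you already have a root shell on the node and know the PID of the container's main process as seen from the node (finding that PID is runtime-specific and left out here). The function name is made up.

```shell
# Run a command inside the namespaces of the process with the given PID,
# using the uid and gid you choose. Must be run as root on the node itself.
enter_container() {  # usage: enter_container PID UID GID COMMAND [ARGS...]
  pid=$1 uid=$2 gid=$3
  shift 3
  nsenter --target "$pid" --mount --uts --ipc --net --pid \
    --setuid "$uid" --setgid "$gid" -- "$@"
}
```

For example, enter_container 4242 0 0 sh would start a root shell that sees the filesystem, network and processes of the container whose main process has PID 4242 on the node (4242 is a placeholder).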

In this post, we looked at two Kubernetes-native approaches to debugging running containers: kubectl exec and kubectl debug. We investigated how kubectl debug works, and presented kubectl superdebug, a variation of kubectl debug that starts an ephemeral container sharing the same volumes as the target container and the same process namespace. Lastly, we reviewed some non-Kubernetes native approaches to container debugging.