Emanuel Evans

Extending applications on Kubernetes with multi-container pods

February 2021


TL;DR: In this article you will learn how you can use ambassador, adapter, sidecar and init containers to extend your apps in Kubernetes without changing their code.

Kubernetes offers an immense amount of flexibility and the ability to run a wide variety of applications.

If your applications are cloud-native microservices or 12-factor apps, chances are that running them in Kubernetes will be relatively straightforward.

But what about running applications that weren't explicitly designed to be run in a containerized environment?

Kubernetes can handle these as well, although it may be a bit more work to set up.

One of the most powerful tools that Kubernetes offers to help is the multi-container pod (although multi-container pods are also useful for cloud-native apps in a variety of cases, as you'll see).

Why would you want to run multiple containers in a pod?

Multi-container pods allow you to change the behaviour of an application without changing its code.

This can be useful in all sorts of situations, but it's convenient for applications that weren't originally designed to be run in containers.

Let's start with an example.

Securing an HTTP service

Elasticsearch was designed before containers became popular (although it's pretty straightforward to run in Kubernetes nowadays) and can be seen as a stand-in for, say, a legacy Java application designed to run in a virtual machine.

Let's use Elasticsearch as an example application that you'd like to enhance using multi-container pods.

The following is a very basic (not at all production-ready) Elasticsearch Deployment and Service:

es-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.9.3
          env:
            - name: discovery.type
              value: single-node
          ports:
            - name: http
              containerPort: 9200
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  selector:
    app.kubernetes.io/name: elasticsearch
  ports:
    - port: 9200
      targetPort: 9200

The discovery.type environment variable is necessary to get it running with a single replica.

Elasticsearch will listen on port 9200 over HTTP by default.

You can confirm that the pod works by running another pod in the cluster and curling to the elasticsearch service:

bash

kubectl run -it --rm --image=curlimages/curl curl \
  -- curl http://elasticsearch:9200
{
  "name" : "elasticsearch-77d857c8cf-mk2dv",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "z98oL-w-SLKJBhh5KVG4kg",
  "version" : {
    "number" : "7.9.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "c4138e51121ef06a6404866cddc601906fe5c868",
    "build_date" : "2020-10-16T10:36:16.141335Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Now let's say that you're moving towards a zero-trust security model and you'd like to encrypt all traffic on the network.

How would you go about this if the application doesn't have native TLS support?

Recent versions of Elasticsearch support TLS, but it was a paid extra feature for a long time.

Your first thought might be to do TLS termination with an nginx ingress, since the ingress is the component routing the external traffic in the cluster.

But that won't meet the requirements, since traffic between the ingress pod and the Elasticsearch pod could go over the network unencrypted.
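
For reference, TLS termination at the ingress might look something like the following sketch (assuming the NGINX ingress controller is installed, and using a hypothetical hostname elasticsearch.example.com and a hypothetical Secret elasticsearch-external-tls holding the certificate):

ingress-tls.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elasticsearch
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - elasticsearch.example.com
      secretName: elasticsearch-external-tls   # hypothetical Secret with the external certificate
  rules:
    - host: elasticsearch.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elasticsearch
                port:
                  number: 9200

The ingress controller decrypts the request, but the hop from the ingress controller to the elasticsearch Service still travels over plain HTTP inside the cluster.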

  • The external traffic is routed to the Ingress and then to Pods.

  • If you terminate TLS at the ingress, the rest of the traffic is unencrypted.

A solution that will meet the requirements is to tack an nginx proxy container onto the pod that will listen over TLS.

The traffic flows encrypted all the way from the user to the Pod.

  • If you include a proxy container in the pod, you can terminate TLS in the Nginx container.

  • Compared with the previous setup, the traffic is now encrypted all the way to the Elasticsearch container.

Here's what the deployment might look like:

es-secure-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.9.3
          env:
            - name: discovery.type
              value: single-node
            - name: network.host
              value: 127.0.0.1
            - name: http.port
              value: '9201'
        - name: nginx-proxy
          image: public.ecr.aws/nginx/nginx:1.23
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
              readOnly: true
            - name: certs
              mountPath: /certs
              readOnly: true
          ports:
            - name: https
              containerPort: 9200
      volumes:
        - name: nginx-config
          configMap:
            name: elasticsearch-nginx
        - name: certs
          secret:
            secretName: elasticsearch-tls
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-nginx
data:
  elasticsearch.conf: |
    server {
        listen 9200 ssl;
        server_name elasticsearch;
        ssl_certificate /certs/tls.crt;
        ssl_certificate_key /certs/tls.key;

        location / {
            proxy_pass http://localhost:9201;
        }
    }

Let's unpack that a little bit:

  • The network.host and http.port environment variables make Elasticsearch bind to 127.0.0.1 on port 9201, so it's only reachable from within the pod.

  • The nginx-proxy container listens on port 9200 over TLS, using the certificate and key mounted from the elasticsearch-tls Secret, and proxies requests to http://localhost:9201.

So requests from outside the pod will go to Nginx on port 9200 over HTTPS and then be forwarded to Elasticsearch on port 9201.

The request is proxied by Nginx on port 9200 and forwarded to port 9201 on Elasticsearch

You can confirm it's working by making an HTTPS request from within the cluster.

bash

kubectl run -it --rm --image=curlimages/curl curl \
  -- curl -k https://elasticsearch:9200
{
  "name" : "elasticsearch-5469857795-nddbn",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "XPW9Z8XGTxa7snoUYzeqgg",
  "version" : {
    "number" : "7.9.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "c4138e51121ef06a6404866cddc601906fe5c868",
    "build_date" : "2020-10-16T10:36:16.141335Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

The -k flag is necessary for self-signed TLS certificates. In a production environment, you'd want to use a trusted certificate.
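
The nginx-proxy container expects the certificate and key to live in a Secret named elasticsearch-tls. If you're experimenting with a self-signed certificate, that's a standard kubernetes.io/tls Secret; a minimal sketch (the values are placeholders for your own certificate and key):

elasticsearch-tls.yaml

apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-tls
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder

You can also create the same Secret directly from the certificate files with kubectl create secret tls elasticsearch-tls --cert=tls.crt --key=tls.key.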

A quick look at the logs shows that the request went through the Nginx proxy:

bash

kubectl logs elasticsearch-5469857795-nddbn nginx-proxy | grep curl
10.88.4.127 - - [26/Nov/2020:02:37:07 +0000] "GET / HTTP/1.1" 200 559 "-" "curl/7.73.0-DEV" "-"

You can also check that you're unable to connect to Elasticsearch over unencrypted connections:

bash

kubectl run -it --rm --image=curlimages/curl curl \
  -- curl http://elasticsearch:9200
<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

You've enforced TLS without having to touch the Elasticsearch code or the container image!

Proxy containers are a common pattern

The practice of adding a proxy container to a pod is common enough that it has a name: the Ambassador Pattern.

All of the patterns in this post are described in detail in an excellent paper from Google.

Adding basic TLS support is only the beginning.

The same pattern covers anything else that sits between your application and the network, such as authenticating incoming requests or connecting to external resources over a secure tunnel; the sketch below shows the latter.
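
As a rough sketch (the my-legacy-app image and the api.example.com endpoint are hypothetical), here's an ambassador that originates TLS towards an external API so that the application only ever speaks plain HTTP to localhost:

ambassador-tls-origination.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: external-api-proxy
data:
  default.conf: |
    server {
        # The legacy app connects over plain HTTP on localhost...
        listen 127.0.0.1:8080;

        location / {
            # ...and the ambassador forwards the request over TLS.
            proxy_pass https://api.example.com;
            proxy_ssl_server_name on;
        }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: app
      image: my-legacy-app   # hypothetical application image
    - name: external-api-ambassador
      image: public.ecr.aws/nginx/nginx:1.23
      volumeMounts:
        - name: proxy-config
          mountPath: /etc/nginx/conf.d
          readOnly: true
  volumes:
    - name: proxy-config
      configMap:
        name: external-api-proxy

The application simply calls http://localhost:8080, and the ambassador takes care of the TLS session with the external service.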

How do multi-container pods work?

Let's take a step back and tease apart the difference between pods and containers on Kubernetes to get a better picture of what's happening under the hood.

A "traditional" container (e.g. one started by docker run) provides several forms of isolation:

There are a few other things that Docker sets up, but those are the most significant.

The tools that are used under the hood are Linux namespaces and control groups (cgroups).

Control groups are a convenient way to limit resources such as CPU or memory that a particular process can use.

As an example, you could say that your process should use only 2GB of memory and one of your four CPU cores.
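
In Kubernetes, you express exactly that in the container spec's resources field, and the container runtime turns it into cgroup settings for you; a minimal sketch:

pod-limits.yaml

apiVersion: v1
kind: Pod
metadata:
  name: limited
spec:
  containers:
    - name: app
      image: busybox
      command: ['sleep', '5000']
      resources:
        limits:
          memory: 2Gi   # the container can't use more than 2GB of memory...
          cpu: '1'      # ...or more than one CPU core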

Namespaces, on the other hand, are in charge of isolating the process and limiting what it can see.

As an example, the process can only see the network packets that are directly related to it.

It won't be able to see all of the network packets flowing through the network adapter.

Or you could isolate the filesystem and let the process believe that it has access to all of it.

  • Since kernel version 5.6, there are eight kinds of namespaces, and the mount namespace is one of them.

  • With the mount namespace, you can let the process believe that it has access to all directories on the host when in fact it does not.

  • The mount namespace is designed to isolate resources — in this case, the filesystem.

  • Each process can see the same file system, while still being isolated from the others.

If you need a refresher on cgroups and namespaces, here's an excellent blog post diving into some of the technical details.

On Kubernetes, a container provides all of those forms of isolation except network isolation.

Instead, network isolation happens at the pod level.

In other words, each container in a pod will have its own filesystem, process table, and so on, but all of them will share the same network namespace.

Let's play around with a straightforward multi-container pod to get a better idea of how it works.

pod-multiple-containers.yaml

apiVersion: v1
kind: Pod
metadata:
  name: podtest
spec:
  containers:
    - name: c1
      image: busybox
      command: ['sleep', '5000']
      volumeMounts:
        - name: shared
          mountPath: /shared
    - name: c2
      image: busybox
      command: ['sleep', '5000']
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}

Breaking that down a bit: the pod defines two containers, c1 and c2, both running the sleep command and both mounting a shared emptyDir volume called shared at the path /shared.

You can see how the volume is mounted in the first container by opening a shell with kubectl exec:

bash

kubectl exec -it podtest --container c1 -- sh

The command attached a terminal session to the container c1 in the podtest pod.

The --container option for kubectl exec is often abbreviated -c.

You can inspect the volumes attached to c1 with:

c1@podtest

mount | grep shared
/dev/vda1 on /shared type ext4 (rw,relatime)

As you can see, a volume is mounted on /shared — it's the shared volume we created earlier.

Now let's create some files:

c1@podtest

echo "foo" > /tmp/foo
echo "bar" > /shared/bar

Let's check the same files from the second container.

First connect to it with:

bash

kubectl exec -it podtest --container c2 -- sh

c2@podtest

cat /shared/bar
bar
cat /tmp/foo
cat: can't open '/tmp/foo': No such file or directory

As you can see, the file created in the shared directory is available on both containers, but the file in /tmp isn't.

This is because, other than shared volumes, the containers' filesystems are entirely isolated from each other.

Now let's take a look at networking and process isolation.

A good way of seeing how the network is set up is to use the command ip link, which shows the Linux system's network devices.

Let's execute the command in the first container:

bash

kubectl exec -it podtest -c c1 -- ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
178: eth0@if179: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 46:4c:58:6c:da:37 brd ff:ff:ff:ff:ff:ff

And now the same command in the other:

bash

kubectl exec -it podtest -c c2 -- ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
178: eth0@if179: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 46:4c:58:6c:da:37 brd ff:ff:ff:ff:ff:ff

You can see that both containers have the same network device (eth0@if179) with the same MAC address (46:4c:58:6c:da:37).

Since MAC addresses are supposed to be globally unique, this is a clear indication that the containers share the same device.

Now let's see network sharing in action!

Let's connect to the first container with:

bash

kubectl exec -it podtest -c c1 -- sh

Start a very simple network listener with nc:

c1@podtest

nc -lk -p 5000 127.0.0.1 -e 'date'

The command starts a listener on localhost on port 5000 and sends the output of the date command to any connected TCP client.

Can the second container connect to it?

Open a terminal in the second container with:

bash

kubectl exec -it podtest -c c2 -- sh

Now you can verify that the second container can connect to the network listener, but cannot see the nc process:

c2@podtest

telnet localhost 5000
Connected to localhost
Sun Nov 29 00:57:37 UTC 2020
Connection closed by foreign host

ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 5000
   73 root      0:00 sh
   81 root      0:00 ps aux

Connecting over telnet, you can see the output of date, which proves that the nc listener is working, but ps aux (which shows all processes on the container) doesn't show nc at all.

This is because containers within a pod have process isolation but not network isolation.

This explains how the Ambassador Pattern works:

  1. Since all containers share the same network namespace, a single container can listen to all connections — even external ones.
  2. The rest of the containers only accept connections from localhost — rejecting any external connection.

The container that receives external traffic is the Ambassador, hence the name of the pattern.

The ambassador pattern in a Pod

One crucial thing to remember, though: because the network namespace is shared, multiple containers in a pod can't listen on the same port!

Let's have a look at some other use cases for multi-container pods.

Exposing metrics with a standard interface

Let's say you've standardized on using Prometheus for monitoring all of the services in your Kubernetes cluster, but you're using some applications that don't natively export Prometheus metrics (for example, Elasticsearch).

Can you add Prometheus metrics to your pods without altering your application code?

Indeed you can, using the Adapter Pattern.

For the Elasticsearch example, let's add an "exporter" container to the pod that exposes various Elasticsearch metrics in the Prometheus format.

This will be easy, because there's an open-source exporter for Elasticsearch (you'll also need to add the relevant port to the Service):

es-prometheus.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.9.3
          env:
            - name: discovery.type
              value: single-node
          ports:
            - name: http
              containerPort: 9200
        - name: prometheus-exporter
          image: justwatch/elasticsearch_exporter:1.1.0
          args:
            - '--es.uri=http://localhost:9200'
          ports:
            - name: http-prometheus
              containerPort: 9114
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  selector:
    app.kubernetes.io/name: elasticsearch
  ports:
    - name: http
      port: 9200
      targetPort: http
    - name: http-prometheus
      port: 9114
      targetPort: http-prometheus

Once this has been applied, you can find the metrics exposed on port 9114:

bash

kubectl run -it --rm --image=curlimages/curl curl \
  -- curl -s elasticsearch:9114/metrics | head
# HELP elasticsearch_breakers_estimated_size_bytes Estimated size in bytes of breaker
# TYPE elasticsearch_breakers_estimated_size_bytes gauge
elasticsearch_breakers_estimated_size_bytes{breaker="accounting",name="elasticsearch-ss86j"} 0
elasticsearch_breakers_estimated_size_bytes{breaker="fielddata",name="elasticsearch-ss86j"} 0
elasticsearch_breakers_estimated_size_bytes{breaker="in_flight_requests",name="elasticsearch-ss86j"} 0
elasticsearch_breakers_estimated_size_bytes{breaker="model_inference",name="elasticsearch-ss86j"} 0
elasticsearch_breakers_estimated_size_bytes{breaker="parent",name="elasticsearch-ss86j"} 1.61106136e+08
elasticsearch_breakers_estimated_size_bytes{breaker="request",name="elasticsearch-ss86j"} 16440
# HELP elasticsearch_breakers_limit_size_bytes Limit size in bytes for breaker
# TYPE elasticsearch_breakers_limit_size_bytes gauge

Once again, you've been able to alter your application's behaviour without actually changing your code or your container images.

You've exposed standardized Prometheus metrics that can be consumed by cluster-wide tools (like the Prometheus Operator), and have thus achieved a good separation of concerns between the application and the underlying infrastructure.
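
As an example, if you run the Prometheus Operator, scraping the new endpoint can be described declaratively with a ServiceMonitor. Here's a sketch, assuming the operator's CRDs are installed and that you also add the app.kubernetes.io/name: elasticsearch label to the Service itself:

elasticsearch-servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  endpoints:
    - port: http-prometheus   # the named Service port added above
      interval: 30s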

Tailing logs

Next, let's take a look at the Sidecar Pattern, where you add a container to a pod that enhances an application in some way.

The Sidecar Pattern is pretty general and can apply to all sorts of different use cases (and you'll often hear any containers in a pod past the first referred to as "sidecars").

Let's first explore one of the classic sidecar use cases: a log tailing sidecar.

In a containerized environment, the best practice is to always log to standard out so that logs can be collected and aggregated in a centralized manner.

But many older applications were designed to log to files, and changing that can sometimes be non-trivial.

Adding a log tailing sidecar means you might not have to!

Let's return to Elasticsearch as an example, which is a bit contrived since the Elasticsearch container logs to standard out by default (and it's non-trivial to get it to log to a file).

Here's what the deployment looks like:

sidecar-example.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
  labels:
    app.kubernetes.io/name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.9.3
          env:
            - name: discovery.type
              value: single-node
            - name: path.logs
              value: /var/log/elasticsearch
          volumeMounts:
            - name: logs
              mountPath: /var/log/elasticsearch
            - name: logging-config
              mountPath: /usr/share/elasticsearch/config/log4j2.properties
              subPath: log4j2.properties
              readOnly: true
          ports:
            - name: http
              containerPort: 9200
        - name: logs
          image: alpine:3.12
          command:
            - tail
            - -f
            - /logs/docker-cluster_server.json
          volumeMounts:
            - name: logs
              mountPath: /logs
              readOnly: true
      volumes:
        - name: logging-config
          configMap:
            name: elasticsearch-logging
        - name: logs
          emptyDir: {}

The logging configuration file is a separate ConfigMap that's too long to include here.

Both containers share a common volume named logs.

The Elasticsearch container writes logs to that volume, while the logs container just reads from the appropriate file and outputs it to standard out.

You can retrieve the log stream by specifying the appropriate container with kubectl logs:

bash

kubectl logs elasticsearch-6f88d74475-jxdhl logs | head
{
  "type": "server",
  "timestamp": "2020-11-29T23:01:42,849Z",
  "level": "INFO",
  "component": "o.e.n.Node",
  "cluster.name": "docker-cluster",
  "node.name": "elasticsearch-6f88d74475-jxdhl",
  "message": "version[7.9.3], pid[7], OS[Linux/5.4.0-52-generic/amd64], JVM"
}
{
  "type": "server",
  "timestamp": "2020-11-29T23:01:42,855Z",
  "level": "INFO",
  "component": "o.e.n.Node",
  "cluster.name": "docker-cluster",
  "node.name": "elasticsearch-6f88d74475-jxdhl",
  "message": "JVM home [/usr/share/elasticsearch/jdk]"
}
{
  "type": "server",
  "timestamp": "2020-11-29T23:01:42,856Z",
  "level": "INFO",
  "component": "o.e.n.Node",
  "cluster.name": "docker-cluster",
  "node.name": "elasticsearch-6f88d74475-jxdhl",
  "message": "JVM arguments […]"
}

The great thing about using a sidecar is that streaming to standard out isn't the only option.

If you needed to switch to a customized log aggregation service, you could just change the sidecar container without altering anything else about your application.
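
For instance, the tail-based logs container from sidecar-example.yaml could be swapped for a log shipper such as Fluent Bit. A rough sketch of the replacement container, assuming a hypothetical ConfigMap named fluent-bit-config containing a fluent-bit.conf that reads /logs/docker-cluster_server.json and forwards it to your aggregator:

        # drop-in replacement for the "logs" container in sidecar-example.yaml
        - name: logs
          image: fluent/fluent-bit:1.9
          volumeMounts:
            - name: logs
              mountPath: /logs
              readOnly: true
            - name: fluent-bit-config   # hypothetical ConfigMap with fluent-bit.conf
              mountPath: /fluent-bit/etc
              readOnly: true

You'd also add the fluent-bit-config volume to the pod's volumes list; the Elasticsearch container and the rest of the deployment stay exactly as they were.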

Other examples of sidecars

There are many use cases for sidecars; a logging container is only one straightforward example.

Other sidecars you might encounter in the wild include service mesh proxies (such as Istio's), containers that watch ConfigMaps and live-reload the application's configuration, and local caches.

Preparing for a pod to run

All of the examples of multi-container pods this post has gone over so far involve several containers running simultaneously.

Kubernetes also provides the ability to run Init Containers, which are containers that run to completion before the "normal" containers start.

This allows you to run an initialization script before your pod starts in earnest.

Why would you want your preparation to run in a separate container, instead of (for instance) adding some initialization to your container's entrypoint script?

Let's look to Elasticsearch for a real-world example.

The Elasticsearch docs recommend setting the vm.max_map_count sysctl in production-ready deployments.

This is problematic in containerized environments since there's no container-level isolation for sysctls and any changes have to happen on the node level.

How can you handle this in cases where you can't customize the Kubernetes nodes?

One way would be to run Elasticsearch in a privileged container, which would give Elasticsearch the ability to change system settings on its host node, and alter the entrypoint script to add the sysctls.

But this would be extremely dangerous from a security perspective!

If the Elasticsearch service were ever compromised, an attacker would have root access to its host node.

You can use an init container to mitigate this risk somewhat:

init-es.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
  template:
    metadata:
      labels:
        app.kubernetes.io/name: elasticsearch
    spec:
      initContainers:
        - name: update-sysctl
          image: alpine:3.12
          command: ['/bin/sh']
          args:
            - -c
            - |
              sysctl -w vm.max_map_count=262144
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: elasticsearch:7.9.3
          env:
            - name: discovery.type
              value: single-node
          ports:
            - name: http
              containerPort: 9200

The pod sets the sysctl in a privileged init container, after which the Elasticsearch container starts as expected.

You're still using a privileged container, which isn't ideal, but at least it's extremely minimal and short-lived, so the attack surface is much lower.

This is the approach recommended by the Elastic Cloud Operator.

Using a privileged init container to prepare a node for running a pod is a fairly common pattern.

For instance, Istio uses init containers to set up iptables rules every time a pod runs.

Another reason to use an init container is to prepare the pod's filesystem in some way.

One common use case is secrets management.

Another init container use case

If you're using something like HashiCorp Vault for secrets management instead of Kubernetes secrets, you can retrieve secrets in an init container and persist them to a shared emptyDir volume.

It might look something like this:

init-secrets.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app.kubernetes.io/name: myapp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
    spec:
      initContainers:
        - name: get-secret
          image: vault
          volumeMounts:
            - name: secrets
              mountPath: /secrets
          command: ['/bin/sh']
          args:
            - -c
            - |
              vault read secret/my-secret > /secrets/my-secret
      containers:
        - name: myapp
          image: myapp
          volumeMounts:
            - name: secrets
              mountPath: /secrets
      volumes:
        - name: secrets
          emptyDir: {}

Now the secret/my-secret secret will be available on the filesystem for the myapp container.

This is the basic idea of how systems like the Vault Agent Sidecar Injector work. However, they're quite a bit more sophisticated in practice (combining mutating webhooks, init containers, and sidecars to hide most of the complexity).
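
With the injector installed, most of that machinery is driven by annotations on the pod template; roughly something like the following fragment (the my-app Vault role is hypothetical, and secret/my-secret matches the example above):

  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/role: 'my-app'   # hypothetical Vault role
        vault.hashicorp.com/agent-inject-secret-my-secret: 'secret/my-secret'

The injected agent then renders the secret into a shared in-memory volume (by default under /vault/secrets/), so the application still reads it from the filesystem, just like in the manual example above.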

Even more init container use cases

There are other reasons you might want to use an init container; the summary table below lists a few more, such as retrieving files from S3 before your application starts, which is sketched next.
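
A minimal sketch of the S3 case (the bucket name, object key, and application image are hypothetical, and the init container is assumed to get AWS credentials from its environment, e.g. via an IAM role for its service account):

init-fetch-s3.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
    - name: fetch-assets
      image: amazon/aws-cli
      command: ['aws', 's3', 'cp', 's3://my-bucket/assets.tar.gz', '/data/assets.tar.gz']   # hypothetical bucket and key
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: myapp
      image: myapp   # hypothetical application image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}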

Summary

This post covered quite a lot of ground, so here's a table of some multi-container patterns and when you might want to use them:

Use case                                                      Pattern
Encrypt and/or authenticate incoming requests                 Ambassador
Connect to external resources over a secure tunnel            Ambassador
Expose metrics in a standardized format (e.g. Prometheus)     Adapter
Stream logs from a file to a log aggregator                   Sidecar
Add a local Redis cache to your pod                           Sidecar
Monitor and live-reload ConfigMaps                            Sidecar
Inject secrets from Vault into your application               Init (often combined with a sidecar)
Change node-level settings with a privileged container        Init
Retrieve files from S3 before your application starts         Init

Be sure to read the official documentation and the original container design pattern paper if you want to dig deeper into this subject.
