Fight The Hidden Cost of Regional Kubernetes Clusters — Cross Zonal Egress — Part 1

Animesh Rastogi
Google Cloud - Community
5 min read · Jun 5, 2023



Redundancy and high availability are key tenets of SRE and modern cloud-native architectures. To deliver the best customer experience and adhere to stringent industry SLAs, you need them built into every layer of your infrastructure.

If you're running your applications on a platform like Kubernetes, one easy way to achieve this is by spreading the control plane and the worker nodes across multiple Availability Zones. With managed Kubernetes offerings like GKE, it is simply a matter of checking a box at cluster creation time to make your cluster regional (multi-zonal).

However, every cloud provider charges for cross-zone traffic. Imagine you have two services, Service A and Service B: Service A makes an API call to Service B and receives data in response.

We have no inherent control over which pod makes the API call and which pod responds to it.

All of this is decided by Kubernetes Services, as you can see in the figure below:

Intra Cluster Communication

How Do Kubernetes Services Work?

When you create a Service object, Kubernetes creates an associated Endpoints object listing all the active and healthy Pods that match the Service's selector. The Endpoints object holds the IP address and port of every Pod backing the Service. It gets sent over the network to every node in the cluster, where it is used to create local networking rules.

Next, the Kubernetes controller allocates a ClusterIP to the Service, a virtual IP that is routable only from the worker nodes of the cluster. The instance of kube-proxy running on each node (via a DaemonSet) creates a series of iptables rules. When traffic targets the ClusterIP (or enters the cluster via a NodePort), iptables routes the packet to a Pod according to its load balancing algorithm. For iptables, the default load balancing algorithm is random selection.
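The consequence of that random selection is what creates the cost problem: any caller may be routed to a backend in any zone. A toy sketch of the behavior (the pod addresses here are made up for illustration):

```python
import random

# Hypothetical backend list that kube-proxy derived from a Service's
# Endpoints object; in a regional cluster these pods span multiple zones.
endpoints = ["10.28.1.52:8080", "10.28.2.17:8080", "10.28.3.9:8080"]

def pick_endpoint(backends):
    """Mimic iptables' default behavior: a uniformly random backend,
    with no awareness of which zone the caller or the pod is in."""
    return random.choice(backends)

# Every request may land on any pod, in any zone, so roughly
# (zones - 1) / zones of the traffic crosses a zone boundary.
chosen = pick_endpoint(endpoints)
print(chosen)
```

This is only a model of the routing decision, not how iptables actually implements it (iptables uses probabilistic DNAT rules), but it captures why cross-zone traffic occurs by default.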

How to prioritize intra-zonal communication?

There are two ways to implement something like this. Which one you choose depends on your stage of Kubernetes adoption, the skill set available in your company, your workload type, and the traffic patterns of your services.

  1. Topology Aware Routing
  2. Using Istio’s Locality Load Balancing

In this part, we shall cover Topology Aware Routing: what it is, when you should use it, and how to enable it in your cluster.

What is Topology Aware Routing?

Introduced in Kubernetes v1.21, Topology Aware Routing (TAR) enables an alternative routing mechanism by exposing only a subset of a Service's endpoints to kube-proxy.

TAR depends on the EndpointSlice controller; EndpointSlices are a more scalable alternative to Endpoints. An Endpoints object contains only Pod IP addresses and ports, while an EndpointSlice can additionally carry hints: extra fields set on each endpoint. TAR sets a hint indicating which zone each endpoint resides in.

When hints are set, kube-proxy filters the endpoints based on them and, in most cases, chooses endpoints in the same zone. Traffic entering kube-proxy is then routed only to Pods in the caller's zone, which avoids cross-zone traffic charges.

When should it be used?

  1. When incoming traffic is evenly distributed: if a large share of requests originates from a single zone, the pods in that zone can easily be overwhelmed.
  2. When there are enough endpoints available per zone: the general rule of thumb is at least three endpoints per zone (total endpoints = 3 × number of zones, at minimum).
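The second condition is easy to pre-check before annotating a Service. A small helper expressing the rule of thumb above (the function name and threshold parameter are mine, not part of Kubernetes):

```python
def has_enough_endpoints(total_endpoints, num_zones, per_zone=3):
    """Rule of thumb from above: aim for at least three ready
    endpoints per zone before enabling Topology Aware Routing."""
    return total_endpoints >= per_zone * num_zones

# A 3-zone regional cluster:
print(has_enough_endpoints(9, 3))  # 9 >= 3*3 -> True
print(has_enough_endpoints(5, 3))  # 5 <  3*3 -> False
```

If the check fails, expect the feature's safeguards (covered below) to fall back to regular cluster-wide routing.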

How to enable it?

Enabling Topology Aware Routing is very simple. First, ensure the topology.kubernetes.io/zone and topology.kubernetes.io/region labels are set on all nodes. If you're using a managed Kubernetes provider like GKE, these are set already.

Next, you need to annotate your Service with the service.kubernetes.io/topology-mode: auto annotation. To confirm that Topology Aware Routing has been enabled for the Service, run the command
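For reference, the annotation sits under metadata.annotations on the Service. A minimal sketch, with the Service name, selector, and ports being purely illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b            # illustrative name
  annotations:
    service.kubernetes.io/topology-mode: auto
spec:
  selector:
    app: service-b
  ports:
  - port: 80
    targetPort: 8080
```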

kubectl get endpointslice <name> -o yaml

You should see something like

addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.28.1.52
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: us-central1-c

There you go. TAR is now enabled in your cluster.

Considerations and Safeguards:

Although TAR is a powerful mechanism for controlling traffic routing within a Kubernetes cluster, it also has the potential to negatively impact workloads. Several safeguards are built into the feature that are important to consider. In some cases, these safeguards cause the feature to fall back to regular, cluster-wide routing.

One key safeguard is a minimum number of endpoints per zone. It ensures that if a zone holds a larger proportional share of the cluster's capacity, it must also hold a similarly proportional share of the endpoints. This prevents any one zone from being overwhelmed by traffic.

Expected ratio = sum of vCPU cores of nodes in this zone / sum of vCPU cores of all nodes in the cluster.

OverloadThreshold is a constant of 20%.

Minimum endpoints = Total number of endpoints × Expected ratio / (1 + OverloadThreshold), rounded up (ceiling).
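The safeguard formula above can be expressed as a small helper. The function and parameter names here are mine; only the arithmetic comes from the safeguard's definition:

```python
import math

def min_endpoints_for_zone(zone_vcpus, total_vcpus, total_endpoints,
                           overload_threshold=0.2):
    """Minimum endpoints a zone must hold for Topology Aware Routing
    hints to be allocated, per the safeguard described above."""
    expected_ratio = zone_vcpus / total_vcpus
    return math.ceil(total_endpoints * expected_ratio / (1 + overload_threshold))

# Example: a zone holding 16 of the cluster's 48 vCPUs, with 12
# endpoints cluster-wide: ceil(12 * (1/3) / 1.2) = ceil(3.33) = 4
print(min_endpoints_for_zone(16, 48, 12))  # -> 4
```

If any zone falls below its minimum, the controller removes the hints and routing reverts to the default cluster-wide behavior.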

Conclusion

In this part, we covered how Kubernetes Services work and how we can steer traffic using Kubernetes' native Topology Aware Routing feature so that traffic crosses zonal boundaries as little as possible. However, we also saw that the Kubernetes controller applies several considerations and safeguards to ensure your workloads aren't negatively affected by the feature.

In the next part, we shall cover how this process is made much simpler when using a modern service mesh like Istio.

See you soon !!!

References:

Topology Aware Routing, Kubernetes documentation: https://kubernetes.io › concepts › services-networking

KEP: Topology Aware Hints, kubernetes/enhancements on GitHub: https://github.com › keps › sig-network › README

Exploring the effect of Topology Aware Hints on network traffic in Amazon EKS: https://aws.amazon.com/blogs/containers/exploring-the-effect-of-topology-aware-hints-on-network-traffic-in-amazon-elastic-kubernetes-service/
