Karpenter — AutoScaling and Right-Sizing EKS Nodes

Jeremy Deppen
Summit Technology Group
Dec 31, 2022 · 8 min read


Be efficient and only use the space that you need

Intro

Kubernetes is a great orchestration engine on which to run your containerized workloads. It allows dynamic scaling up and down of pods (your running application) in order to balance availability and use only the resources that you need.

As an engineer, you define your application pod to request 1 CPU and 1Gi of memory, and configure Kubernetes to create more pods when CPU utilization goes beyond 70%. Conversely, you tell Kubernetes to scale the number of application pods back down when traffic decreases so that you free up CPU and memory. Great, you’re following the true nature of using the Cloud: use only the resources that you need!
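For illustration, here’s roughly what that looks like in Kubernetes terms; the names below are hypothetical, and the HorizontalPodAutoscaler is simply the standard autoscaling/v2 way to express the 70% rule:

```yaml
# Pod resource requests on the app's Deployment (illustrative names)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "1"      # 1 CPU
              memory: 1Gi   # 1Gi of memory
---
# Scale out when average CPU utilization crosses 70%, back in when traffic drops
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```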

But wait…

What are the underlying resources that power Kubernetes? Virtual machines, or in the AWS world, EC2 instances. How are you “right-sizing” your EC2 instances (called nodes in Kubernetes speak)? You might be efficiently scaling your application pods up and down, but if you’re running 5 nodes, each with 48 CPUs and 192Gi of memory, just to host a handful of pods that each need 1 CPU and 1Gi of memory, then you’re not really taking full advantage of the elasticity of AWS and Kubernetes.

Non-right-sized vs. right-sized nodes

This is the classic problem that has plagued sysadmins for decades: how to walk the fine line between under- and over-provisioning resources without sacrificing availability.

Here at Summit Technology Group, we’ve leveraged Karpenter on AWS EKS (Elastic Kubernetes Service) to handle node autoscaling and node right-sizing for us.

Previous solution for autoscaling nodes

Using the cluster-autoscaler on EKS

Node autoscaling in EKS is not enabled by default. To enable it, you deploy the cluster-autoscaler deployment, which watches for pods with an unschedulable status and adds the appropriate number of new nodes to the nodegroup to accommodate them. Conversely, it scales down underutilized nodes and moves their running pods to the remaining nodes.

The cluster-autoscaler requires you to have an AWS AutoScaling Group (and Launch Template) to define and manage your Kubernetes nodes. The cluster-autoscaler then asks the AutoScaling Group to increase/decrease the number of nodes.
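For reference, that wiring usually comes down to tagging the AutoScaling Groups for auto-discovery and pointing the cluster-autoscaler at those tags. A rough sketch of the relevant container flags (the image tag and the cluster-name placeholder are ours, not from any particular setup):

```yaml
# Fragment of the cluster-autoscaler Deployment: the container discovers ASGs
# by tag rather than being given a hard-coded list of nodegroups.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.24.0   # match your cluster's minor version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups
      - --skip-nodes-with-system-pods=false
      - --expander=least-waste
      # ASGs carrying these tags are managed by the autoscaler
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<CLUSTER_NAME>
```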

Some problems with this

  • Because the cluster-autoscaler relies on the AutoScaling Group, it can take anywhere from 30 to 60 seconds to detect unschedulable pods and tell the ASG to launch a new node. Combined with the fact that AWS can then take another 80 seconds to fully bring the node up, those initial 30 to 60 seconds are well worth reducing if possible.
  • Engineers need to take an educated guess at which instance family and size will be the best “fit”, in terms of CPU and memory, for their workloads.
  • You need to create a lot of launch templates and nodegroups in order to get a wide breadth of instance types, capacity types (on-demand vs spot), and other configurations such as availability zones.

Karpenter!

Note: as of writing (December 28, 2022), Karpenter is only available for workloads that run on AWS. However, there are discussions about bringing it to other cloud providers such as GCP and Azure.

Karpenter is an open-source, free-to-use tool that runs in your EKS cluster as a deployment, and it acts as a node autoscaler, a node right-sizer, and a way to define nodegroups as Kubernetes resources.

Benefits over the cluster-autoscaler

  • A nodegroup is defined in Kubernetes as a Provisioner (a CRD), so there is no longer any need for AWS AutoScaling Groups and Launch Templates. Because Karpenter manages nodes directly, a new node is provisioned in AWS almost instantly when pods are detected in an unschedulable state (the node itself still needs the same 80-second startup time).
  • Nodegroups can be defined to use a wide range of instances. We can even tell Karpenter to consider every instance family (m6i, c5, etc.) and size (medium, 4xlarge, etc.) when provisioning a new node, which lets Karpenter choose new nodes far more accurately.
  • Easier declaration of on-demand vs spot capacity, and a greater chance of immediately getting a spot instance thanks to the wider range of instance types Karpenter considers.
  • Node consolidation. When under-utilized nodes are detected, Karpenter not only moves the pods and removes those nodes, it also evaluates whether the remaining nodes could be better provisioned. If so, it launches new “right-sized” nodes, transfers the pods, and then terminates the old ones.

Let’s see some code!

Provision an initial nodegroup for Karpenter pods

So if Karpenter is a deployment running in Kubernetes that creates Kubernetes nodes, how do the Karpenter pods initially run if there are no nodes?

To handle this, a small nodegroup is required that is meant only to run the Karpenter pods, plus other critical kube-system and daemonset pods. At Summit, we provision a startup nodegroup of two t3.medium nodes with ON_DEMAND capacity to ensure no interruptions. These nodes perfectly fit the two Karpenter pods, which each request 500m of CPU and 1Gi of memory.

Note — it is recommended to run at least 2 startup nodes in a production environment, to spread Karpenter across multiple AZs.

Creating the startup nodegroup

We create a taint on the startup nodes so that only Karpenter pods, daemonsets, and kube-system pods that tolerate the taint can run there. Without it, we risk an unrelated pod landing on these nodes and potentially leaving a Karpenter pod unable to be scheduled.
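Here’s a rough sketch of such a startup nodegroup, written as an eksctl config for brevity (any tooling that creates an EKS managed nodegroup works); the cluster name, label, and taint key are illustrative:

```yaml
# Startup nodegroup: two small on-demand nodes reserved for Karpenter and
# other critical pods (illustrative names throughout).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: karpenter-startup
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 2
    # Managed nodegroups use ON_DEMAND capacity unless spot: true is set
    labels:
      role: karpenter
    # Keep everything that doesn't tolerate this taint off these nodes
    taints:
      - key: karpenter-only
        value: "true"
        effect: NoSchedule
```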

Installing Karpenter in Kubernetes

We could outline step by step exactly how to deploy Karpenter here, but the Karpenter “Getting Started” docs are very easy to follow, specifically the “Getting Started with Terraform” guide.

One tweak that needs to be made to the Karpenter Helm release values is adding a nodeSelector and tolerations so the controller pods run on the tainted startup nodegroup. In the chart’s values.yaml, add:
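A minimal sketch, assuming the startup nodes carry the illustrative role=karpenter label and karpenter-only taint from the nodegroup sketch above (the Karpenter chart exposes standard nodeSelector and tolerations values):

```yaml
# values.yaml overrides for the Karpenter Helm release
nodeSelector:
  role: karpenter            # label on the startup nodes (illustrative)
tolerations:
  - key: karpenter-only      # taint key on the startup nodegroup (illustrative)
    operator: Equal
    value: "true"
    effect: NoSchedule
```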

At this point, our Kubernetes cluster looks like this: a small, tainted startup nodegroup running the Karpenter controller pods (plus daemonsets and other critical kube-system pods), and nothing else yet.

Creating a Provisioner

A Provisioner is a CRD that represents a Karpenter-managed nodegroup. It lets us configure a lot of things (see the Provisioner object documentation), from taints to kubelet configuration to limits, but the properties we found most useful revolve around node right-sizing and node instance types.

At Summit Technology Group, we have different nodegroups for different workloads. For example, a nodegroup for app pods, one for batch jobs, one for CI/CD runners, etc.

Here’s an example Provisioner for an app nodegroup:
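What follows is a sketch of roughly that shape, written against the v1alpha5 API in Karpenter 0.19. The name, limits, instance lists, and launch template name are illustrative, and the launch-template field is worth double-checking against your Karpenter version:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: app
spec:
  # Remove under-utilized nodes and replace poorly-sized ones
  consolidation:
    enabled: true
  # Cap on total resources across all nodes this Provisioner creates
  limits:
    resources:
      cpu: "100"
      memory: 400Gi
  # Reuse the startup nodegroup's launch template; subnets and security
  # groups are discovered by tag
  provider:
    launchTemplate: startup-nodegroup-launch-template   # illustrative name
    subnetSelector:
      karpenter.sh/discovery: my-cluster
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
  # Constrain the instance types Karpenter may pick from
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m6i", "m5", "c6i", "c5", "r5"]
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values: ["nano", "micro", "small"]
  # Only workloads that tolerate this taint land on these nodes
  taints:
    - key: workload-type
      value: app
      effect: NoSchedule
```

Applying it is just a kubectl apply; from then on, any pending pod that tolerates the workload-type=app taint can trigger Karpenter to launch a node for it.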

Explaining it a bit:

  • consolidation — if enabled, Karpenter removes under-utilized nodes and also evaluates whether other instance types would better fit the current pods on the remaining nodes. If so, it creates those new nodes, moves the pods there, and removes the old nodes.
  • limits — the maximum amount of CPU (and/or memory) for the nodegroup. It’s important to note that the max is not a number of nodes, as it is with normal AWS AutoScaling Groups, but a maximum amount of CPU (or memory) summed across all nodes in the nodegroup. Also note that there is currently no option to set a minimum amount of CPU/memory.
  • provider — defines the launch template for this nodegroup. In this example, it uses the launch template of our startup nodegroup, along with the security group and subnets that carry the karpenter tag. Even though this nodegroup uses the startup launch template, everything configured in this Provisioner overrides the equivalent property in the launch template. Note — this article was written against Karpenter 0.19.3; 0.21.0 introduced defining the launch template as a Karpenter CRD, AWSNodeTemplate.
  • requirements — this is where we define exactly which instance types Karpenter can choose from. There are many more options in the docs beyond the ones used above. If left blank, Karpenter will consider every EC2 instance type when deciding what to provision.
  • taints — optional, but useful for us, since we want only certain workloads to land on this nodegroup.

Now, our Kubernetes cluster looks like this: the startup nodegroup running Karpenter, plus a Karpenter-managed app nodegroup whose nodes are created, sized, and removed based on the pods that need to run.

Nice! Now we have a right-sized, scalable nodegroup that is constantly being monitored by Karpenter, and modified quickly to add or right-size compute.

Karpenter at Summit Technology Group

The migration to Karpenter here at Summit Technology Group has been mostly positive, but there were a few things we needed to solve for. Let’s go over the positives first.

Benefits of Karpenter at Summit

  • No more having to “guesstimate” which EC2 instance sizes would be best for our workloads.
  • Saving money on our AWS bill. Letting Karpenter right-size for us revealed that we had been over-provisioning our nodes, sometimes drastically.
  • Easier adoption of spot instances. Karpenter gave us the push to finally switch to spot instances where applicable in lower environments, because the wide range of instance types it considers makes it far more likely that a spot instance will actually be provisioned. Spot instances also helped reduce our bill.

Things that we had to solve for

  • Because the nodegroups are so tightly right-sized, a helm upgrade of an app with multiple pod replicas often triggered the creation of a new node. Then, once the upgrade finished and the old pods were removed, the new pods would be moved back onto the original node and the new node removed. This caused a “flapping” effect, which we remedied by setting consolidation to false for the app nodegroup that sees lots of helm rollouts (see the sketch after this list). Essentially, we now keep one slightly under-utilized node that can absorb helm upgrades and pod autoscaling.
  • Treating Karpenter as an extremely critical resource. Because its pods directly manage nodes, we need to ensure maximum uptime for them. We accounted for this by running multiple Karpenter pods across two nodes in two AZs, with both nodes tainted for Karpenter pods only. We also introduced monitoring and alerting around Karpenter activity.
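For that one nodegroup, the change is a small Provisioner tweak. The ttlSecondsAfterEmpty line is an optional addition, not something the setup above requires; v1alpha5 does not allow it together with consolidation, so it only becomes available once consolidation is off:

```yaml
# Provisioner tweak for the nodegroup that sees frequent helm rollouts:
# stop consolidating, so the slightly over-sized node sticks around.
spec:
  consolidation:
    enabled: false
  # Optional: still reap nodes that sit completely empty (cannot be combined
  # with consolidation.enabled: true)
  ttlSecondsAfterEmpty: 300
```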

Wrapping up…

If you’re running Kubernetes on AWS, Karpenter offers a smarter way to do node autoscaling, is quite easy to set up, and will ultimately save you money on your AWS bill.

As mentioned before, it’s highly recommended to check out the Karpenter docs (make sure you’re on the appropriate version; as of writing, we reference 0.19.3), where you’ll find much more information than we could cover in this article.

Here at Summit Technology Group, where we run highly trafficked Kubernetes workloads across multiple AWS environments, we’re benefiting from Karpenter because it lets us walk the line of using only what we need without sacrificing availability.

Interested in working with AWS, Kubernetes, full stack engineering, or Data Engineering in a fast-paced environment? Go check out our open positions at thesummitgrp.com!
