On Amazon EKS and Cost Optimisation

Dirk Michel
17 min read · Dec 21, 2022

Optimising for cost in a dynamic and fast-paced cloud-native environment is not always obvious: AWS services improve, prices and purchase options change, Kubernetes itself evolves, and other projects within the CNCF landscape frequently create new possibilities and tools that drive cost optimisation forward. As a result, optimising for cost is not a one-time effort.

Cost optimisation is a continuous activity requiring practitioners to stay current within a fast-changing environment of AWS options and Kubernetes features.

The practice of cost optimisation is reflected as part of the financial operations or FinOps stream, an evolving cloud financial management discipline aiming to maximise business value by helping engineering, finance, technology, and business teams collaborate on data-driven spending decisions. This blog focuses on practicalities and considerations for cost optimisation of Amazon EKS environments.

For those on a tight time budget: The TL;DR of the following sections is to show which cost optimisation approaches we came to value, such as using Amazon EC2 instance purchase options, introducing Karpenter for node auto-scaling, leveraging AWS Graviton, adopting cloud-native storage, and optimising Kubernetes application configuration.

Let’s step through 7 levers that can be useful when optimising the cost of Amazon EKS clusters and supporting AWS services.

1. Compute Purchase Options: Understanding purchase options for Amazon EC2 is foundational to cost optimisation. The Amazon EKS data plane consists of worker nodes or serverless compute resources that carry the Kubernetes application workloads. The data plane can run on several capacity types and purchase options: On-Demand, Spot Instances, Savings Plans, and Reserved Instances.

Both EC2 On-Demand and Spot capacity can be used without making spending commitments. On-Demand is billed by run-time and provides guaranteed instances at the On-Demand rate of the instance type. Spot Instances are billed at discounted rates in exchange for being preemptable and not guaranteed. Both capacity types can be very effective for temporary, short-lived, or bursty workloads. Spot can be particularly cost-effective with applications that tolerate fluctuating compute availability.

Then we have compute purchase options where we make a spend commitment over one or three years in exchange for discounted rates. These are very effective once a “steady-state” or baseline resource consumption profile is established against which a commitment makes sense. AWS launched the newer Savings Plans to be more flexible than Reserved Instances, reducing the risk of over-allocating compute resources or being committed to an instance type that is no longer an optimal fit. With a Savings Plan, you commit to an hourly US Dollar spend amount and pay that amount whether or not matching resources are provisioned. Today, Savings Plans come in two flavours: Compute Savings Plans and EC2 Instance Savings Plans. Compute Savings Plans are more flexible, as any provisioned instance type, as well as AWS Fargate and AWS Lambda charges, counts against the committed value of the plan. You can trade some of that flexibility for deeper discounts by opting for an EC2 Instance Savings Plan, where you narrow the compute choice down to an instance family.

Viewed in a timeline, the Reserved Instance payment options have been available for a long time, and Savings Plans are a more recent alternative. The below table summarises some of the highlights.

EKS Data Plane Compute Payment Options

Observe the pairing between “Standard RI & EC2 Instance Savings Plan” and “Convertible RI & Compute Savings Plan”: The reference discounts are the same, but with Savings Plans, we achieve improved flexibility at the same discounted rate.

As the purchase plans run for 1 or 3 years, cost optimisation practitioners generally lean towards Savings Plans to accommodate changing circumstances.

Compute Savings Plans work particularly well when the fleet composition is frequently changed and optimised. This is important when using intelligent node auto-scalers that leverage various EC2 instance type families and sizes. Reserved Instances will not efficiently keep up with dynamic fleets, and nodes may fall out of coverage and risk being charged at On-Demand rates.

When determining an appropriate commitment value of a Savings Plan, use the Savings Plan recommendation engine. The recommendations are based on historical usage over a configurable lookback period and can be provided for AWS Accounts across an AWS Organisation or an individual AWS Account.

All Plans are purchased on an AWS Account basis; therefore, having AWS Accounts be part of AWS Organisations helps share unused discounts with other member accounts.

2. Flexible Compute: A cost optimisation strategy for Amazon EKS will invariably require a worker node auto-scaler unless the entire application workload can fit into AWS Fargate micro-VMs and comply with its prerequisites and configuration requirements. In many cases, the compute composition of the Amazon EKS data plane will include a share of EC2-based worker nodes; introducing Karpenter as an intelligent and application-centric node auto-scaler therefore creates the requisite cost optimisation capability to leverage flexible compute options across On-Demand and Spot.

Karpenter implements the concept of Provisioners that can be configured to use flexible compute options by defining capacity types: On-Demand and Spot. Once the search for an optimised instance type and size is completed, the nodes are provisioned with the configured capacity type.
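
To make this concrete, here is a minimal sketch of a Provisioner that allows both capacity types, using the karpenter.sh/v1alpha5 API that was current at the time of writing; the Provisioner name, the referenced AWSNodeTemplate, and the instance-category constraint are illustrative assumptions:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default                  # hypothetical Provisioner name
spec:
  requirements:
    # Let Karpenter choose between Spot and On-Demand capacity
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    # Keep the instance search within compute-, general-purpose, and memory-optimised families
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
  providerRef:
    name: default                # references an AWSNodeTemplate with subnet and security group selectors
  ttlSecondsAfterEmpty: 30       # scale empty nodes back down after 30 seconds
```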

Karpenter can also leverage Savings Plans by configuring multiple Provisioners. Each Provisioner can be associated with a weight, which Karpenter uses as part of a decision tree to determine which Provisioner to use when scaling nodes. Each Provisioner can be configured to reflect a reservation pool: For example, Provisioner-1 would hold a pool of Reserved Instances and be constrained to provision nodes that fit the Reservation instance types, Provisioner-2 would hold a pool of Savings Plans resources, and Provisioner-3 would perhaps have the smallest weight and provision Spot instance capacity.

Karpenter can leverage flexible compute options in combination with Savings Plans: This is not an exact science at the moment, as Karpenter does not yet directly identify available Reservations and Savings Plans in a given Account. Still, Karpenter Provisioners allow for a practical implementation that approximates the desired behaviour.
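
As a rough sketch of that approximation, the following two weighted Provisioners give priority to a pool sized around existing Reservations and fall back to Spot once that pool is exhausted; the instance types, weights, and CPU limit are illustrative assumptions, not recommendations:

```yaml
# Highest weight: constrained to the instance types covered by Reserved Instances
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: reserved-pool
spec:
  weight: 100
  limits:
    resources:
      cpu: "64"                  # roughly cap the pool at the reserved capacity
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.2xlarge", "m5.4xlarge"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  providerRef:
    name: default
---
# Lowest weight: provisions Spot capacity once the pools above are exhausted
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-overflow
spec:
  weight: 10
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  providerRef:
    name: default
```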

As a sidebar: Remember that Spot spending does not count against committed reservations or Savings Plans. You cannot double-dip by benefiting from Spot rates and then having them re-discounted through a Reservation or Savings Plan. Therefore, use the On-Demand capacity type in combination with Savings Plans.

Equally, Karpenter Provisioners can act fast and reduce node spin-up times. Workloads that depend on fast scale-out during scaling events have often driven teams to over-provision worker nodes: Spare capacity is held in the cluster so that it can be used immediately, avoiding the delay of waiting for new nodes to spin up and join the cluster. Projects such as the cluster over-provisioner have been used to achieve that. Karpenter node scaling may now be sufficiently fast to avoid over-provisioning, allowing the overall cluster to run at higher utilisation rates and hold less spare capacity. Fast Amazon EC2 spin-up times contribute to keeping running costs optimised for longer.

3. Right Sizing Compute: Determining and provisioning right-sized Kubernetes worker nodes can be challenging with node auto-scalers that use Managed Node Groups (MNGs). Cost optimisation with MNGs has improved over time, for example with support for scaling up from and back down to zero. However, one of the main challenges remains having to pre-configure MNGs per instance type and then select a node group during scaling events. Using Karpenter as the node auto-scaler turns that process around: Rather than choosing from a narrow, pre-configured list of instance types per managed node group, Karpenter calls the EC2 Fleet API directly and identifies a candidate list of right-sized instances across instance and capacity types that satisfy the configured Provisioner constraints. EC2 Fleet then selects the “winning” instance type and size from the candidate list based on an allocation strategy.

EC2 Fleet always selects the least expensive instance type based on the public On-Demand price for the On-Demand capacity type, which is the lowest-price allocation strategy. For the Spot capacity type, EC2 Fleet supports the following allocation strategies: price-capacity-optimized, capacity-optimized, diversified, or lowest-price. Karpenter today defaults to price-capacity-optimized: This tells EC2 Fleet to identify the instance type that EC2 has the most capacity for while also considering the price. This is important for Spot, as this strategy balances price with interruption probability.

Karpenter can increase node resource utilisation by choosing node sizes that tightly align with incoming resource requests. Unschedulable pods are batched within a time window and then bin-packed based on the aggregate requested CPU, memory, and GPU required to determine a best-fitting instance size.

As a sidebar: The right-sizing approach is based on Pod manifest resource requests (not the limits), as is the case for all Kubernetes scheduling decisions. This is great for Pods with the Guaranteed QoS class, which Kubernetes assigns to Pods whose containers have matching resource requests and limits. Pods with containers that have different values for requests and limits are assigned the Burstable QoS class. These Pods can burst into unused resources of a given node until they reach their configured limit. Configuring Pods with large bursting allowances can exhaust right-sized nodes, triggering Pod terminations and evictions.

Configuring limited bursting ranges and scaling Pods horizontally generally helps reduce eviction probabilities. Defining a minimum node size for Karpenter Provisioners can also be effective when working with burstable Pods.
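
As an illustration of the two QoS classes, the only difference between these two Pods is whether requests and limits match; the image and resource values are arbitrary placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
    - name: app
      image: nginx:1.23                                     # placeholder image
      resources:
        requests: { cpu: "500m", memory: "512Mi" }
        limits:   { cpu: "500m", memory: "512Mi" }          # requests == limits -> Guaranteed QoS
---
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example
spec:
  containers:
    - name: app
      image: nginx:1.23                                     # placeholder image
      resources:
        requests: { cpu: "500m", memory: "512Mi" }
        limits:   { cpu: "1",    memory: "1Gi" }            # limits > requests -> Burstable QoS
```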

4. Consolidating Compute: Kubernetes assigns Pods to Nodes during scheduling: The Kube-Scheduler watches for newly created Pods without Node assignment and then finds the best Node for that Pod to run on. Once the placement is complete and the Pods run on a Node, Kubernetes will not change that allocation unless eviction or preemption situations arise. But as worker node availability and composition typically change over time, circumstances can emerge in which Pods are no longer ideally distributed across nodes. This can have a range of results, including nodes not being identified as unneeded by the node auto-scaler, as there is no native Kubernetes mechanism that checks for pod-reallocation opportunities to optimise for node resources or cost.

The idea behind consolidation is to enable a cluster to recurrently review and improve Pod distribution by consolidating Pods into fewer nodes and using better-priced nodes. Consolidation actions can take the form of node removal or replacement.

Karpenter Consolidation can remove a node outright if all of its pods can run on the spare capacity of other nodes in the cluster. Node replacement happens by moving Pods to nodes with excess capacity and placing the remaining Pods onto a single better-priced smaller replacement node.

The Karpenter Consolidation features advance the cost-optimisation objective by identifying node replacement or “node shrinking” options based on pricing data and bin-packing opportunities for Pods. Karpenter is uniquely placed to understand price as a variable: It integrates with the AWS Price List API to obtain current instance price data, which in turn is used to drive the consolidation arithmetic.

As a sidebar: Enabling the Consolidation workflow creates perturbations in availability as Pods are moved and re-scheduled onto different nodes. The composition of the running worker node fleet can change frequently as cost improvements are automatically implemented with every polling period. This is where the fundamental tension between driving cost optimisation and application availability emerges again. This tension can be navigated by deciding how and where Consolidation should be active.

The behaviour of the Consolidation feature can be adjusted in various ways. For example, using attribute-based instance selectors such as karpenter.k8s.aws/instance-cpu can help avoid “over-optimisation conditions” with too tightly right-sized nodes, which can result in continued node churn. Defining a minimum node size can help the Karpenter Consolidation feature converge and stabilise faster.
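
A minimal sketch of a consolidation-enabled Provisioner along those lines (karpenter.sh/v1alpha5 API; the CPU floor and capacity type are assumptions for illustration) could look as follows:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: consolidating
spec:
  consolidation:
    enabled: true                 # allow Karpenter to remove or replace under-utilised nodes
  requirements:
    # Avoid very small replacement nodes, which can lead to continued node churn
    - key: karpenter.k8s.aws/instance-cpu
      operator: Gt
      values: ["3"]               # i.e. at least 4 vCPUs
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  providerRef:
    name: default
```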

Another area that can reduce the impact on availability is the container image size: Compact image sizes enable faster pulls and Pods become available faster on their new target nodes. Defining Pod Disruption Budgets (PDB) can dampen the speed at which Consolidation takes place. PDBs define the minimum number of Pods maintained during disruptions created by voluntary rescheduling actions and evictions. For example, if a Deployment has a desired set of 5 replicas and a PDB of 4, then 1 Pod can be moved at a time, as opposed to moving them all at once.
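
For example, a PDB for the 5-replica Deployment mentioned above could look like this; the label selector is an assumed placeholder:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 4          # with 5 replicas, at most 1 Pod can be drained at a time
  selector:
    matchLabels:
      app: web             # assumed Pod label of the Deployment
```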

Workloads can also be excluded from the Consolidation workflows. The Consolidation feature can be enabled or disabled per Provisioner, which provides a choice of where consolidation is applied. Assigning the karpenter.sh/do-not-consolidate annotation to a node can also exclude specific nodes. This establishes node tiers, and applications that can’t sustain consolidation interruptions can be allocated accordingly. Pods can also be annotated with karpenter.sh/do-not-evict: “true”, resulting in Karpenter not voluntarily removing nodes containing these Pods.
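
A brief sketch of the Pod-level exclusion, applied through a Deployment's Pod template (the Deployment name, labels, and image are placeholders); the equivalent node-level exclusion would be the karpenter.sh/do-not-consolidate annotation on the node itself:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-worker                     # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: stateful-worker
  template:
    metadata:
      labels:
        app: stateful-worker
      annotations:
        karpenter.sh/do-not-evict: "true"   # Karpenter will not voluntarily remove nodes running these Pods
    spec:
      containers:
        - name: worker
          image: busybox:1.36               # placeholder image
          command: ["sleep", "infinity"]
```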

5. Graviton Compute: AWS Graviton instances are based on AWS custom-built silicon using 64-bit Arm Neoverse cores (aarch64) to deliver the best price-performance for Amazon EC2. The latest generation of AWS Graviton3 processors was recently launched as well, further improving cost optimisation and sustainability.

One of the paths to AWS Graviton is to adopt AWS-managed services that already have a built-in option for aarch64. These managed services can be used with Graviton, even when the Kubernetes application itself may still require x86 worker nodes.

AWS-managed database services, for example, can be a great option to commence using Graviton indirectly when the primary Kubernetes application and its database client versions can support them.

For example, an alternative to running Redis directly on Amazon EKS worker nodes would be to use Amazon ElastiCache for Redis instead. In this way, the AWS ElastiCache team would run Redis on Graviton instances on our behalf and expose an endpoint for the primary Kubernetes application to connect to. Similarly, Amazon RDS relational database services increasingly offer Graviton-based options.

The next adoption stage would be to directly support Graviton worker nodes, which, depending on the state of the containerised application, may be as simple as replacing x86-based instances with Graviton instances. In many cases, however, the primary Kubernetes application would need to be refactored to support this.
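
With Karpenter, for example, steering workloads onto Graviton can be as small a change as constraining a Provisioner to arm64; this is a hedged sketch on the karpenter.sh/v1alpha5 API and the Provisioner name is arbitrary:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: graviton
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64"]          # restrict this Provisioner to Graviton (aarch64) instance types
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand", "spot"]
  providerRef:
    name: default
```

Pods intended for these nodes can then carry a matching nodeSelector (kubernetes.io/arch: arm64), provided their container images support arm64.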

Transitioning an application to support Graviton depends on the language and how it is compiled, and can involve many steps. But the motivation to move towards direct Graviton support can be substantial when optimising for cost.

Containerised applications are often compiled to x86_64 binaries, and the container images built with them may be architecture-specific. x86_64 images cannot run on Graviton instances, which are aarch64-based. The path to building applications that run on both Intel/AMD silicon and Graviton leads developers to adopt multi-architecture containers, which use an OCI manifest list, or image index, that references architecture-specific images such as x86_64 and aarch64. Equally, you’d need a container image registry that supports multi-architecture images, such as Amazon ECR, which makes it simpler to deploy container images for different architectures and operating systems from the same image repository. Modifying existing build pipelines to generate multi-arch container images and push them into Amazon ECR is a great way to keep image pulls simple for consumers: They can reference the manifest list name without specifying the correct architecture.

Determining the availability of aarch64-enabled third-party components and dependencies is also part of the transition journey. These software components are not directly under your control and are built and distributed by others.

The below table illustrates the Graviton price advantage on a small selection of instance types and other services.

Illustrating the Graviton price advantage on selected services

6. Storage: An area that can help move cost-optimisation on Amazon EKS forward is related to storage. Container applications often have persistent storage requirements, and container-native storage enables stateful workloads to run within containers by providing persistent volumes. Container-native storage options expose underlying storage services to containers: Akin to software-defined storage, it aggregates and pools storage resources from disparate media.

The Amazon EKS team supports various CSI driver options that integrate with AWS native storage services, such as the Amazon EBS CSI Driver, the Amazon EFS CSI Driver, and the Amazon FSx for NetApp ONTAP CSI Driver.

When using Kubernetes StorageClasses backed by Amazon EBS, opting for Amazon EBS GP3 can improve the cost-performance metric by up to 20%. Amazon EFS has various native storage classes with varying cost and performance characteristics. To improve cost, enabling Amazon EFS intelligent tiering can automate the movement of files between the tiers based on their access pattern: Here we exchange lower cost for up to 10x higher latency.
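
A minimal GP3-backed StorageClass for the Amazon EBS CSI driver could look like the sketch below; the iops and throughput values shown are the GP3 baseline and can be tuned independently of volume size:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"                              # GP3 baseline IOPS
  throughput: "125"                         # GP3 baseline throughput in MiB/s
volumeBindingMode: WaitForFirstConsumer     # provision the volume in the AZ of the scheduled Pod
allowVolumeExpansion: true
```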

Amazon EFS One Zone storage classes further improve cost by trading off cost with availability. Consider, however, that the EFS CSI driver, at this time, would only mount file systems to Pods if they are replicated into the zone from which they are accessed.

Kubernetes supports a range of Volume types, including, for example, the ephemeral volume type emptyDir, whose data exists only as long as the Pod runs on that node. This volume type can be particularly cost-effective for transient application data processed locally on the node. EC2 instances with NVMe disks can further improve performance and cost over networked storage options.
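
For instance, a Pod that processes transient data on local scratch space could use an emptyDir volume like this; the image, mount path, and size limit are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-processor
spec:
  containers:
    - name: app
      image: busybox:1.36                  # placeholder image
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 10Gi                    # data lives only as long as the Pod runs on this node
```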

Specialised cloud-native storage solutions can be implemented as Kubernetes controllers such as OnDat. We’ll deep-dive into these options as part of another blog that covers our learnings and discoveries related to cloud-native storage.

When optimising for the cost of data storage, backup schedules and backup lifecycle management also play a role. AWS Backup can help define and operate backup needs across several AWS native storage options. Creating AWS Backup Plans defines which resources are backed up, how often, and how the resulting backup files are transitioned across storage tiers until they are ultimately deleted. Configuring AWS Backup Plans for Amazon EBS volumes that are provisioned and controlled by the EBS CSI driver and used by distributed databases running on Kubernetes requires some thought.

Databases deployed on Kubernetes often provide, and insist on using, “application-layer” tooling to handle application-consistent backups. Running block-level backups through AWS Backup Plans may result in failed restore attempts.

7. Kubernetes Applications: The Kubernetes applications themselves are a key driver of AWS resource usage. Calibrating how applications request resources, understanding how workloads utilise the resources they request, and determining an approach to dynamic workload auto-scaling are component parts of building increasingly resource-efficient Kubernetes applications.

Right-sizing application resource requests minimises the overallocation of resources and reduces the allocated but unused capacity of Kubernetes clusters — also known as “slack”. When application resource request values are set too high, utilisation and cost efficiency tend to degrade; if set too low, then the stability of the application may degrade. Hence the relevance of identifying appropriate resource requests for an application component at baseline load. Where practical, benchmarking and profiling can discover application container resource requirements. Various tools can help do this, including cAdvisor. Alternatively, the recommender engine of the Vertical Pod Autoscaler (VPA) can help automate some of the work to arrive at appropriate resource request settings, as the VPA engine computes the recommended resource requests based on historical and current resource usage. Once established, the container resource allocations are configured as resource requests within Kubernetes manifests.
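
As a sketch, a VPA object can be run in recommendation-only mode against a Deployment (the names below are placeholders), so recommended requests can be inspected without Pods being evicted:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # placeholder Deployment name
  updatePolicy:
    updateMode: "Off"        # recommendation-only: compute suggested requests without applying them
```

The computed recommendations can then be read back, for example with kubectl describe vpa web-vpa, and folded into the Deployment manifests.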

Equipping Kubernetes applications with dynamic scaling features is another aspect of optimising resource allocations. The Kubernetes native Horizontal Pod Autoscaler (HPA) controller can automatically scale workload resources as load indicators vary; cluster resources are allocated only when needed. The HPA object can be configured to work as a target tracker, which scales the number of Pod replicas of a workload resource (such as Deployments and StatefulSets) to achieve an objective function. For example, the HPA can be configured to keep the average CPU utilisation across the replicas of a Deployment as close as possible to a target percentage: A new replica Pod is added to the Deployment whenever the average crosses the threshold, and a replica Pod is removed when the average falls below the target. HPA scaling policies are very flexible and can be used to set both a maximum and a minimum number of replicas.
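
A minimal target-tracking HPA along those lines could look as follows; the Deployment name, replica bounds, and the 70% target are illustrative assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # placeholder Deployment name
  minReplicas: 2                   # trades some cost efficiency for availability (see below)
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # keep average CPU utilisation across replicas near 70%
```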

Determining the minimum number of Pods is another example of where the tension between cost optimisation and availability emerges: Setting the minimum to 1 optimises for cost, and setting the minimum to a number greater than 1 begins to exchange cost efficiency with availability.

By default, HPA scales on CPU and memory indicators obtained from the Metrics Server. But application load is not always effectively captured by memory or CPU metrics. Depending on the application, custom metrics such as queue length or active connections can be a more meaningful scaling trigger: Adopting HPA together with a flexible controller such as KEDA can help achieve that. KEDA is a CNCF project that we came to value due to its integration options with Prometheus, which make it relatively easy to reference custom application metrics.
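
A hedged sketch of a KEDA ScaledObject with a Prometheus trigger; the Prometheus address, metric name, query, and threshold are assumptions for illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker                 # placeholder Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed in-cluster Prometheus endpoint
        metricName: queue_depth
        query: sum(queue_depth{app="worker"})                   # assumed application metric
        threshold: "100"         # target roughly 100 queued items per replica
```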

Tracking and quantifying the impact of cost optimisation actions brings us back to the principles of FinOps. Understanding application resource usage and cost at the various levels of Amazon EKS workloads helps establish the current cost optimisation maturity level and can highlight areas of further optimisation opportunity.

A helpful way to approach this is to envisage two lenses through which we can observe cost and quantify the efficacy of cost improvement actions: The AWS resource-level lens and the Kubernetes application-level lens.

The AWS resource lens: The AWS Cost Management services can be effective entry points for AWS resource level monitoring. The Rightsizing Recommendations feature, for example, automatically reviews historical EC2 usage to identify cost-saving opportunities at EC2 level. The Cost Anomaly Detection feature uses “cost monitors” to track and alert on unexpected cost spikes across various AWS services.

Cost trends can also be manually reviewed and analysed through customisable AWS Cost Explorer dashboards: A great option to view the impact of introducing Karpenter as the EC2 node auto-scaler or seeing the cost savings from pivoting to Graviton-based managed services options. Various predefined Reports are also provided, including Savings Plans utilisation reports and coverage reports.

These reports can be especially useful when tracking the utilisation of layered compute commitments. Starting with a conservative or small Savings Plan can be a sensible starting point. Savings Plans utilisation reports can highlight exhausted Plans and when additional Savings Plans can make sense. Multiple Savings Plans can be active simultaneously, effectively layering them over each other, which helps incrementally build commitments towards the usage baseline.

The AWS Cost Explorer and Report features can also be used to track account-level costs, which provides the aggregated effectiveness of optimisation actions across compute, storage, and other supporting AWS services.

The AWS Billing service can also help produce cost allocation reports that aggregate costs based on cost allocation tags. The AWS Cost Management options are great at establishing total costs and trends by AWS resource types or user-defined tags.

The Kubernetes application lens: What can be challenging with AWS Cost Management options is aggregating cost by Amazon EKS clusters or Kubernetes constructs. Cost monitoring at the level of clusters, Kubernetes Namespaces, Deployments, and Pods enables cost allocation to departments or teams and can provide essential insights into additional cost optimisation opportunities.

Kubernetes-level cost monitoring can help uncover abandoned or stranded Deployments and Persistent Volumes. Workloads without any incoming or outgoing traffic can be flagged for review. Translating application resource requests into AWS resource costs can identify efficiency potential, as requested but unused resources often indicate savings opportunities.

Amazon EKS add-ons can help generate this level of application-centric cost visualisation. For example, AWS has partnered with KubeCost, which is built on the CNCF project OpenCost.

KubeCost can be directly deployed into Amazon EKS clusters via the “AWS Marketplace for add-ons” and retrieves public pricing information of AWS services and resources from the AWS Price List API. Interestingly, KubeCost can also integrate with AWS Cost and Usage Reports of specific AWS accounts to include existing discount structures such as Enterprise Discount Programs, Reserved Instances, Savings Plans, and Spot.

Conclusions

Combining the use of Compute Savings Plans with Karpenter node auto-scaling features to keep Amazon EKS clusters cost-optimised over time can be a successful approach. Karpenter’s configuration options allow for utilising available discount plans such as Reserved Instances and Savings Plans and tapping into flexible compute options such as Spot when needed. Further cost optimisation can be achieved by enabling Karpenter’s Consolidation feature, which helps maximise the utilisation of provisioned worker node resources. Karpenter Provisioners can also be configured to provision Graviton instances, which drives additional price-performance advantages for applications and AWS-managed services that support aarch64. Kubernetes applications themselves can be configured to be cost-efficient and reduce the overall amount of slack in a cluster. Finally, a cost monitoring approach based on AWS native tools and Kubernetes projects such as KubeCost can be used to track the impact of cost optimisation actions over time.

