Setting up your first EKS cluster on AWS: some practical tips

Benjamin Christmann
5 min read · Dec 21, 2022

Why this article?

You may find tons of information online on how to work with EKS, but it’s easy to get lost in the details and lose the big picture.

Also, Kubernetes may seem simple at first sight, but designing and building a secure cluster comes with many challenges, especially on AWS, where the default configuration is rarely the ideal one.

At Miuros, now part of Dixa, we spent quite some time studying how to best migrate from ECS to EKS, prioritizing security and reliability. In this article, we share some of our learnings.

Designing your network

First of all, you should list your requirements and start planning your subnets accordingly.

As an example, here are our requirements:

  • High availability → across 3 zones
  • Private cluster (cannot be accessed externally), which implies:
    — nodes in a private subnet
    — pods in a private subnet
  • Accessibility:
    — Internal load balancer (LB) for “admin” endpoints
    — External load balancer (LB) for “customer-facing” endpoints

So assuming 3 availability zones (AZs), the requirements can be fulfilled with:

  • 3 small private subnets (/26 or smaller) for the internal LB, tagged “kubernetes.io/role/internal-elb” = 1
  • 3 small public subnets (/26 or smaller) for the external LB, tagged “kubernetes.io/role/elb” = 1
  • 3 private subnets for nodes: estimate how many nodes you may need in the coming months so that the subnets have enough IPs (between /26 and /24 for small to medium clusters)
  • 3 private subnets for pods.

This last piece is the hardest part because you often end up needing more IPs than you planned, and AWS reserves many addresses behind the scenes (based on the ENI limits and the private IPv4 addresses per interface of each instance type).

If you run out of addresses, you will have to rework your subnets and the CNI custom networking, which can be tricky. When possible, allocate large IP ranges (between /22 and /20 for small to medium clusters).
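To make the sizing concrete, here is a minimal sketch that carves one pod subnet per AZ out of a larger range and counts the usable addresses. The CIDR block and zone names are made up for illustration. The only AWS-specific fact it relies on is that AWS reserves 5 addresses in every subnet (the first four and the last one).

```python
import ipaddress

# AWS reserves the first four addresses and the last one in every subnet.
AWS_RESERVED_IPS_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses left for nodes or pods in an AWS subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_IPS_PER_SUBNET


def split_per_az(cidr: str, new_prefix: int, zones: list[str]) -> dict[str, str]:
    """Carve one subnet of size /new_prefix per zone out of the given range."""
    subnets = ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix)
    return {zone: str(next(subnets)) for zone in zones}


# Hypothetical values: adapt the range and zone names to your own VPC.
zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
pod_subnets = split_per_az("10.0.0.0/18", 20, zones)

for zone, cidr in pod_subnets.items():
    print(zone, cidr, usable_ips(cidr))
```

With these example values, each /20 pod subnet leaves 4091 usable addresses, while a /26 load balancer subnet leaves only 59.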

See more on this topic:
Optimizing EKS networking for scale

AWS EKS subnet split

Defining your security

In order to secure your flows, you need to list them and define how you want to manage them.

At least 3 flow types must be defined: external world to pods, pod to the external world, and pod to pod.

External world to pods

This can be addressed by:

  • Ingress controller(s): expose multiple services outside the Kubernetes cluster (usually at Layer 7)
  • Load Balancer: exposes a single service outside the Kubernetes cluster
  • Direct access to pods (avoid this: binding to a pod IP directly is not safe)

External pod access

Here the easiest choice is to define several ingress controllers (at least two, but more can be defined). An AWS NLB lets us expose services easily and can even restrict access to given source ranges to increase security.

You can consider different Ingress Classes to address the different use cases. The goal is to “shadow” part of your network security in your ingress classes:

  • External: public traffic
  • Internal: your internal users
  • Admin: Tech Admin purposes
  • Services: services hosted somewhere else

Then you can apply specific LB rules to each ingress class instead of maintaining complex ingress rules and definitions.

Pod to the external world

This is supported by default. The tricky part is the security group that will be attached to the pod:

  • node security group: all traffic leaves with the node’s security group (not ideal: a compromised pod may get access to a sensitive resource)
  • pod security groups: each pod may have a different security group
access from the pods
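
As an illustration of the second option, pod security groups are declared with a SecurityGroupPolicy resource from the VPC resource controller. The sketch below builds one as a plain Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The names, labels and security group ID are hypothetical placeholders.

```python
import json


def security_group_policy(name, namespace, pod_labels, security_group_ids):
    """Build a SecurityGroupPolicy that attaches security groups to matching pods."""
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods carrying these labels get the security groups listed below.
            "podSelector": {"matchLabels": pod_labels},
            "securityGroups": {"groupIds": security_group_ids},
        },
    }


# Hypothetical example: only the billing API pods may reach the database.
policy = security_group_policy(
    name="billing-db-access",
    namespace="billing",
    pod_labels={"app": "billing-api"},
    security_group_ids=["sg-0123456789abcdef0"],
)
print(json.dumps(policy, indent=2))
```

The output can be piped to `kubectl apply -f -`. This assumes ENABLE_POD_ENI is turned on in the CNI, as described in the CNI section of this article.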

Pod to Pod

Multiple choices exist to secure this connection:

Security Groups:

  • Main advantage: you use the same configuration method for every pod communication
  • Main drawback: it is difficult to manage as Kubernetes resources in a GitOps flow (we do not use Crossplane, https://crossplane.io/)

Network Policy:

  • Main advantage: fine-grained control over traffic, with an active community and many examples
  • Main drawback: it may be difficult to micromanage every rule
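
To give an idea of what managing these rules looks like, here is a minimal sketch of a common pattern: deny all ingress in a namespace, then allow one client explicitly. It assumes a network policy engine is installed in the cluster. The namespace, labels and port are hypothetical.

```python
import json


def default_deny_ingress(namespace):
    """Deny all incoming traffic to every pod of the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_ingress(name, namespace, target_labels, client_labels, port):
    """Allow TCP traffic on one port from client pods to target pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical example: only the frontend may call the API on port 8080.
policies = [
    default_deny_ingress("shop"),
    allow_ingress("frontend-to-api", "shop", {"app": "api"}, {"app": "frontend"}, 8080),
]
for policy in policies:
    print(json.dumps(policy, indent=2))
```

Every additional flow needs its own allow rule, which is where the micromanagement cost comes from.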

Service Mesh (like Linkerd Authorization):

  • Main advantage: ease of configuration (easier to define and manage)
  • Main drawback: technically, fewer security options (no egress rules, for example)

In our case, we chose the service mesh option with Linkerd. A service mesh has many benefits for securing communication, and Linkerd’s Authorization feature is very efficient and quite easy to use (easier than network policies, in our experience).

Configuring AWS CNI

The AWS VPC CNI is the recommended plugin to ensure compatibility with other AWS networking and security features. Alternatives are listed on this page: Alternate compatible CNI plugins

Unfortunately, the out-of-the-box CNI configuration is not optimal.

Here are some recommended settings:

Mandatory:

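  • ENABLE_POD_ENI: true → allows you to define security groups for pods
  • ENABLE_PREFIX_DELEGATION: true → allows you to use more IPs on your instances
  • DISABLE_TCP_EARLY_DEMUX: “true” → lets the kubelet reach pods that have their own security group, which liveness and readiness probes need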
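
To see what prefix delegation changes, here is a minimal sketch of the usual pod capacity arithmetic for a node. Without prefix delegation, each secondary IP on an ENI hosts one pod. With it, each slot holds a /28 prefix, that is 16 addresses. The figures used for m5.large (3 ENIs, 10 IPv4 addresses per ENI) come from the EC2 instance limits.

```python
def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False) -> int:
    """Upper bound of pods per node given the ENI limits of the instance type."""
    # The first IP of each ENI is the ENI's own address, hence the "- 1".
    slots_per_eni = ips_per_eni - 1
    # With prefix delegation each slot carries a /28 prefix (16 addresses).
    addresses_per_slot = 16 if prefix_delegation else 1
    # "+ 2" accounts for host-network pods such as aws-node and kube-proxy.
    return enis * slots_per_eni * addresses_per_slot + 2


print(max_pods(3, 10))                          # m5.large, secondary IP mode
print(max_pods(3, 10, prefix_delegation=True))  # m5.large, prefix delegation
```

For m5.large this gives 29 pods in secondary IP mode and 434 with prefix delegation. The second number is a theoretical ceiling: AWS's max-pods calculator caps the recommended value (110 on instances with fewer than 30 vCPUs), so CPU and memory become the real limit long before the IP slots do.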

If, like us, you do not use security groups for all of your pod-to-pod communications, I suggest these two settings:

  • POD_SECURITY_GROUP_ENFORCING_MODE: “standard”
  • AWS_VPC_K8S_CNI_EXTERNALSNAT: true → required, otherwise your pods will reach other VPCs with the node IP as their source address
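
One possible way to apply the settings listed above is a patch on the aws-node DaemonSet. The sketch below only generates the patch document. It assumes the container names of the upstream manifest (aws-node for the main container, aws-vpc-cni-init for the init container, which is where DISABLE_TCP_EARLY_DEMUX is read), so check them against your own cluster first.

```python
import json

# Settings read by the main CNI container.
MAIN_CONTAINER_ENV = {
    "ENABLE_POD_ENI": "true",
    "ENABLE_PREFIX_DELEGATION": "true",
    "POD_SECURITY_GROUP_ENFORCING_MODE": "standard",
    "AWS_VPC_K8S_CNI_EXTERNALSNAT": "true",
}
# Setting read by the init container.
INIT_CONTAINER_ENV = {"DISABLE_TCP_EARLY_DEMUX": "true"}


def env_list(settings):
    """Convert a dict into the Kubernetes list-of-env-vars format."""
    return [{"name": name, "value": value} for name, value in settings.items()]


def cni_patch():
    """Strategic merge patch: containers are matched by name, env vars are merged."""
    return {
        "spec": {
            "template": {
                "spec": {
                    # Container names are assumptions taken from the upstream manifest.
                    "containers": [
                        {"name": "aws-node", "env": env_list(MAIN_CONTAINER_ENV)}
                    ],
                    "initContainers": [
                        {"name": "aws-vpc-cni-init", "env": env_list(INIT_CONTAINER_ENV)}
                    ],
                }
            }
        }
    }


print(json.dumps(cni_patch()))
```

Saved as a hypothetical cni_patch.py, the output could be applied with `kubectl patch daemonset aws-node -n kube-system -p "$(python cni_patch.py)"`. If the CNI is installed as a managed EKS add-on or through Helm, setting the same variables through that tool is probably the safer route, since a manual patch may be overwritten.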

Here you can find a complete reference: https://github.com/aws/amazon-vpc-cni-k8s

AWS Load Balancer

The load balancer integration that ships by default is not the best option. AWS recommends the newer AWS Load Balancer Controller, maintained under Kubernetes SIGs, and many of the references and capabilities you will find online apply only to it, not to the default one.

You need to manually install and configure this one: https://kubernetes-sigs.github.io/aws-load-balancer-controller/

We prefer to use a combination of the Traefik ingress controller and an AWS NLB, but other setups are possible.

In this setup, some important annotations on the Traefik service are:

  • service.beta.kubernetes.io/aws-load-balancer-type: external
    Hands the service over to the AWS Load Balancer Controller
  • service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    Targets pod IPs directly rather than instances
  • service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: “*”
    Enables proxy protocol v2, which lets Traefik know the real client IP address
  • service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: “true”
    Allows your services to communicate with external endpoints without issues
  • service.beta.kubernetes.io/aws-load-balancer-subnets: xxxxx (your subnet IDs)
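
Put together, the annotations above could end up on a Service like the one sketched below, built as a Python dictionary and printed as JSON for kubectl. The namespace, selector, ports and subnet IDs are hypothetical placeholders, not the values of our own setup.

```python
import json

PREFIX = "service.beta.kubernetes.io/"


def nlb_annotations(subnet_ids):
    """Annotations from the list above. Annotation values must be strings."""
    return {
        PREFIX + "aws-load-balancer-type": "external",
        PREFIX + "aws-load-balancer-nlb-target-type": "ip",
        PREFIX + "aws-load-balancer-proxy-protocol": "*",
        PREFIX + "aws-load-balancer-cross-zone-load-balancing-enabled": "true",
        # Several subnets are passed as one comma-separated string.
        PREFIX + "aws-load-balancer-subnets": ",".join(subnet_ids),
    }


def traefik_service(subnet_ids):
    """LoadBalancer Service in front of Traefik (names and ports are assumptions)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": "traefik",
            "namespace": "traefik",
            "annotations": nlb_annotations(subnet_ids),
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app.kubernetes.io/name": "traefik"},
            "ports": [
                {"name": "websecure", "protocol": "TCP", "port": 443, "targetPort": 8443}
            ],
        },
    }


# Hypothetical subnet IDs, one per availability zone.
service = traefik_service(["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"])
print(json.dumps(service, indent=2))
```

In practice these annotations are more likely to be set through the values of the Traefik Helm chart than on a hand-written Service, but the resulting object looks the same.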

Wrapping things up

The topic is broad; take your time to study and test it.

We humbly tried to pinpoint some important facts, but many alternatives can be relevant, depending on your overall stack and requirements.

Here is a resource I found particularly useful: https://aws.github.io/aws-eks-best-practices/networking/index/

Interested in working on similar technical challenges and helping our Engineering team scale our platform? Check out our open positions.
