Considerations for Hardening your GKE, a workload perspective

Pavan kumar Bijjala
6 min read · Oct 14, 2022

We have specific recommendations to harden your cluster from a network and security perspective. Several of these (disabling basic authentication, disabling client certificates, disabling the Kubernetes dashboard, etc.) are fundamental, platform-level controls that are now defaults for new clusters on managed offerings such as GKE, so I am not covering them here. More details on these can be found in Google's GKE cluster hardening guide, referenced later in this article.

Also, clusters created in Autopilot mode implement many of these GKE hardening features by default.

The following is a list of security controls and considerations to implement (as defaults) for GKE clusters, based on those recommendations and viewed from a workload-control point of view.

  • Use of network policies to limit communication between namespaces in a cluster. Note: the actual configuration will depend on which other namespaces need to be accessed. Default-deny configuration examples are provided below.
  • Use of custom service accounts for GKE nodes, with the minimum set of privileges the nodes require to access other GCP services.
  • Use of Workload Identity. Workload Identity allows more granular access controls: each workload uses a dedicated Google service account, instead of the node's service account, to access Google APIs.
  • Use of Pod Security admission (and admission controllers generally) to intercept requests to the kube-apiserver after authentication and authorization. A default example policy is provided below.
  • Use of GKE Sandbox for an additional layer of security.

Network Policies

It is recommended to implement network policies to control how pods can communicate with each other within a cluster.

For example, if an attacker gained access to your front end, network policies could help prevent them from moving laterally within your cluster and gaining access to other layers. Network policies are also effective for managing access in multi-tenancy scenarios, so that pods in one namespace cannot automatically communicate with pods in another namespace.

Network policies have several options for defining ingress and egress rules for communication between pods. For example, rules can be applied by IP range, namespace(s), and/or pod label.

To start, have a default deny-all policy and only allow traffic between pods/services as the workload requires.

The default behavior, when no network policy is in place, is to allow all ingress and egress traffic to and from pods in a namespace. For security reasons, you may want a default deny in place so that pods are protected even when no more specific network policy has been applied. This can be accomplished by putting the policies below in place for a namespace.

  • Default deny all ingress and egress traffic.
  • The second policy below extends this when pods are expected to accept ingress via the HTTP(S) load balancer: ingress is allowed only from the load balancer health-checking ranges. This example assumes container-native load balancing; see the sketch after this list.
  • When Network Endpoint Groups (NEGs) are not being used, e.g. for LoadBalancer-type Services, pods will need to allow ingress from other nodes in the network, and therefore the subnet CIDR will also need to be explicitly allowed.
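The sketch below shows the first two policies; the namespace (my-namespace), the policy names, and the app: web pod label are placeholders to adapt per workload. The 130.211.0.0/22 and 35.191.0.0/16 ranges are Google Cloud's documented load balancer health check (and proxy) source ranges.

# Policy 1: default deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}          # empty selector targets all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Policy 2: allow ingress to load-balanced pods from the HTTP(S) load balancer
# health check / proxy ranges (assumes container-native load balancing).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lb-health-checks
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: web             # placeholder label for pods behind the load balancer
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 130.211.0.0/22
    - ipBlock:
        cidr: 35.191.0.0/16
    # If NEGs are not in use, also allow the cluster's node/subnet CIDR here.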

Using Network Policies to Only Allow Ingress Traffic from within the Namespace.

By using an empty podSelector, the policy selects all pods in the namespace as target pods, and the ingress allowlist rule allows traffic only from pods in that same namespace.
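A minimal sketch of such a network policy file; the namespace name secondary is just an example:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: secondary
spec:
  podSelector: {}          # selects all pods in the namespace as target pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}      # allows ingress only from pods in the same namespace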

In multi-tenant clusters, it is recommended to use network policies to limit communication between namespaces.

More details on setting up network policies for your workloads can be found here: https://cloud.google.com/kubernetes-engine/docs/how-to/network-policy

Node Service Accounts

Use a custom service account with least privilege for GKE nodes. It is recommended to create a new custom service account with the minimum set of privileges that the workloads on your cluster require in order to access other GCP services. Note: this is still recommended even when using Workload Identity.

Restrict the permissions of a node VM by using service account permissions, not access scopes (see Migrating from legacy access scopes).

By default, GKE nodes use the Compute Engine default service account, which has broad access to other services across your project.

The minimum set of IAM roles required for nodes of a GKE cluster is as follows. This is the recommended set if all other GCP access happens via Workload Identity service accounts (a sketch for spinning up a node pool with a custom service account follows the list):

  • monitoring.viewer
  • monitoring.metricWriter
  • logging.logWriter
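A sketch of wiring this up with gcloud; the service account name (gke-node-sa), PROJECT_ID, CLUSTER_NAME, and node pool name are placeholders:

# Create a dedicated, least-privilege service account for the nodes.
gcloud iam service-accounts create gke-node-sa \
  --display-name="GKE node service account"

# Grant only the minimum roles listed above.
for role in roles/logging.logWriter roles/monitoring.metricWriter roles/monitoring.viewer; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:gke-node-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="${role}"
done

# Create a node pool whose nodes run as that service account.
gcloud container node-pools create hardened-pool \
  --cluster=CLUSTER_NAME \
  --service-account=gke-node-sa@PROJECT_ID.iam.gserviceaccount.com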

Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#use_least_privilege_sa

Workload identity

Workload identity is the recommended method for workloads on GKE to access Google APIs and services.

Workload Identity allows more granular access controls: a dedicated Google service account, instead of the node's service account, is used by each workload to access Google APIs. When multiple workloads run on a single node, this enables permissions to be set at the granularity of an individual workload identity.

This is implemented by binding a distinct Kubernetes Service Account (KSA), per namespace and cluster, to a Google Service Account that has the required permissions for the workload; a sketch of this binding follows.
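A sketch of that binding, assuming Workload Identity is already enabled on the cluster; the KSA (app-ksa), namespace (app-namespace), Google service account (app-gsa), and PROJECT_ID are placeholders:

# Create the Kubernetes service account the workload's Pods will run as.
kubectl create serviceaccount app-ksa --namespace app-namespace

# Allow that KSA to impersonate the Google service account.
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[app-namespace/app-ksa]"

# Annotate the KSA so GKE knows which Google service account it maps to.
kubectl annotate serviceaccount app-ksa --namespace app-namespace \
  iam.gke.io/gcp-service-account=app-gsa@PROJECT_ID.iam.gserviceaccount.com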

  • Workload Identity eliminates the need to implement metadata concealment (GKE metadata concealment protects some potentially sensitive VM metadata from user workloads running on your nodes), since Pods can no longer access sensitive VM metadata on the cluster's nodes, such as kubelet credentials.
  • With Workload Identity enabled, it is still recommended to configure a custom Google service account, as described in the section above, as the node's identity. This mitigates the risk of lateral movement if the node is compromised.
  • Workload Identity can’t be used by Pods running on the host network. Requests made from these pods to metadata endpoints are routed to the Compute Engine metadata server.
  • GKE creates a fixed workload identity pool for each Google Cloud project.

Details on setting up Workload identity for your cluster workloads can be found here: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

PodSecurity admissions

The Kubernetes project deprecated PodSecurityPolicy and removed the feature entirely in Kubernetes v1.25; see Migrating from PodSecurityPolicy to the built-in PodSecurity admission controller.

Kubernetes offers a built-in Pod Security admission controller to enforce the Pod Security Standards. Pod security restrictions are applied in specific modes to specific namespaces by labeling the namespace:

# MODE must be one of `enforce`, `audit`, or `warn`.
# LEVEL must be one of `privileged`, `baseline`, or `restricted`.
kubectl label --overwrite ns NAMESPACE pod-security.kubernetes.io/MODE=LEVEL
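For example, to enforce the baseline standard while only warning about violations of the stricter restricted standard on a hypothetical namespace:

kubectl label --overwrite ns app-namespace \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted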

Policy violations in the audit and enforce modes are recorded in the audit logs for your cluster.

PodSecurity should constrain a Pod’s capabilities to only those required for that workload, for example (a sketch follows this list):

  • Containers must be required to run as non-root users.
  • Web front-end Pods must have a read-only root filesystem (readOnlyRootFilesystem).
  • Refer to the Pod Security Standards for more.
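A sketch of a Pod spec that satisfies the first two constraints; the Pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
spec:
  containers:
  - name: web
    image: example.com/web-frontend:1.0   # placeholder image
    securityContext:
      runAsNonRoot: true                  # must run as a non-root user
      readOnlyRootFilesystem: true        # root filesystem mounted read-only
      allowPrivilegeEscalation: false     # commonly required by the restricted standard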

Gatekeeper

(Google-recommended option) In addition to using the built-in Kubernetes PodSecurity admission controller to apply the Pod Security Standards, you can also use Gatekeeper, an admission controller based on the Open Policy Agent (OPA), to create and apply custom Pod-level security controls.
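As an illustration, a constraint that blocks privileged containers might look like the sketch below, assuming the privileged-containers ConstraintTemplate (kind K8sPSPPrivilegedContainer) from the open source gatekeeper-library has been installed:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: disallow-privileged-containers
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]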

Workloads should not be granted the permission to modify themselves (self-modify) in the first place. When self-modification is necessary, you can limit permissions by applying Gatekeeper or Policy Controller constraints, such as NoUpdateServiceAccount from the open source Gatekeeper library, which provides several useful security policies.

Anthos Config Management offers Policy Controller, which is a policy engine built on the Gatekeeper open source project. For GKE users, there is an additional charge to use Policy Controller and Config Controller.

Use of GKE Sandbox

GKE Sandbox provides an extra layer of security to prevent malicious code from affecting the host kernel on your cluster nodes. GKE Sandbox protects the host kernel on your nodes when containers in the Pod execute unknown or untrusted code.

GKE Sandbox uses gVisor, an open source project. gVisor is a user space re-implementation of the Linux kernel API that does not need elevated privileges. When you enable GKE Sandbox on a node pool, a sandbox is created for each Pod running on a node in that node pool.
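A sketch of enabling it; the node pool name, CLUSTER_NAME, Pod name, and image are placeholders:

# Create a node pool with GKE Sandbox (gVisor) enabled.
gcloud container node-pools create sandbox-pool \
  --cluster=CLUSTER_NAME \
  --sandbox type=gvisor

# Schedule a workload into the sandbox by requesting the gvisor RuntimeClass.
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: example.com/untrusted:1.0   # placeholder image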

You should consider sandboxing a workload in situations such as:

  • The workload runs untrusted code (for example, SaaS providers running multi-tenant or user-supplied code).
  • You want to limit the impact if an attacker compromises a container in the workload.

Further Details

Many of the recommendations, as well as other common misconfigurations, can be automatically checked using Security Health Analytics.

Production grade GKE, another of my write-ups, details three key practices for hardening GKE deployments from the platform’s perspective.
