Using GCP Firewall with GKE LoadBalancer service

Harinderjit Singh · Published in ITNEXT · Mar 29, 2023

Purpose

Using a GKE Service of type LoadBalancer is not a complex topic in general. When the load balancer for a GKE Service is provisioned, the required firewall rules are normally created automatically, so why do we need this post?

The purpose of this post is to share an issue I encountered recently while working in a client environment. We are going to discuss a special case of using GCP firewall rules with a GKE Service of type LoadBalancer.

Setup and Scenario

  • There is a scheduler server in the On-Prem network with a private IP.
  • There is a zonal GKE cluster with 6 nodes, and a namespace in the cluster hosts the scheduler agent.
  • The namespace has a Kubernetes Deployment of the agent with replicas set to 1 and a Service of type LoadBalancer with a private IP address (a minimal sketch of such a manifest follows this list).
  • The on-prem network and the GCP Shared VPC are connected with a Cloud Interconnect connection.
  • There is a firewall rule in the VPC that allows ingress from the on-prem server on port 8080 to the target nodes carrying the network tag “allow-ingress”.
  • The default firewall rules for the LoadBalancer, which would allow ingress on port 8080 to all nodes in the VPC, do not exist due to tight security guidelines.
  • The scheduler server needs to connect to the agent deployed in GKE through the LoadBalancer’s private IP.
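For reference, here is a minimal sketch of what the agent’s Service manifest could look like in this setup. The name, namespace, labels, and targetPort are illustrative assumptions; only the internal load balancer annotation, the service port 8080, and the default externalTrafficPolicy of “Cluster” reflect the scenario described above.

```yaml
# Illustrative manifest; names, labels, and targetPort are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: scheduler-agent          # hypothetical name
  namespace: scheduler           # hypothetical namespace
  annotations:
    # Request an internal passthrough load balancer with a private IP.
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # the default; revisited later in this post
  selector:
    app: scheduler-agent           # hypothetical label on the agent Pods
  ports:
    - name: http
      port: 8080        # the port the on-prem scheduler connects to
      targetPort: 8080  # hypothetical container port of the agent
      protocol: TCP
```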

Issue

The agent is reachable from the scheduler server most of the time, but once in a while it is not. The issue is intermittent and cannot be reproduced at will.

The connection works only sporadically from any machine outside the GKE cluster

Troubleshooting

  • Unfortunately, the TCP passthrough load balancer’s logs were of no help.
  • Firewall logs were not available for troubleshooting because it is a Shared VPC and we did not have access to them.
  • No issue was observed with the Interconnect.
  • The agent was always running and its logs didn't record any issues.
  • We spun up a curl Pod and tested calls to the agent Service via its ClusterIP to confirm the agent was listening at all times (a sketch of such a test Pod follows this list). During the same window, the issue was still reproducible from the on-prem server, so it is definitely not about agent Pod availability.
  • We created another VM in the Shared VPC, along with a firewall rule allowing ingress from this VM to the target nodes with the network tag “allow-ingress” on port 8080, and the issue was reproducible from that VM as well. So it is not about Cloud Interconnect either.
  • Now my focus was on how the TCP passthrough load balancer was distributing traffic to the Pods running the agent application.
  • We see that all endpoints are “healthy”
LB shows all nodes as healthy
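As a rough sketch of the in-cluster test mentioned in the list above (the image and names are illustrative choices, not necessarily what was actually deployed), a throwaway curl Pod can be as simple as:

```yaml
# Illustrative test Pod; image and names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: curl-test
  namespace: scheduler     # same hypothetical namespace as the agent
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:latest
      # Keep the container alive so we can exec into it and run curl by hand.
      command: ["sleep", "3600"]
```

From inside this Pod, curl against the agent Service’s ClusterIP on port 8080 kept succeeding during the same windows in which the on-prem server could not connect.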

GKE Service LoadBalancing

With externalTrafficPolicy set to “Cluster”, all nodes of the cluster pass the health check even if a node has no serving Pods. If one or more serving Pods exist on a node, that node passes the load balancer’s health check even if the serving Pods are terminating or are failing readiness probes.

  • With externalTrafficPolicy set to “Cluster”, the load balancer sends traffic to all the nodes in the instance group, whether or not a serving Pod exists on a given node; the node’s iptables or eBPF configuration (set by kube-proxy) then takes over and redirects the packets to serving Pods (possibly on other nodes). This is expected behavior.
  • With externalTrafficPolicy set to “Local”, the load balancer sends requests only to the nodes of the instance group that have serving Pods in a ready state.

Only the nodes with at least one ready, non-terminating serving Pod pass the load balancer’s health check. Nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail the load balancer’s health check.

During state transitions, a node still passes the load balancer’s health check until the load balancer’s unhealthy threshold is reached. The transition state occurs when all serving Pods on a node begin to fail readiness probes or when all serving Pods on a node are terminating. How the packet is processed in this situation depends on the GKE version.

  • With “Local”, the node receiving the request routes the packet to a serving Pod running on that same node, and that node sends the response packets to the original client using Direct Server Return. Avoiding the extra node-to-node hop is the primary intent of this traffic policy.
  • I tested externalTrafficPolicy set to “Local” for the LoadBalancer Service (the spec change is shown after this list).
only 2 nodes are healthy
  • Notice that only two nodes out of six are healthy: two of the three Pods are running on the same node, so we have only two healthy nodes even though there are three serving Pods.
  • Now the LoadBalancer’s private IP is reachable from the on-prem server and from the other VM we created for the test.
  • No more connection failures at all.
The agent application is reachable
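The change made for this test is a single field in the Service spec; everything else in the manifest sketched earlier stays the same.

```yaml
# Only the traffic policy changes; the rest of the Service is unchanged.
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # only nodes with ready serving Pods pass the health check
```

The same effect can be achieved in place, for example with kubectl patch using the hypothetical names from the earlier sketch: kubectl patch service scheduler-agent -n scheduler -p '{"spec":{"externalTrafficPolicy":"Local"}}'.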

The Real Reason

When I reverted to externalTrafficPolicy set to “Cluster”, the issue resurfaced.

externalTrafficPolicy equals “Cluster”

  • externalTrafficPolicy set to “Cluster” should not itself be the problem, because the load balancer and the cluster nodes work together to route packets received for LoadBalancer Services. If the node that receives the packets from the load balancer lacks a ready and serving Pod, it routes the packets to a different node that does contain one. Response packets from the Pod are routed from its node back to the node that received the request packets from the load balancer, and that first node then sends the response packets to the original client using Direct Server Return.

Since it fails intermittently, it could be that a few nodes are unable to route the incoming packets to the other nodes that have the serving Pod(s). But how could that happen, given that this is not the intended behavior?

What if the first node, the one that is supposed to route the packets to the serving Pod on another node, is not covered by a firewall “allow” rule?

The load balancer still marks that node as healthy; its health checks do not take the VPC firewall rules for client traffic into account. The firewall, however, does not let packets reach nodes that lack the network tag allowing ingress from the source VMs/machines. That is the main thing that confused me while troubleshooting this issue.

In this environment, we had 3 nodes in one node pool (A), where the agent application Pods are deployed, and 3 nodes in another node pool (B) dedicated to another application. Node pool B is tainted so that only the other application’s Pods, which tolerate the taint, are scheduled there. The “allow-ingress” network tag was applied only to node pool A, used by the agent application, and not to node pool B.
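To make the firewall side concrete, the ingress rule in this environment was scoped by target tags, roughly like the following (shown in the Compute API’s YAML form, similar to what gcloud compute firewall-rules describe <rule> --format=yaml prints; the rule name, network path, and source range are illustrative).

```yaml
# Illustrative firewall rule; name, network, and sourceRanges are hypothetical.
name: allow-onprem-to-agent
network: projects/host-project/global/networks/shared-vpc
direction: INGRESS
allowed:
  - IPProtocol: tcp
    ports:
      - "8080"
sourceRanges:
  - 10.0.0.0/24        # on-prem scheduler subnet (illustrative)
targetTags:
  - allow-ingress      # present only on node pool A's nodes
```

Packets from the scheduler that arrive at a node pool B node never match this rule, because those nodes lack the allow-ingress tag, so the packets are dropped before kube-proxy ever gets a chance to forward them.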

But the load balancer does not operate in terms of node pools; it forwards traffic to instance groups on the service port, and those instance groups contain all the GKE cluster nodes in the zone. So the load balancer will route traffic to the nodes in node pool B as well, where the firewall does not allow the ingress. Hence the problem.

NOTE: The GKE LoadBalancer Service does not forward traffic on NodePorts but on the port defined in the Service. The way packet processing happens is described in brief in the GCP documentation.
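In other words, on this passthrough data path the destination port that reaches the node is the Service port, so that is the port the firewall rule must allow. A sketch of the relevant ports section, with an assumed auto-allocated NodePort shown only for contrast:

```yaml
# Illustrative ports section of the Service.
ports:
  - name: http
    port: 8080         # the port clients use and the firewall rule must allow
    targetPort: 8080   # container port on the agent Pod (assumed)
    nodePort: 31234    # auto-allocated; not used on this passthrough path
```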

externalTrafficPolicy equals “Local”

  • How does setting externalTrafficPolicy to “Local” help? Only the nodes with at least one ready, non-terminating serving Pod pass the load balancer’s health check; nodes without a serving Pod, nodes whose serving Pods all fail readiness probes, and nodes whose serving Pods are all terminating fail it, so traffic is not routed to those nodes.
  • The nodes where the agent Pods run already had the network tag “allow-ingress”, and that is why it worked.
  • This works, but it leads to another issue: the load balancer no longer distributes load evenly across Pods. A node with two or more Pods of the application receives roughly the same number of requests as a node with only one serving Pod.

In a Nutshell

  • When using custom firewall rules with GKE, if you wish to use externalTrafficPolicy set to “Cluster” in LoadBalancer Service definitions for better load distribution, the relevant ingress rule network tag(s) must be applied to all node pools.
  • When using custom firewall rules with GKE, if you wish to apply an ingress rule network tag(s) only to the node pools where the application is deployed, make sure to use externalTrafficPolicy set to “Local” in the LoadBalancer Service associated with the application.
  • The GKE TCP passthrough load balancer doesn't use the NodePorts to route traffic to nodes but rather the “port” specified in the service manifest.

If you want to learn about Kubernetes services in-depth, consider going through my post on Services.

Please read my other articles as well and share your feedback. If you like the content, please like, comment, and subscribe for new articles.
