Kubernetes LoadBalancer services using the Cilium BGP control plane

Valentin Hristev
Jul 27, 2023

The Kubernetes container orchestration platform offers plugin support for Load Balancers, making it possible to create highly available services with even traffic distribution across a set of containers. One of the challenges facing those looking to experiment with Kubernetes is the lack of a “built-in” Load Balancer.

At a minimum, a Kubernetes cluster requires a Container Runtime Interface (CRI) plugin (like containerd or CRI-O) and a Container Network Interface (CNI) plugin (like Calico, Flannel, Weave or Cilium). While installing CRI and CNI tools is generally straightforward, adding support for external load balancer services to a cluster has traditionally been a complex task.

The good news is that Cilium, a popular CNI plugin, has recently added LoadBalancer IP Address Management (LB IPAM) for Kubernetes. This, combined with Cilium’s blazing fast XDP packet processing, makes Cilium a great choice for building out a fully featured Kubernetes cluster.

In this blog, we will configure Cilium to supply Load Balancer service support in Kubernetes. Our solution will allow our users to create external load balancer services for pods running in our cluster (North -> South). We’ll configure our solution on a set of Raspberry Pis running K3s, a certified Kubernetes distribution designed for unattended / resource-constrained systems.

Picture source: https://cilium.io

0. The hardware

We will use four Raspberry Pi 4s for our cluster. One of the Pis will act as the Control Plane node, and the remaining three will serve as Worker nodes. As you can see, this particular Raspberry Pi rack has a colorful disposition.

Rack: https://www.uctronics.com/cluster-and-rack-mount/uctronics-upgraded-complete-enclosure-raspberry-pi-cluster.html

1. Network topology

1.1 Network Description

Our network topology is straightforward with a single host subnet, 192.168.1.0/24, for all of the nodes.

2. Kubernetes setup

We will start with a minimal k3s installation. K3s is designed to be a complete solution and installs several networking components by default, including:

  • Flannel CNI plugin
  • Traefik Ingress Controller
  • ServiceLB (Klipper) Load Balancer
  • Kubernetes Metrics Server
  • Kube-proxy in-cluster service proxy

For our setup, we will disable all of these addons. Many of them will be replaced with more integrated (and often faster and more efficient) Cilium implementations, and others we will install ourselves.
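For reference, a server installation with these components disabled looks roughly like the sketch below (flag names should be double-checked against your K3s version; the worker nodes use the standard K3s agent install and are unaffected):

# on the control-plane node: install the K3s server without the bundled
# networking addons, so Cilium can take over those roles later
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend=none \
  --disable-network-policy \
  --disable-kube-proxy \
  --disable=traefik \
  --disable=servicelb \
  --disable=metrics-server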

2.1 Initial check

Let’s check the nodes (n.b. we use ‘k’ as an alias for ‘kubectl’)

➜ k3s-rpi (main) ✗ k get nodes
NAME STATUS ROLES AGE VERSION
k3s-worker-node-03 NotReady <none> 3d18h v1.27.3+k3s1
k3s-worker-node-02 NotReady <none> 3d18h v1.27.3+k3s1
k3s-worker-node-01 NotReady <none> 3d18h v1.27.3+k3s1
k3s-control-plane-01 NotReady control-plane,master 3d18h v1.27.3+k3s1

You can see that all nodes are in the NotReady state. This is because we have yet to install a CNI plugin, and normal pods cannot run without a pod network. To start clean, we delete every pod, even those in kube-system, and then check the pods again.
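One way to do that is simply to delete everything and let the controllers recreate it (a quick sketch; safe here because the cluster is still empty):

✗ k delete pods --all --all-namespaces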

✗ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5d78c9869d-p5svm 0/1 Pending 0 43s
local-path-storage local-path-provisioner-6bc4bddd6b-xpn8h 0/1 Pending 0 84s

In a K3s cluster we don’t see kube-apiserver, scheduler, datastore, or controller-manager Pods because K3s packages all of these components into a single binary and runs them in one process.
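You can see this for yourself on the nodes (optional; assumes the default systemd-based install): the server components all live inside a single k3s process.

# on k3s-control-plane-01: one service, one binary
sudo systemctl status k3s --no-pager | head -n 3

# on the worker nodes the same binary runs in agent mode
sudo systemctl status k3s-agent --no-pager | head -n 3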

From the output, we can see some pods are Pending. Without a CNI plugin, we don’t have a Pod network. Let’s install Cilium to get our network running!

3. Cilium installation

While there are several ways you can install Cilium, we will use the cilium CLI; Helm is another popular option. Here is the official documentation for cilium CLI installation.

The example here uses a Mac as the administration and setup machine. You can, of course, use Linux and other systems as well; for more information on setting up Cilium from other systems, refer to the official documentation.

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/master/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "arm64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}
shasum -a 256 -c cilium-darwin-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-darwin-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}

We can use the Cilium version command to verify the CLI is properly installed:

✗ cilium version
cilium-cli: v0.15.4 compiled with go1.20.4 on darwin/arm64
cilium image (default): v1.13.4
cilium image (stable): v1.13.4
cilium image (running): unknown. Unable to obtain cilium version, no cilium pods found in namespace "kube-system"

Let’s now install the cilium CNI:

✗ cilium install
🔮 Auto-detected Kubernetes kind: k3s
ℹ️ Using Cilium version 1.13.4
🔮 Auto-detected cluster name: k3s-rpi
ℹ️ kube-proxy-replacement disabled
🔮 Auto-detected datapath mode: tunnel
🔮 Auto-detected kube-proxy has not been installed
ℹ️ Cilium will fully replace all functionalities of kube-proxy

Let’s recheck the pods. As you can see, there are new players in the game with the cilium- prefix.

✗ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-6fnjv 0/1 Init:1/6 0 20s
kube-system cilium-g2s9w 0/1 Init:4/6 0 20s
kube-system cilium-hgftv 0/1 Init:1/6 0 20s
kube-system cilium-operator-768959858c-zjjnc 1/1 Running 0 20s
kube-system cilium-p82qr 0/1 Init:3/6 0 20s
kube-system coredns-5d78c9869d-p5svm 0/1 Pending 0 5m13s
local-path-storage local-path-provisioner-6bc4bddd6b-q8f2x 0/1 Pending 0 5m13s

3.1 Cilium components

Cilium adds two kinds of components to our cluster. Here is a brief explanation of each.

cilium-operator: The Cilium Operator is responsible for managing duties in the cluster which should logically be handled once for the entire cluster rather than once for each node in the cluster.

Cilium pods: The Cilium agent (cilium-agent) runs on each Linux container host. At a high level, the agent accepts configuration that describes service-level network security and visibility policies. It then listens to events in the container runtime to learn when containers are started or stopped, and it creates custom BPF programs, which the Linux kernel uses to control all network access in/out of those containers.

Note that “agent” does not appear in the pod name, but if you check the pod logs, you can see that each cilium-xxxxx pod runs the cilium-agent container.

✗ k logs -n kube-system cilium-6fnjv | head -1
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)

Give it some time to settle down and check the pods again.

✗ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-6fnjv 1/1 Running 0 6m3s
kube-system cilium-g2s9w 1/1 Running 0 6m3s
kube-system cilium-hgftv 1/1 Running 0 6m3s
kube-system cilium-operator-768959858c-zjjnc 1/1 Running 0 6m3s
kube-system cilium-p82qr 1/1 Running 0 6m3s
kube-system coredns-5d78c9869d-p5svm 1/1 Running 0 10m
local-path-storage local-path-provisioner-6bc4bddd6b-q8f2x 1/1 Running 0 10m

Everything is in the Running state; our cluster is working like a charm.
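If you want an extra sanity check at this point (optional, and not part of the original output), the cilium CLI can summarize the health of the agents and operator:

✗ cilium status --wait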

3.2 Cilium BGP

Picture source: https://cilium.io

Let’s switch gears and move on to BGP (Border Gateway Protocol).

What is BGP? BGP is an internet routing protocol that enables the exchange of routing information between autonomous systems (ASes), allowing networks to learn and advertise routes to reach different destinations over public and private networks.

For more information on BGP, take a look at RFC 4271 — BGP.

3.3 Enabling BGP

From the official Cilium BGP control plane documentation, you will see that currently a single flag in the Cilium agent turns on the BGP Control Plane feature set. There are different ways to enable this flag; however, we will continue using the cilium CLI (Helm requires a different approach, so check the official documentation if you are using Helm).
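For completeness, if you manage Cilium with Helm instead of the cilium CLI, the equivalent change is made through a chart value (a sketch; verify the value name against your chart version):

# assumes the Cilium chart repo has been added: helm repo add cilium https://helm.cilium.io/
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bgpControlPlane.enabled=true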

Before we change the BGP flag, let’s check the current configuration.

> n.b. You can enable BGP when you install Cilium, but we want to show you each of the underlying steps.

✗ cilium config view | grep -i bgp
enable-bgp-control-plane false

As you can see, the BGP Control Plane feature is disabled by default. Let’s enable it!

✗ cilium config set enable-bgp-control-plane true
✨ Patching ConfigMap cilium-config with enable-bgp-control-plane=true...
♻️ Restarted Cilium pods

Let’s check the config to verify:

✗ cilium config view | grep -i bgp
enable-bgp-control-plane true

Now it looks better. Let’s check on our pods.

✗ k get pods -n kube-system
NAME READY STATUS RESTARTS AGE
cilium-5mczq 0/1 Running 0 9s
cilium-k9p6z 0/1 Running 0 9s
cilium-operator-768959858c-zjjnc 1/1 Running 0 21m
cilium-zg5dw 0/1 Running 0 9s
cilium-zlg96 0/1 Running 0 9s
coredns-5d78c9869d-p5svm 1/1 Running 0 26m
local-path-provisioner-6bc4bddd6b-q8f2x 1/1 Running 0 10m

3.4 Restarting the Operator

The READY state for our Cilium Agents is 0/1, which means there’s a problem. Let’s read the logs to see why the Cilium Agents are no longer READY.

✗ k logs -n kube-system cilium-5mczq | tail -1
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/resource/resource.go:183: Failed to watch *v2alpha1.CiliumBGPPeeringPolicy: failed to list *v2alpha1.CiliumBGPPeeringPolicy: the server could not find the requested resource (get ciliumbgppeeringpolicies.cilium.io)" subsys=k8s

Hmmm, “Failed to watch *v2alpha1.CiliumBGPPeeringPolicy”. Setting enable-bgp-control-plane to true causes the Cilium Agents to look for the Cilium BGP Peering Policy resource, which does not yet exist.

Kubernetes is highly extensible, allowing tools to create their own resource types. Cilium uses Kubernetes CRDs (Custom Resource Definitions) to define most of its configuration objects. The Cilium Operator did not create the CiliumBGPPeeringPolicy CRD because we were not using that feature at installation time. Let's check the resource types defined in our cluster with the api-resources command:

✗ k api-resources | grep -i cilium
ciliumclusterwidenetworkpolicies ccnp cilium.io/v2 false CiliumClusterwideNetworkPolicy
ciliumendpoints cep,ciliumep cilium.io/v2 true CiliumEndpoint
ciliumexternalworkloads cew cilium.io/v2 false CiliumExternalWorkload
ciliumidentities ciliumid cilium.io/v2 false CiliumIdentity
ciliumloadbalancerippools ippools,ippool,lbippool,lbippools cilium.io/v2alpha1 false CiliumLoadBalancerIPPool
ciliumnetworkpolicies cnp,ciliumnp cilium.io/v2 true CiliumNetworkPolicy
ciliumnodeconfigs cilium.io/v2alpha1 true CiliumNodeConfig
ciliumnodes cn,ciliumn cilium.io/v2 false CiliumNode

No CiliumBGPPeeringPolicy!
Looking closely at the pods, we can see that the Cilium Agents were redeployed, but the Cilium Operator was not: in the example above, the Operator has been running for 21 minutes, while the Agents have been running for only a few seconds. Because the Operator did not restart when we updated the configuration, it has not taken any action to support BGP, so we need to refresh it.

✗ k get pods -n kube-system
NAME READY STATUS RESTARTS AGE
cilium-5mczq 0/1 Running 0 9s
cilium-k9p6z 0/1 Running 0 9s
cilium-operator-768959858c-zjjnc 1/1 Running 0 21m

Let’s redeploy our cilium-operator by deleting it; this will cause the new instance to read the updated configuration on startup.

✗ k delete pod -n kube-system cilium-operator-768959858c-zjjnc
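An equivalent approach (an alternative, not what we did here) is to restart the Deployment instead of deleting the Pod by hand:

✗ k rollout restart deployment/cilium-operator -n kube-system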

Now let’s take a look at the Operator logs:

✗ k logs -n kube-system cilium-operator-768959858c-zk7zq | grep CRD
level=info msg="Creating CRD (CustomResourceDefinition)..." name=CiliumBGPPeeringPolicy/v2alpha1 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumExternalWorkload/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumNodeConfig/v2alpha1 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumLoadBalancerIPPool/v2alpha1 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumEndpoint/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumIdentity/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumNode/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumClusterwideNetworkPolicy/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumNetworkPolicy/v2 subsys=k8s
level=info msg="CRD (CustomResourceDefinition) is installed and up-to-date" name=CiliumBGPPeeringPolicy/v2alpha1 subsys=k8s
level=info msg="Starting CRD identity garbage collector" interval=15m0s subsys=cilium-operator-generic

As you can see, the operator has created the needed CRD: “Creating CRD (CustomResourceDefinition)…” name=CiliumBGPPeeringPolicy/v2alpha1.

Redisplay the api-resources to see the new CRD:

✗ k api-resources | grep -i ciliumBGP
ciliumbgppeeringpolicies bgpp cilium.io/v2alpha1 false CiliumBGPPeeringPolicy

3.5 Cilium BGP Peering Policy

Now that we have a CiliumBGPPeeringPolicy type (CRD) we can create an object of that type to define our Cilium BGP peering policy.

Here is the YAML file we will use to create it.

✗ cat cilium-bgp-policy.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: 01-bgp-peering-policy
spec:
  nodeSelector:
    matchLabels:
      bgp-policy: a
  virtualRouters:
  - localASN: 64512
    exportPodCIDR: true
    neighbors:
    - peerAddress: '192.168.1.1/32'
      peerASN: 64512
    serviceSelector:
      matchExpressions:
      - {key: somekey, operator: NotIn, values: ['never-used-value']}

Let’s break it down and discuss the options section by section. The first part of our specification (`spec:`) is the nodeSelector, which defines which nodes our policy applies to. We use the label bgp-policy=a here, so we will have to add this label to the nodes we want the policy applied to.

  nodeSelector:
    matchLabels:
      bgp-policy: a

Next, we define our virtual routers. Virtual routers allow multiple distinct logical routers to coexist within a single routed environment, so a cluster can configure several separate BGP speakers across one network of nodes.

The Cilium cluster uses a local AS number (ASN) to identify the AS in which the BGP service resides. For our purposes, we’ll use an ASN in the well-known private ASN range (64512–65535). We set our ASN to 64512:

  virtualRouters:
  - localASN: 64512

Next, we ask Cilium to advertise the Pod CIDR to peers so that external traffic can be routed directly to our pods (you can disable this feature if it is not desired):

    exportPodCIDR: true

Next, we set the BGP peer we will communicate with. This is typically the upstream router. In our case, this is our Mikrotik router, on which we will configure BGP later. In the example, we specify the same ASN as our nodes and the IP of our router: 192.168.1.1/32.

    neighbors:
    - peerAddress: '192.168.1.1/32'
      peerASN: 64512

The final part of our policy uses the serviceSelector key to define which services we will expose. The serviceSelector allows you to configure which Kubernetes Load Balancer Services are advertised (announced) outside of the cluster (see the Cilium documentation for details). This is the relevant section from the docs:

Service announcements

By default, virtual routers will not announce services; once a .serviceSelector is set, virtual routers will advertise the ingress IPs of any LoadBalancer services that match it. If you wish to announce ALL services within the cluster, a NotIn match expression with a dummy key and value can be used.


We want all of our Load Balancer Services to be available externally, so we will create a dummy selector that matches all services. You can also use the selector setting to select specific services by label or limit the functionality to specific namespaces.

    serviceSelector:
      matchExpressions:
      - {key: somekey, operator: NotIn, values: ['never-used-value']}
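If you instead wanted to announce only specific Services, a label-based selector works too. The sketch below uses a made-up advertise=bgp label purely for illustration:

    serviceSelector:
      matchLabels:
        advertise: bgp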

Now that we have an understanding of the policy, let’s apply it to the cluster:

✗ k apply -f cilium-bgp-policy.yaml
ciliumbgppeeringpolicy.cilium.io/01-bgp-peering-policy created

4. Kubernetes node labels

We need to label the nodes that we want the BGP policy to apply to. In our case, we will label all worker nodes, leaving out the control-plane node. Our CiliumBGPPeeringPolicy node selector expects the bgp-policy=a label.

✗ k label nodes k3s-worker-node-01 bgp-policy=a
✗ k label nodes k3s-worker-node-02 bgp-policy=a
✗ k label nodes k3s-worker-node-03 bgp-policy=a

Select all nodes with the bgp-policy=a label to make sure the label is applied properly:

✗ k3s-rpi (main) ✗ k get nodes -l bgp-policy=a
NAME STATUS ROLES AGE VERSION
k3s-worker-node-03 Ready <none> 3d19h v1.27.3+k3s1
k3s-worker-node-02 Ready <none> 3d19h v1.27.3+k3s1
k3s-worker-node-01 Ready <none> 3d19h v1.27.3+k3s1

5. LB IPAM

When you create a Load Balancer Service in a Kubernetes cluster, the cluster itself does not actually assign the Service a Load Balancer IP (aka External IP); we need a plugin to do that. If you create a Load Balancer Service without a Load Balancer plugin, the External IP address will show as pending indefinitely.

The Cilium LoadBalancer IP Address Management (LB IPAM) feature can be used to provision IP addresses for our Load Balancer Services.

Here is what the official doc says about it:

LB IPAM is a feature that allows Cilium to assign IP addresses to Services of type LoadBalancer. This functionality is usually left up to a cloud provider, however, when deploying in a private cloud environment, these facilities are not always available.

It is important to understand that LB IPAM is always enabled but dormant; the controller is awakened when the first IP Pool is added to the cluster.

Let’s create our Cilium LoadBalancer IP pool. To create a pool, we give it a name and a CIDR range. We’ll use 172.198.1.0/24 as our CIDR range; it is important that this range does not overlap with other networks used by your cluster.

✗ cat cilium-ippool.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  cidrs:
  - cidr: "172.198.1.0/24"

Let’s create it.

✗ k create -f cilium-ippool.yaml
ciliumloadbalancerippool.cilium.io/lb-pool created

The LoadBalancer IP Address Management (LB IPAM) documentation provides several additional examples.

IP Pools are not allowed to have overlapping CIDRs. When an administrator creates pools that overlap, a soft error is caused. The last added pool will be marked as Conflicting, and no further allocation will happen from that pool.
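You can check the state of your pools at any time by listing them (the short name ippools we saw in api-resources works as well); the exact columns shown depend on your Cilium version:

✗ k get ciliumloadbalancerippools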

That is all you need to do to enable Load Balancer IPAM in Kubernetes.

6. Cilium BGP peers

The cilium CLI provides a number of useful commands for checking BGP status. Use the peers command to display BGP peer information:

✗ cilium bgp peers
Node Local AS Peer AS Peer Address Session State Uptime Family Received Advertised
k3s-worker-node-01 64512 64512 192.168.1.1 active ipv4/unicast 0 0
ipv6/unicast 0 0
k3s-worker-node-02 64512 64512 192.168.1.1 active ipv4/unicast 0 0
ipv6/unicast 0 0
k3s-worker-node-03 64512 64512 192.168.1.1 active ipv4/unicast 0 0

From the output, we can see that our peers are configured on the k3s side and that our Session State is “active”. This means that our nodes are configured but that they have not been able to establish a connection with a peer. This is expected because we have not configured the upstream router to peer with the nodes yet.

Let’s configure our router to complete the installation.

7. Router BGP (the peer)

Now we need to configure our north/south router to create a BGP session between it and the Kubernetes worker nodes. This solution uses a Mikrotik router; here is the official Mikrotik BGP documentation.

Let’s ssh to the router and configure it. You can do that from the GUI but we will use the CLI.

✗ ssh 192.168.1.1
MMM MMM KKK TTTTTTTTTTT KKK
MMMM MMMM KKK TTTTTTTTTTT KKK
MMM MMMM MMM III KKK KKK RRRRRR OOOOOO TTT III KKK KKK
MMM MM MMM III KKKKK RRR RRR OOO OOO TTT III KKKKK
MMM MMM III KKK KKK RRRRRR OOO OOO TTT III KKK KKK
MMM MMM III KKK KKK RRR RRR OOOOOO TTT III KKK KKK

MikroTik RouterOS 7.8 (c) 1999-2023 https://www.mikrotik.com/


Press F1 for help
[vhristev@MikroTik]
[vhristev@MikroTik] > /routing/bgp/connection/
[vhristev@MikroTik] /routing/bgp/connection> add address-families=ip as=64512 disabled=no local.role=ibgp name=PEER_TO_K3S_WN_1 output.default-originate=always remote.address=192.168.1.201 routing-table=main
[vhristev@MikroTik] /routing/bgp/connection> add address-families=ip as=64512 disabled=no local.role=ibgp name=PEER_TO_K3S_WN_2 output.default-originate=always remote.address=192.168.1.202 routing-table=main
[vhristev@MikroTik] /routing/bgp/connection> add address-families=ip as=64512 disabled=no local.role=ibgp name=PEER_TO_K3S_WN_3 output.default-originate=always remote.address=192.168.1.203 routing-table=main

In the session above we connect to the router, navigate to /routing/bgp/connection/, a Mikrotik CLI configuration menu, and from there create three BGP connections, one per worker node. Here are the details:

  • address-families=ip: We want just ipv4
  • as=64512: Our BGP ASN
  • disabled=no: Connection is active (NOT disabled)
  • local.role=ibgp: We use ibgp because the peer is in the same AS (internal BGP); for peers in a different AS we would use ebgp
  • name=PEER_TO_K3S_WN_1: Name of the connection, in our case k3s worker node 01, 02 and 03 respectively
  • output.default-originate=always: We want to advertise the default route
  • remote.address=192.168.1.201: IP address of the peer a.k.a. our k3s worker node
  • routing-table=main: Use the main routing table
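Before leaving the router, you can optionally print the connections from the same menu to confirm all three were saved (output omitted here):

[vhristev@MikroTik] /routing/bgp/connection> print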

We are all done with the router. Now let’s check the BGP peers on the Cilium side.

8. BGP Verification

8.1 Verify BGP from the Cilium side

✗ cilium bgp peers
Node Local AS Peer AS Peer Address Session State Uptime Family Received Advertised
k3s-worker-node-01 64512 64512 192.168.1.1 established 1h33m42s ipv4/unicast 1 2
ipv6/unicast 0 0
k3s-worker-node-02 64512 64512 192.168.1.1 established 1h33m37s ipv4/unicast 1 2
ipv6/unicast 0 0
k3s-worker-node-03 64512 64512 192.168.1.1 established 1h33m47s ipv4/unicast 1 2

8.2 Verify BGP from the Mikrotik side

Now we can see Cilium has sessions established, and we have Received and Advertised routes. Let’s check the router:

[vhristev@MikroTik] /routing/bgp/connection> ..
[vhristev@MikroTik] /routing/bgp> session/print
Flags: E - established
0 E name="PEER_TO_K3S_WN_2-1"
remote.address=192.168.1.202 .as=64512 .id=192.168.1.202 .capabilities=mp,rr,enhe,as4,fqdn .afi=ip,ipv6 .hold-time=1m30s .messages=1126 .bytes=21428 .eor=""
local.role=ibgp .address=192.168.1.1 .as=64512 .id=192.168.1.1 .capabilities=mp,rr,gr,as4 .messages=1127 .bytes=21440 .eor=""
output.procid=21 .default-originate=always
input.procid=21 .last-notification=ffffffffffffffffffffffffffffffff0015030603 ibgp
multihop=yes hold-time=1m30s keepalive-time=30s uptime=1h33m44s560ms last-started=jul/26/2023 15:00:56 last-stopped=jul/26/2023 14:50:58

1 E name="PEER_TO_K3S_WN_1-1"
remote.address=192.168.1.201 .as=64512 .id=192.168.1.201 .capabilities=mp,rr,enhe,as4,fqdn .afi=ip,ipv6 .hold-time=1m30s .messages=1128 .bytes=21466 .eor=""
local.role=ibgp .address=192.168.1.1 .as=64512 .id=192.168.1.1 .capabilities=mp,rr,gr,as4 .messages=1129 .bytes=21478 .eor=""
output.procid=20 .default-originate=always
input.procid=20 ibgp
multihop=yes hold-time=1m30s keepalive-time=30s uptime=1h33m49s620ms last-started=jul/26/2023 14:59:51

2 E name="PEER_TO_K3S_WN_3-1"
remote.address=192.168.1.203 .as=64512 .id=192.168.1.203 .capabilities=mp,rr,enhe,as4,fqdn .afi=ip,ipv6 .hold-time=1m30s .messages=192 .bytes=3682 .eor=""
local.role=ibgp .address=192.168.1.1 .as=64512 .id=192.168.1.1 .capabilities=mp,rr,gr,as4 .messages=193 .bytes=3694 .eor=""
output.procid=22 .default-originate=always
input.procid=22 ibgp
multihop=yes hold-time=1m30s keepalive-time=30s uptime=1h33m54s650ms last-started=jul/26/2023 14:59:21

8.3 Mikrotik GUI

We can also inspect the status from the Mikrotik GUI. The Connections tab shows the configured connections:

The Sessions tab shows the active Mikrotik BGP sessions with the Kubernetes nodes, complete with uptime.

Beautiful, we have three established sessions. Now let’s check the routes that are being advertised.

[vhristev@MikroTik] /routing/bgp/advertisements> print
0 peer=PEER_TO_K3S_WN_3-1 dst=0.0.0.0/0 local-pref=100 nexthop=78.130.232.1 origin=0

0 peer=PEER_TO_K3S_WN_2-1 dst=0.0.0.0/0 local-pref=100 nexthop=78.130.232.1 origin=0

0 peer=PEER_TO_K3S_WN_1-1 dst=0.0.0.0/0 local-pref=100 nexthop=78.130.232.1 origin=0

Finally, take a look at the Mikrotik routing table:

[vhristev@MikroTik] /routing/bgp> /routing/route/print
Flags: X, F, A - ACTIVE; c, s, b, d, a - SLAAC; H - HW-OFFLOADED
Columns: DST-ADDRESS, GATEWAY, AFI, DISTANCE, SCOPE, TARGET-SCOPE, IMMEDIATE-GW
DST-ADDRESS GATEWAY AFI DISTANCE SCOPE TARGET-SCOPE IMMEDIATE-GW
Xs 192.168.1.202/32 bridge 1 30 10
Xs 192.168.1.201/32 bridge 1 30 10
Xs 192.168.1.203/32 bridge 1 30 10
Ad 0.0.0.0/0 78.130.232.1 ip4 1 30 10 78.130.232.1%ether5
Ab 10.136.1.0/24 192.168.1.201 ip4 200 40 30 192.168.1.201%bridge
Ab 10.136.2.0/24 192.168.1.203 ip4 200 40 30 192.168.1.203%bridge
Ab 10.136.3.0/24 192.168.1.202 ip4 200 40 30 192.168.1.202%bridge
Ac 78.130.232.0/24 ether5 ip4 0 10 ether5
b 172.198.1.193/32 192.168.1.203 ip4 200 40 30 192.168.1.203%bridge
b 172.198.1.193/32 192.168.1.202 ip4 200 40 30 192.168.1.202%bridge
Ab 172.198.1.193/32 192.168.1.201 ip4 200 40 30 192.168.1.201%bridge

The last three lines show the IP 172.198.1.193/32 from our LoadBalancer external IP pool, 172.198.1.0/24.

9. Cilium service LoadBalancer

So far, so good. Now let’s create a pod with a service type LoadBalancer and test it.

We are going to make a simple nginx pod and a simple service exposing port 8080, with type LoadBalancer. This should cause Cilium to provision an external IP for our logical load balancer and then advertise the route through BGP.

✗ cat pod.yaml service.yaml
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-pod
  labels:
    app: simple-pod
spec:
  containers:
  - name: my-app-container
    image: nginx:latest
    ports:
    - containerPort: 80

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: simple-pod # Make sure this matches the label of the Pod
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
  type: LoadBalancer

Let’s create it:

✗ k apply -f pod.yaml
pod/simple-pod created
✗ k apply -f service.yaml
service/my-service created

Let’s see what we have. From the output below, we can see a running pod named simple-pod and a service named my-service. The crucial part is that the service has TYPE LoadBalancer and an EXTERNAL-IP drawn from the IP pool we created earlier: 172.198.1.246 in this example (your IP may vary).

➜ k3s-rpi (main) ✗ k get pod,svc
NAME READY STATUS RESTARTS AGE
pod/simple-pod 1/1 Running 0 41s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 3d21h
service/my-service LoadBalancer 172.20.32.184 172.198.1.246 8080:30232/TCP 27s

You can see the Load Balancer IP Pool CIDR by displaying the CiliumLoadBalancerIPPool object:

✗ kubectl get CiliumLoadBalancerIPPool lb-pool -o jsonpath='{.spec.cidrs[0].cidr}'
172.198.1.0/24%
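Similarly (a small convenience, not part of the original walkthrough), the External IP assigned to the Service can be read straight from its status:

✗ kubectl get svc my-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'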

10. Validate LoadBalancer External IP

Now, from an external machine (my laptop, which sits north of the router), we can reach the service:

✗ curl  172.198.1.246:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

11. Summary

In this post we walked through the process of creating Cilium-based support for Load Balancer Services in a minimal K3s Kubernetes cluster. Hopefully this step-by-step approach has given you a better understanding of the network-level operations involved in Kubernetes Load Balancers and a springboard from which to start your own experiments. Thanks for reading!

Originally published at https://rx-m.com on July 27, 2023.
