Unleashing the Power of k3s for Edge Computing: Deploying 3000+ in-store Kubernetes Clusters — Part 1

Ryan Gough
JYSK Tech
Sep 13, 2023


· The Design
· The Cattle Approach: Simplifying k3s Installation and Recovery
· The Installation
· 1. Setup our environment
· 2. Download Cilium, FluxCD
· 3. Custom Configuration
· 4. K3s Installation
· 5. Cilium Installation
· 6. FluxCD Installation
· Putting it all together
· Uninstalling our environment
· What’s next?

Design and Installation Proof of Concept

The initial instalment of our three-part series delves into how JYSK employs Kubernetes across more than 3000 stores, harnessing the power of edge computing to bolster our infrastructure’s competitive advantage.

In today’s rapidly evolving tech landscape, edge computing has emerged as a game-changer, allowing data processing closer to its source — be it a device, a store sensor, or a point-of-sale system. This decentralization boosts performance, reduces latency, and paves the way for real-time insights, especially crucial for the retail industry. But how do we efficiently manage and deploy applications at such a vast scale, especially when we’re talking about 3000+ locations?

Enter k3s: a lightweight, fast, and simplified Kubernetes distribution tailored for edge and resource-constrained environments. In this blog post, we will delve into our journey of leveraging k3s to deploy Kubernetes clusters across thousands of stores, discussing the challenges, the triumphs, and the future of edge computing in retail. Join us as we explore the intricacies and advantages of this groundbreaking approach.

The Design

When diving into the technical intricacies of deploying Kubernetes clusters, especially on such an expansive scale, simplicity becomes paramount. k3s, with its inherent lightweight nature, was a perfect fit.

K3s eliminates the need for extraneous features and external dependencies. With just a single binary for both server and agent, the installation process is straightforward. We’re not juggling with a multitude of configurations or elaborate setups; instead, we can get a cluster up and running in minutes. This simplicity not only accelerates deployment timelines but also ensures consistency and reduces potential points of failure.

Modern application deployment, particularly at scale, demands not just efficiency but also a high degree of precision. In our quest to ensure robust application deployments across our stores, we’ve adopted the GitOps approach. GitOps, with its declarative, version-controlled, and automated methodology, guarantees that we can roll out applications seamlessly, with the exact configuration and version we desire.

However, our application ecosystem isn’t a monolith. Different applications have varying network requirements. Some need to communicate with specific networks that are tagged at the hypervisor level. These unique networking demands could complicate deployments, but this is where Cilium steps in to streamline the process.

Cilium, an advanced networking and security CNI, enables fine-grained network visibility and control without modifying application code. Using Cilium, we can assign specific network interfaces to pods based on their requirements. This ensures that our applications not only deploy successfully but also communicate effectively with the necessary resources.

Our basic requirements are:

  • Straightforward deployment
  • Minimal use of resources
  • Utilisation of familiar frameworks
  • Different interfaces tailored to specific deployments
  • Quick recovery with minimal upkeep
  • Compliance with stringent network guidelines, like corporate proxies and isolated access.

The Cattle Approach: Simplifying k3s Installation and Recovery

In the realm of infrastructure management, there’s an adage that one should treat servers as “cattle, not pets.” This philosophy underscores our approach to k3s installation. While the Kubernetes ecosystem offers a plethora of deployment tools like Ansible, Terraform, and more, it’s essential that our installation method remains straightforward and reproducible. Our focus isn’t just on the initial deployment but also on ensuring a hassle-free recovery process.

By treating each k3s cluster as ‘cattle’, we ensure that our systems are both disposable and replaceable. If an issue arises with a particular cluster, we don’t invest hours attempting to diagnose and remedy it. Instead, our process is straightforward: uninstall, re-initialize, and provision. Within minutes, the store location can have its cluster back online and functioning optimally.
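
As a rough sketch of that cycle, assuming the installation steps described below are wrapped in a script (install-store-k3s.sh is a hypothetical name, not something shipped with k3s), recovery can be as simple as:

# Wipe the broken cluster and provision a fresh one (install-store-k3s.sh is hypothetical)
if [ -x /usr/local/bin/k3s-uninstall.sh ]; then
  /usr/local/bin/k3s-uninstall.sh
fi
./install-store-k3s.sh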

This approach not only reduces downtime but also alleviates the pressure on our technical teams. No longer bogged down by the tedious process of troubleshooting individual cluster hiccups, they can focus on more strategic, value-driven tasks. The end result? An infrastructure that’s resilient, scalable, and primed for the demands of modern retail. With k3s at its core, and a cattle approach to management, we ensure consistency and reliability across all our store locations, day in and day out.

The Installation

The installation process is pretty simple, in the form of a bash script (yes, good old bash), and lets us easily start up a cluster.

The only other customisation for the environment is the installation of our custom CAs, since we SSL-intercept on our corporate proxy.
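
Installing the CA itself is standard Ubuntu procedure. A minimal sketch, assuming the corporate root certificate has already been copied onto the machine (jysk-root-ca.crt is a placeholder name):

# Trust the corporate root CA so TLS-intercepted traffic is accepted (placeholder file name)
sudo cp jysk-root-ca.crt /usr/local/share/ca-certificates/jysk-root-ca.crt
sudo update-ca-certificates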

1. Setup our environment

This was done on a minimal Ubuntu installation. Although k3s doesn't strictly require swap to be disabled, it's standard practice for a full Kubernetes installation, so we disable it here.

jysk@jysk:~$ sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab
jysk@jysk:~$ sudo swapoff -a
jysk@jysk:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       300Mi       2.9Gi        32Mi       649Mi       3.2Gi
Swap:             0B          0B          0B
jysk@jysk:~$

At JYSK we have a corporate proxy which protects our systems from direct access to the internet. To make this work with K3s and other deployments, we need to specify the proxy settings in our session. The K3s installer automatically picks these up and applies them to the installation.

export HTTP_PROXY="http://<proxy>:8080"
export HTTPS_PROXY="http://<proxy>:8080"
export NO_PROXY="jysk.com,jysk.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,localhost"
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
export GITHUB_TOKEN=<token>
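
Worth noting: the install script writes these proxy variables into the systemd environment file for the k3s service (per the K3s docs, usually /etc/systemd/system/k3s.service.env for servers, k3s-agent.service.env for agents), so the running process keeps using them. If they need adjusting later, edit the file and restart the service:

# Proxy settings persisted by the installer
cat /etc/systemd/system/k3s.service.env
sudo systemctl restart k3s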

2. Download Cilium, FluxCD

Download Cilium straight from the horse's mouth; see their quickstart for more info. The benefit of fetching directly is that we always get the latest stable version. We also fetch FluxCD, as we will be using it as part of our GitOps deployment process; a quickstart guide is also available for FluxCD.

# Download Cilium
echo "Downloading Cilium..."
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable-v0.14.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
# Download FluxCD
echo "Downloading FluxCD.."
curl -s https://fluxcd.io/install.sh | bash -
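
A quick, optional sanity check that both CLIs actually landed on the PATH before continuing:

# Confirm both binaries are installed and resolvable
command -v cilium flux
flux --version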

3. Custom Configuration

The custom configuration is probably JYSK-specific: we deploy our own internal registry, which also proxies out to external registries. Besides the obvious bandwidth savings, we can also scan these public images as we pull them down.

K3s looks at /etc/rancher/k3s/registries.yaml for these settings, and after some trial and error we managed to proxy the most common registries that k3s, Cilium and Flux use. Add more if you need to!

# Configure mirrors
echo "Configuring mirrors (JYSK CR)"
mkdir -p /etc/rancher/k3s/ && \
cat > /etc/rancher/k3s/registries.yaml << EOF
mirrors:
  quay.io:
    endpoint:
      - "https://<internal cr>/v2"
    rewrite:
      "(.*)": "quay-proxy/\$1"
  ghcr.io:
    endpoint:
      - "https://<internal cr>/v2"
    rewrite:
      "(.*)": "ghcr/\$1"
  docker.io:
    endpoint:
      - "https://<internal cr>/v2"
    rewrite:
      "(.*)": "dockerio/\$1"
EOF

The above configuration enables mirrors to be used when pulling from specific registries. This setting is global: if we pull an image from ghcr.io, it will automatically go through our internal CR, which in turn proxies out. Smart!
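
Once K3s is up (step 4), one way to sanity-check the mirror configuration is to pull a public image through the embedded containerd and confirm on the internal CR that the request hit the proxy cache; the image below is just an example:

# The rewrite in registries.yaml should send this via the internal CR's dockerio project
sudo k3s crictl pull docker.io/library/alpine:3.18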

4. K3s Installation

After scanning through the docs, a few tweaks were needed to enable Cilium to work properly and also change the default registry to our own internal registry.

# Install K3s
echo "Installing K3s!"
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="\
--flannel-backend=none \
--disable=traefik \
--disable-network-policy \
--system-default-registry=<internal cr>/dockerio \
--kube-apiserver-arg=kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
" sh -

5. Cilium Installation

Cilium installation also follows the documentation's recommended approach, with a few extra tweaks.

hubble.relay.enabled and hubble.ui.enabled are not strictly needed, but nice to have. If we really need to cut down on resources, these shouldn't be deployed.

ingressController.enabled and ingressController.loadbalancerMode give us the ability to utilise the built-in Envoy ingress controller, if we need it. Again, if we are looking at a slim install, we don't need this.

To enable Cilium to route traffic to different interfaces based on which pod the request originates from, we need to implement a CiliumEgressGatewayPolicy. For this, we need to enable egressGateway.enabled and bpf.masquerade, and disable the l7Proxy. Further information on this can be found in their docs.

The final cilium status command is useful to gate the installation script before we continue to install FluxCD. This basically pauses until the CNI is ready.

# Install Cilium
echo "Installing CNI Cilium..."
cilium install \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=shared \
--set egressGateway.enabled=true \
--set bpf.masquerade=true \
--set kubeProxyReplacement=strict \
--set l7Proxy=false \
--set egressGateway.installRoutes=true

echo "Checking CNI..."
cilium status \
--wait
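
Optionally, once cilium status is green, the CLI's built-in connectivity test gives extra confidence; it deploys temporary test workloads and pulls additional images, so it is probably something to skip on constrained store hardware:

# Optional end-to-end check; deploys test pods in a dedicated namespace
cilium connectivity test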

6. FluxCD Installation

As the icing on the cake, we also install FluxCD. We have already created a bootstrap deployment ready for our needs, so we just point to our repo and off we go.

echo "Installing FluxCD.."
flux bootstrap github \
--owner=jysk-ops \
--repository=store-k3s \
--token-auth
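
After bootstrapping, a couple of standard Flux CLI commands confirm that the controllers are healthy and that the repository is reconciling:

# Verify controllers, CRDs and prerequisites
flux check
# Show what Flux is reconciling from the bootstrap repository
flux get kustomizations -A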

Once our installation is complete we should have a fully functional cluster. Our Flux setup installs extra services such as ExternalSecrets along with a ClusterSecretStore and we also configure a CiliumEgressGatewayPolicy.
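
The full policy is covered in the follow-up post, but a minimal sketch of what a CiliumEgressGatewayPolicy can look like is shown below; the pod labels, destination CIDR, node selector and interface here are purely illustrative, not our actual configuration:

# Illustrative only: route egress traffic from pods labelled app=pos-sync out via eth1 on a gateway node
cat << EOF | sudo k3s kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: example-egress-policy
spec:
  selectors:
    - podSelector:
        matchLabels:
          app: pos-sync
  destinationCIDRs:
    - "10.50.0.0/16"
  egressGateway:
    nodeSelector:
      matchLabels:
        kubernetes.io/hostname: store-gateway-node
    interface: eth1
EOF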

See the related blog post on this one!

Putting it all together

For user convenience, we can utilize tools like CloudInit, Ansible, and others to set up our cluster. Stay tuned for more details in our upcoming post.

[Image: Basic Installation]
[Image: Check Cluster is Running]

Resource usage after the install is pretty decent. It uses a little more RAM than expected, but we have also added ExternalSecrets, netshoot and some other bits.
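
K3s bundles metrics-server by default (we haven't disabled it), so a rough picture of consumption is available with the usual commands:

# Rough view of node and pod resource consumption via the bundled metrics-server
sudo k3s kubectl top nodes
sudo k3s kubectl top pods -A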

Uninstalling our environment

To uninstall our cluster, we can simply run the standard k3s uninstall script, which will of course remove everything we have just set up.

/usr/local/bin/k3s-uninstall.sh

What’s next?

So there you have it, a quick and easy way to install a simple K3s setup with Cilium and FluxCD that adheres to our corporate proxies and internal network structures to get the best of both worlds.

It’s important to highlight that the aforementioned is merely a prototype. In the subsequent two articles, we’ll refine the setup for a production environment, delve into large-scale deployment, and examine the FluxCD bootstrap installation. Furthermore, we’ll investigate the resources integrated into our cluster and Cilium’s CiliumEgressGatewayPolicy, facilitating the routing of traffic from pods to a designated interface.

Stay Tuned!
