Streamline Kubernetes Management: GitOps with Flux and Kustomize

Darío Blanco Iturriaga
10 min read · Aug 12, 2023

In this blog post, we’ll explore the powerful combination of GitOps, Flux, and Kustomize, which together form a winning trio for Kubernetes management. Flux, a leading GitOps tool, automates deployments and synchronizes your cluster state with the Git repository. Meanwhile, Kustomize enhances the configurability of Kubernetes manifests, allowing you to customize resources based on different environments without the need for duplicative YAML files.

To bring these concepts to life, we’ll be diving into an example repository that showcases the practical implementation of GitOps using Flux and Kustomize.

Understanding GitOps

GitOps is a paradigm shift in the way infrastructure is managed. Traditionally (and as a best practice), Infrastructure as Code is applied from definitions stored in files, either declaratively (e.g. Kubernetes YAML manifests or Terraform HCL files) or imperatively with Cloud Development Kits (CDKs) that let you use your favorite programming language like Python, TypeScript, Go… However, the “apply” step is executed either manually or (ideally) by a CI/CD tool whenever there is a new change, so the state of the Git repository is only guaranteed to be reflected in the infrastructure when that execution succeeds.

Ways to apply Infrastructure as Code (IaC)

In a GitOps workflow, like in the traditional best practice, the desired state of your infrastructure is represented by the configurations stored in a Git repository. This version-controlled repository serves as the single source of truth, and GitOps enforces the state that the cluster should be in. Any changes to the cluster are made by updating the configuration files in the Git repository, rather than directly interacting with the cluster.

This approach has several benefits:

  • Introduces an auditable and reproducible process for deployments. Rollbacks become as simple as reverting to a known working state in the Git history. The changes in the infrastructure can be seen easily in the git log.
  • Allows for automatic and continuous synchronization between the desired state in the Git repository and the actual state in the cluster, ensuring consistency across environments, simplifying the traditionally complex CI/CD pipelines, and reducing human error.
  • Promotes collaboration and transparency among development and operations teams. Changes to the cluster are transparently tracked in Git, making it easy for team members to understand what’s happening and when.
  • Provides self-healing capabilities. If somebody changes the state of the infrastructure manually, the cluster automatically converges back to what is defined in the Git repository. This brings resilience and a cluster that is easier to debug when there is a problem.

Introducing Flux

Flux is the GitOps tool that continuously monitors the designated Git repository, looking for changes in the desired state of the cluster. It removes the need for a CI/CD pipeline that applies the changes from the repository to the cluster.

Flux is composed of different controllers, the main ones shown in this picture. The Kustomize and Helm controllers apply Kubernetes resources using these two widespread technologies, while the source controller fetches IaC from different Git providers. Source: The Flux authors

In GitOps, the traditional CI/CD pipeline is “reversed”. This is what we call “pull-based deployments,” where the target cluster constantly pulls its desired state from the Git repository, rather than relying on a “push” from an external entity (e.g. a CI/CD tool like Jenkins or Github Actions). This “pull” approach provides a self-healing mechanism, allowing the cluster to autonomously converge to the desired state even if it drifts due to manual changes or unforeseen events.

Leveraging Kustomize for Customization

As your applications and infrastructure grow in complexity, managing multiple sets of YAML files for different environments can become cumbersome and error-prone.

Kustomize comes to the rescue with its declarative approach to configuration management. Instead of maintaining separate copies of YAML files for each environment, Kustomize allows you to define overlays and patches within your base manifests. These overlays can be applied dynamically, modifying specific parts of the configuration without duplication.

Source: The Kubernetes authors

In this example, we tailor the resources to different environments (dev, staging and prod), and keep the common parts in a base folder. Kustomize will apply the overlays and patches defined in each environment folder based on the common definitions from base.
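
As a minimal sketch of that pattern (the my-app Deployment and file names below are illustrative, not taken from the example repository), the base lists the shared manifests and each overlay references it while patching only what differs:

# base/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
---
# overlays/dev/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # strategic merge patch that only changes the replica count for dev
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        replicas: 1

Rendering the dev variant with kustomize build overlays/dev (or kubectl apply -k overlays/dev) produces the base manifests with only the replica count changed.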

Setting up the example repository

These are the steps to set up your Kubernetes cluster on your local computer.

  1. Fork the example gitops repository.
  2. Run cp .envrc.example .envrc and edit the .envrc file with your own secrets. Make sure that GITHUB_TOKEN has admin permissions in your fork, so it can create the deploy keys.
  3. Ensure that all the prerequisites defined in the init target of the Makefile are installed. Not all prerequisites are essential, so you can adapt them to your needs. Run make init to check that everything is properly installed. See Prerequisites for more info.
  4. Run make bootstrap to create a local Kubernetes cluster. The bootstrap process creates a cluster using kind (create-cluster.sh) and provisions that cluster with the Flux bootstrap command (provision-cluster.sh). The provision step will configure the deploy keys so make sure that the provided GITHUB_TOKEN has the required permissions!

After a few minutes, there will be a working local cluster that can be accessed with kubectl.

Output after running the bootstrap target
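
Under the hood, the provisioning script relies on the Flux CLI. The following is a rough sketch of the kind of command it runs (the repository name and path are placeholders, not the script’s exact flags), plus two commands to verify the result:

# Rough sketch only; the exact flags live in provision-cluster.sh.
# GITHUB_USER and the repository name are placeholders (GITHUB_TOKEN must be exported, e.g. via .envrc).
flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository=gitops-example \
  --branch=main \
  --path=./clusters/dev \
  --personal

# Verify that the Flux controllers are healthy and reconciling
flux check
kubectl get pods -n flux-system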

Multi-Environment support

The example repository defines three environments for a cluster that is intended to hold all the applications and infrastructure components, since most use cases can be satisfied with a single Kubernetes cluster. Having pre-production environments is key to testing new component versions or new approaches and gaining confidence before applying them to production.

  • A development environment (dev) intended for testing changes in a local cluster. In the example repository, the make e2e command validates the Kubernetes manifests, creates a kind cluster, and destroys it once the components are healthy.
  • A staging environment (staging) that should be in the cloud provider or datacenter of your choice, and be as similar to production as possible. The recommendation is that staging is identical to production, with the only difference being the traffic the cluster receives.
  • A production environment (prod) that serves customers. Changes to the infrastructure of this cluster should be tested as much as possible in the previous clusters. Monitoring is key, because no matter how much a cluster is tested automatically and manually, a defect can always slip through. Nonetheless, with GitOps rollbacks are very easy, so having this pipeline brings a lot of confidence when rolling out new changes.

The manifests for each environment can be found in the **/dev/**, **/staging/**, and **/prod/** file patterns respectively.
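
Roughly, the relevant parts of the repository follow this shape (a simplified and partly assumed view, not an exhaustive listing):

clusters/
  dev/  staging/  prod/        # Flux entry points and sops.agekey per environment
infrastructure/
  controllers/
    dev/  staging/  prod/      # controller overlays and encrypted values per environment
apps/
  base/                        # app definitions (podinfo, fastapi-example)
  dev/  staging/  prod/        # per-environment app overlays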

Helm Charts Integration

Helm releases are managed with Flux’s helm controller. The example repository defines most infrastructure components (the controllers) with the HelmRelease and HelmRepository CRDs. Flux makes sure that the version specified in HelmRelease is always satisfied.

---
# An example of a cert-manager release that will always apply the latest
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 30m
  chart:
    spec:
      chart: cert-manager
      version: "*"
      sourceRef:
        kind: HelmRepository
        name: cert-manager
      interval: 12h
  values:
    installCRDs: true
---
# An example of a cert-manager repo that will be used to grab the release from
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 24h
  url: https://charts.jetstack.io

Repo per app

The example repository is structured using a repo per app strategy. The idea is that the example repository is considered the “cluster config repo” and owned by a platform team, while the application deployment manifests are stored in their respective repos and owned by the development teams.

Therefore, the apps are structured so that their manifests are not stored in this repository: they point to other repos. There are two examples: one as a HelmRelease (podinfo) and another as a Kustomization (fastapi-example).

The HelmRelease strategy follows the approach described earlier, but the Kustomization approach is worth a closer look: it is a Kustomization CRD whose path is overridden by the environment-specific overlay.

# apps/base/fastapi-example/controller.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: fastapi-example
  namespace: flux-system
spec:
  interval: 1m
  prune: true
  sourceRef:
    kind: GitRepository
    name: fastapi-example
  decryption:
    provider: sops
    secretRef:
      name: sops-age
# Environment overlay patch for the same Kustomization, overriding the path (dev shown here)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: fastapi-example
  namespace: flux-system
spec:
  path: ./deploy/dev

The example manifest for fastapi-example can be found here.
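
For completeness, an environment overlay would typically wire that patch in through its own kustomization.yaml. Here is a minimal sketch under the assumption of a dev overlay folder (file locations are illustrative, not copied from the repository):

# Illustrative dev overlay kustomization (paths are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/fastapi-example
patches:
  - path: controller.yaml        # the overlay document above, setting spec.path to ./deploy/dev
    target:
      kind: Kustomization
      name: fastapi-example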

Secrets Management

There are some files with the *.enc.yaml extension that hold encrypted secrets. I decided to use Mozilla SOPS and age to simplify secret management as much as possible (it is always a tricky topic).

# infrastructure/controllers/dev/weave-gitops/values-secret.enc.yaml
adminUser:
  passwordHash: ENC[AES256_GCM,data:9perX/E92La7I+CO9jirh0n9aB0kl5I7H9Wmtk6aGrWq9TQ4Lya6rJ5FtVfEWZWyna0ch8tv0/hSTW6t,iv:gC4h33cOrcmA3O3rDLySWXBkucqxivH5thwPuk77S6s=,tag:Smop2c2EYf8rIYSJ7MgmWw==,type:str]
sops:
  kms: []
  gcp_kms: []
  azure_kv: []
  hc_vault: []
  age:
    - recipient: age1qvesyd4zyqs5p40n8gr2ngjvsg6surf9e37h3xv7rm7m5lsgz5jsetg3ql
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBlR0RjdHJYMkhVbnpnZ0Fv
        Mms2dGVoVnpmaU5Rd000MjFtOWttODRjY1FvCnVicXQ3cnRGSVhtbjZEOUtRemRm
        Qm4wRHNCZjc2MFppK1VZbi9iWTk2MkkKLS0tIDM2QnY0cmhOSjJNV3ZFQlNGaXdt
        MmxyRlh6V1RnaU1sUkxqYktxQWh3QUUK3xOFl5ZI2xOJJJpgxEhKl3SpNaqtLT3H
        9GqOAy4CErF6HkY5LrzetEPAeYykro6XejTYqlQxw88XslGXd4HpDg==
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "2023-07-18T16:56:11Z"
  mac: ENC[AES256_GCM,data:QEVeCGPpEq1pA41wxVh9uKT88DH7Pqt0Y/bLM5wr0eEcNF/tCin5v+HMOdBmvnBChXczTEnohFXV2Kl4SOAcvTvAqRulWqHtYE680TFOqsdsKw66cfwnEyb2U7wPvD50vNL2sB+OTBJZvoyFQS3RBELCmSMbMiJ9iWwGa9w6oSs=,iv:TnjkiOtJNOPTm892zX9zr5zrr7viBRqIAOwqZF8ZlRM=,tag:rmjsjSUK8rnP2ZFGLdxuZg==,type:str]
  pgp: []
  unencrypted_suffix: _unencrypted
  version: 3.7.3

The process works as follows:

  • Generate a private key per cluster with age-keygen. The private key should be stored in ./clusters/{environment}/sops.agekey. This key is already git ignored because it must never be pushed to the repository. For instance:
$ age-keygen >> ./clusters/dev/sops.agekey
$ cat ./clusters/dev/sops.agekey
# created: 2023-07-17T14:07:50+02:00
# public key: age1v6q8sylunaq9m08rwxq702enmmh9lama7sp47vkcw3z8wm74z39q846s3y
AGE-SECRET-KEY-THIS_IS_A_SECRET_THAT_SHOULD_NEVER_BE_PUSHED
  • Store the public key in ./.sops.yaml. Mozilla SOPS uses this file to know which key to use when encrypting and decrypting. The encrypted_regex parameter is especially useful because it encrypts only parts of the file rather than the file entirely, which helps with readability.
# ./.sops.yaml
creation_rules:
  # Dev secrets
  - path_regex: .*dev/.*values-secret.yaml
    age: age1qvesyd4zyqs5p40n8gr2ngjvsg6surf9e37h3xv7rm7m5lsgz5jsetg3ql
  - path_regex: .*dev/.*.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1qvesyd4zyqs5p40n8gr2ngjvsg6surf9e37h3xv7rm7m5lsgz5jsetg3ql
  # Staging secrets
  - path_regex: .*staging/.*values-secret.yaml
    age: age1fapwknfa6lm0rmpxe4dkuyjpcz9wwju73ghw53f74yqjhrevtyxs43h2yg
  - path_regex: .*staging/.*.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1fapwknfa6lm0rmpxe4dkuyjpcz9wwju73ghw53f74yqjhrevtyxs43h2yg
  # Production secrets
  - path_regex: .*prod/.*values-secret.yaml
    age: age1us3r24et6a5kn8e4plqtvghchpuyj56n7errv72stqhgxq2l8dhsrem3u7
  - path_regex: .*prod/.*.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1us3r24et6a5kn8e4plqtvghchpuyj56n7errv72stqhgxq2l8dhsrem3u7
  • Now it is possible to encrypt and decrypt files with the sops command. I have created encrypt.sh and decrypt.sh scripts to help with this process.
$ ./scripts/encrypt.sh secret.yaml
secret.enc.yaml 20ms
✅ Encrypted file saved to secret.enc.yaml
$ ./scripts/decrypt.sh secret.enc.yaml
✅ Decrypted file saved to secret.yaml
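
Both scripts are essentially thin wrappers around the sops CLI; conceptually they do something like the following (a sketch, not the scripts’ exact contents):

# Sketch of what the helper scripts roughly do (not their exact contents)
sops --encrypt secret.yaml > secret.enc.yaml   # the age recipient is picked from the matching ./.sops.yaml rule
sops --decrypt secret.enc.yaml > secret.yaml   # requires the private key, e.g. via SOPS_AGE_KEY_FILE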

Automate image updates

One of the coolest features of Flux is its support for automated image updates to Git. It empowers CI/CD workflows: Flux commits to the Git repository the image versions that should be released to a target environment.

For instance, imagine that I have an application like fastapi-example whose deployment is basically a container image. It would be great if the application automagically deployed to my different environments depending on the image version. With semantic versioning, I want releases deployed to production (except release candidates), commit-hash-based versions deployed to staging, and release candidates deployed to an environment that spins up only for that release candidate. The possibilities are endless, and I do not need to tie this to a git branch because the container image is completely independent from the branch workflow.

With Flux, it is possible to have that “magic” without putting complexity into your CI/CD pipeline. Your pipeline only has to push the image “somewhere”; Flux then watches the container registry with an ImageRepository and an ImagePolicy, and reacts whenever a new tag matches the desired rule.

---
# Connects to a Github Docker repository using the secret stored in github-docker
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: fastapi-example
  namespace: flux-system
spec:
  image: ghcr.io/darioblanco/fastapi-example
  interval: 1h
  secretRef:
    name: github-docker
---
# Only notify if a new image version matches the defined policy
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: fastapi-example
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: fastapi-example
  filterTags:
    pattern: "^main-[a-fA-F0-9]+-(?P<ts>.*)"
    extract: "$ts"
  policy:
    numerical:
      order: asc
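
For the production scenario described above (deploy semantic versions, skipping release candidates), a semver-based policy could look like the following. This is a sketch, not a manifest from the example repository:

---
# Sketch: a semver policy for production-style tags (not taken from the example repo)
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: fastapi-example-prod
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: fastapi-example
  policy:
    semver:
      range: ">=1.0.0"   # plain semver ranges typically exclude pre-releases such as release candidates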

You can configure an ImageUpdateAutomation resource to push a commit with the version change to the target repository, which triggers a reconciliation (and the actual deployment). In this case, once the related policy reports that a new image matching the expected pattern has been pushed, the Image Automation Controller commits to the specified repo, updating the file defined in the update setting. For fastapi-example, that file is the kustomization itself, so the update.path setting can be overridden by an overlay.

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: fastapi-example
  namespace: flux-system
spec:
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: fastapi-example
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
      messageTemplate: |
        chore: automated image update

        Automation name: {{ .AutomationObject }}

        Files:
        {{ range $filename, $_ := .Updated.Files -}}
        - {{ $filename }}
        {{ end -}}

        Objects:
        {{ range $resource, $_ := .Updated.Objects -}}
        - {{ $resource.Kind }} {{ $resource.Name }}
        {{ end -}}

        Images:
        {{ range .Updated.Images -}}
        - {{.}}
        {{ end -}}
    push:
      branch: main
  update:
    path: ./deploy/base
    strategy: Setters

You can browse the fastapi-example definition in the configuration repo for more details.

Receive notifications

We talk about “pull-based” deployments because GitOps tools poll by default. Therefore, intervals have to be configured, but setting a short one won’t scale well if you have many applications in your ecosystem. In addition, your development teams might complain that a deployment is slow because, after the image is built and everything looks green, the reconciliation will in the worst case only happen once the interval expires.

To speed up deployments without having to shrink the interval times, Flux offers webhook receivers. You can see them in the example as an infrastructure config in flux-receivers.

The following setting will schedule a reconciliation if Flux receives a webhook from the flux-system or fastapi-example repositories.

# See https://fluxcd.io/flux/components/notification/receiver/#example
apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: flux-system-receiver
  namespace: flux-system
spec:
  type: github
  events:
    - "ping"
    - "push"
  secretRef:
    name: receiver-token
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: flux-system
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: fastapi-example
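
To actually receive those webhooks from GitHub, the notification-controller’s webhook-receiver service has to be reachable from the internet, and the URL that Flux generates for the Receiver has to be configured in the repository settings. A minimal sketch with an Ingress (the host and ingress class are placeholders, and this resource is not part of the example repository):

# Sketch: expose the notification-controller's webhook-receiver service (host is a placeholder)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flux-webhook
  namespace: flux-system
spec:
  ingressClassName: nginx
  rules:
    - host: flux-webhook.example.com
      http:
        paths:
          - path: /hook/
            pathType: Prefix
            backend:
              service:
                name: webhook-receiver
                port:
                  number: 80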

Conclusion

The world of infrastructure management is ever-evolving, and I invite you to continue exploring, experimenting, and learning. GitOps is a philosophy that is here to stay. By adopting GitOps principles and the related tools, you’ll become a champion in infrastructure management, making sure it does not become a hassle for your teams: developers can focus more on application development and less on creating and troubleshooting complex CI/CD pipelines, while the maintenance burden of the GitOps tool can move entirely to a platform team.
