From 0 to 10’000 Jenkins builds a week — part 1: a setup everyone loves

Stéphane Goetz
Swissquote Tech Blog
9 min read · May 31, 2023


Logos of Jenkins, Kubernetes, Docker and SonarQube

In 2023, Swissquote runs 50 fully automated instances of Jenkins in Kubernetes, one per team. Each code push from a developer results in a build in the team’s instance. If the project doesn’t exist yet, it gets created automatically. The cluster performs approximately 10,000 builds per week.

When I joined Swissquote as a junior frontend developer, I noticed that many teams had more machines than team members, and it made me wonder: what were all these machines for?

It turned out that they had scavenged old machines, installed Jenkins (a continuous integration server), and configured some of their projects to build automatically and regularly. Curious and naive, I asked many teams how it was going for them, and more often than not, the explanation ended with “but it’s broken at the moment,” or “we should update it,” or “but the build has been red for a week, and nobody fixed it.”

This three-part article will walk you through our journey from 0 to 50 automated Jenkins instances, and through the brilliant plans that only sometimes worked as intended but eventually brought us to a robust and stable infrastructure.

Hi, I’m Stéphane Goetz, Principal Software Engineer at Swissquote. I have been here for 11 years and am a member of the Design and Technology Group. We are a team of software engineers and architects. As you will discover in this article, we are not sysadmins.

2014: Let’s get this party started

In 2014, continuous integration (CI) was not automated at all at Swissquote. We used Mercurial to store our code, and continuous integration, if done at all, meant repurposing an old developer PC with Jenkins and manually configuring jobs to build a repository. Teams understood that CI could benefit them but didn’t want to invest the time to experiment and set it up.

Death by a thousand cuts

You’ve probably heard a similar story already, but this is how unsustainable it was for us:

  • Once a machine was installed, it was rarely, if ever, updated.
  • All projects ran on the same machine, which usually meant project-specific configuration files scattered everywhere on the filesystem; a missing or outdated configuration file often broke builds.
  • Job configuration was manual: each job was configured by hand, and improvements to one job were not automatically applied to the others.
  • Since everything was done manually, it was set up once and never touched again. Machines never got upgraded, leading to slow builds on old hardware.
  • Releases were done on developer machines or… our in-house attempt at a CI server.

In 2014, our team, the Design and Technology Group (DTG), had just been created, and our goal … well, that’s a story for another day. In short, it was “anything that isn’t the goal of other teams.” So we set out to improve other engineers’ lives and decided that our first big project would be to provide a fully automated CI environment.

Why roll our own CI environment when we could get one relatively inexpensively in the cloud? First, in 2014 there weren’t as many cloud CI offerings as there are today, so the choice was limited. Second, as a bank, we must follow regulations, which we understand as “keeping all our source code and artifacts on premises.” Moving those to the cloud would have been a significant undertaking.

First attempt with one big machine

Our first planned stop was to centralize all jobs on a single machine to make them easier to manage. We requested a powerful machine from our infrastructure department, and they found one for us. We installed Jenkins on it and added some projects to this server.

Jenkins is an open-source automation server. It helps automate the parts of software development related to building, testing, and deploying, facilitating continuous integration and continuous delivery.

This did not help at all … the only difference from the previous approach was that we managed the machine instead of the team using it. Otherwise, it had the same downsides as the “Jenkins-under-the-desk” approach.

We went back to the drawing board.

2015: The year of containers and Kubernetes

Docker is a tool for running containers — a standard unit of software that packages up code and its dependencies. This allows the code to run reliably and in a reproducible way in any computing environment.

Kubernetes is an open-source container orchestration system that automates software deployment, scaling, and management. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.

We were already adopting Docker for our development environment when Kubernetes was announced, and we felt it was the right fit for our build cluster. The negotiations took time, but we were eventually able to order four machines to host the cluster.

We made our initial sizing; while I don’t remember the exact specs, I remember that we requested 4 × 16 cores and got 4 × 4 cores … If the title of this article is any indication, it didn’t take long until we ordered more machines 🙂

Initially, we planned that the infrastructure department would provide us with a Kubernetes cluster. Surprise — bootstrapping and operating such a cluster without prior knowledge was not in their high-priority plans back then. So they gave us root access to the machines, and we suddenly became sysadmins. Yay us.

Preparing an MVP

Now that we had machines, we needed to get a Kubernetes cluster up and running. While we did this, we also finalized our strategy for a robust build farm:

  • Each team should have a separate Jenkins instance, as it would clarify ownership: your instance, your responsibility to keep it green.
  • Each build has to run in an isolated build environment (a Docker container) that starts clean and is deleted after use.
  • On code push, jobs should be created automatically in Jenkins, making adoption a no-brainer.
  • Job configuration should be centrally managed AND configurable at the same time (yes, this seems counter-intuitive; bear with us to see how we achieved that)

How does it work?

A diagram to illustrate the interactions from an engineer’s code push to triggering a build on Jenkins

The MVP consisted of a script, a service, Docker, and Jenkins:

  • On each push to Mercurial (our code repository at the time), a script sent an HTTP request to a service (sq-ci, short for Swissquote Continuous Integration; our marketing department is proud of our product-naming skills).
  • This service examined a configuration file to determine which team the code belonged to. It would then either use the team’s existing Jenkins instance or create one by copying a folder of default configuration and pointing a Docker image at it (a sketch of this flow follows the list).
  • Once the instance was created and ready, the service ensured the job existed and triggered a new build.
  • The only job type we supported was Maven, and the only prerequisite to creating a job was the existence of a pom.xml at the root of the repository.
  • Each branch of a repository got its own job on Jenkins; closing a branch also deleted that job.
  • We were using job inheritance on Jenkins: we made a __template_global job and a __template__team job.
    __template_global contains all the configuration necessary for a working job. We would overwrite it whenever we needed to apply a change to all instances.
    __template__team is freely customizable and inherits from the global job. Each job would then inherit from the team job. This is how we were able to enforce global configuration, allow customization of all jobs, and give teams the freedom to adapt the tool to their needs.
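To make the flow concrete, here is a minimal sketch of the dispatch logic described above. It is purely illustrative: the real sq-ci service is internal, and every name below (the team mapping, ensureInstance, onPush) is a stand-in, not the actual code.

```groovy
// Illustrative Groovy sketch of the sq-ci dispatch flow.

// Repository-to-team mapping, as read from the configuration file.
def teamConfig = [
    'trading/order-router': 'team-trading',
    'web/portal'          : 'team-web',
]

// Registry of Jenkins instances already provisioned, keyed by team.
def instances = [:]

def ensureInstance(String team, Map instances) {
    if (!instances.containsKey(team)) {
        // Real flow: copy the folder of default configuration and point a
        // Jenkins Docker image at it.
        println "Provisioning a new Jenkins instance for ${team}"
        instances[team] = [url: "http://jenkins-${team}.internal"]
    }
    instances[team]
}

def onPush(String repo, String branch, Map teamConfig, Map instances) {
    def team = teamConfig[repo]
    if (!team) {
        println "No team configured for ${repo}, ignoring push"
        return
    }
    def jenkins = ensureInstance(team, instances)
    // Real flow: ensure the job exists (inheriting the team template),
    // then trigger a build through Jenkins' REST API.
    println "Triggering build of ${repo}@${branch} on ${jenkins.url}"
}

// 'default' is Mercurial's default branch name.
onPush('trading/order-router', 'default', teamConfig, instances)
```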

What did we learn from this MVP?

This MVP was a resounding success. Most teams adopted the build farm without even thinking about it, and in no time, many under-the-desk Jenkins instances were shut down. Still, not everything was perfect; let’s look at the outcomes.

  • Build configurations became more robust because they now had to fully describe the build environment, which then ran within a Docker image. As a result, builds were also reproducible across developer machines.
  • We quickly grew to 20 Jenkins instances. At this stage, this wasn’t an MVP anymore, and it became time to add more build power to the cluster, so we ordered more machines.

Challenges

  • Having the entire environment within Docker meant all assets (Maven, NPM …) had to be downloaded from the central server on each build. This prompted us to mount a shared directory into all builds on a single machine so that they share a single build cache (see the sketch after this list).
  • With a global cache, disk space gets used up fast; think of Maven artifacts, Docker images, NPM packages… Build servers had 700 GB NVMe disks for builds, but with the growing number of builds, they started filling up quickly. We had to create scripts that automatically and aggressively clean up assets that haven’t been used recently. Luckily, these scripts work so well that we haven’t needed to touch them in more than three years, and the build partitions never reached 100% usage again.
  • Jenkins jobs are tough to customize and keep up-to-date across many projects.
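For illustration, here is one way such a shared cache can be mounted, expressed with the pipeline syntax of the public Jenkins Kubernetes plugin (which arrived later than this point in the story). The image, paths, and names are assumptions, not our actual configuration.

```groovy
// Illustrative only: sharing a host-level Maven cache across builds.
podTemplate(
    containers: [
        containerTemplate(
            name: 'maven',
            image: 'maven:3-eclipse-temurin-17', // any Maven-capable image
            command: 'sleep',
            args: 'infinity',
            ttyEnabled: true
        )
    ],
    volumes: [
        // Every build on this machine mounts the same host directory, so all
        // builds hit one local Maven repository instead of the central server.
        hostPathVolume(hostPath: '/data/build-cache/m2', mountPath: '/root/.m2')
    ]
) {
    node(POD_LABEL) {
        container('maven') {
            checkout scm
            sh 'mvn -B verify'
        }
    }
}
```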

2016: Improvements all around

Jenkins 2.0 with Code Pipelines

In April 2016, Jenkins 2.0 came out with one massive new feature: Scripted pipelines!

Instead of clicking through a UI, the build steps are described in a Jenkinsfile: a Groovy script that allows developers to script the steps required to run their build. The biggest win here is that this file is stored in the repository, so it stays up-to-date over time.

This feature also came with shared pipeline libraries, a way to create functions that can be shared among builds. In no time, we had some base functions that everybody could use, like configuredNode(), which starts a Jenkins agent (build environment) inside Kubernetes with all our environment-specific configurations, and mvn(), which compiles the code and runs the tests with Maven.
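A Jenkinsfile built on those shared functions could look like the sketch below. The exact signatures of configuredNode() and mvn() are internal to our library, so treat the closure and arguments here as assumptions.

```groovy
// Hypothetical Jenkinsfile using the shared-library helpers named above.
configuredNode {                 // start a clean, Kubernetes-backed agent
    stage('Checkout') {
        checkout scm             // standard Jenkins step: fetch the repository
    }
    stage('Build and test') {
        mvn 'clean verify'       // assumed to wrap the Maven invocation
    }
}
```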

These functions helped keep the number of distinct pipelines low. While each team experimented on its own at first, they eventually converged on a single default pipeline.

Today, most projects’ Jenkinsfiles contain a single function call that triggers the default pipeline, optionally with some configuration options.
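In practice, that means a whole Jenkinsfile can be as short as this (defaultPipeline and its options are illustrative names, not our real API):

```groovy
// A complete Jenkinsfile: one call to the shared default pipeline.
defaultPipeline(
    javaVersion : '17',   // hypothetical option
    runSonarQube: true    // hypothetical option
)
```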

While the “old” project mode was still supported, we strongly encouraged people to start using this new way of working, as it would help them become more autonomous in their process.

Out-of-the-box SonarQube support

At this stage, Jenkins was adopted by most teams, but some did not want to migrate because their setup had a critical feature we didn’t support: SonarQube.

SonarQube is a tool that gathers metrics about code quality, code coverage, and other code-related issues. It delivers a score and a quality gate; your build fails if the code doesn’t meet the quality gate’s standards.

We used the same principle for SonarQube as we did for Jenkins: each team would get its own instance and the freedom to configure it however they wanted. That’s around the time we discovered Helm charts, so we used them to provision one PostgreSQL database and one SonarQube instance per team.

Helm helps you manage Kubernetes applications: Helm charts help you define, install, and upgrade even the most complex Kubernetes application. Since Kubernetes objects (pods, deployments …) are configured through YAML files, keeping a fleet of applications up-to-date can become complicated if they have configuration variants. Helm charts are, in very short, a set of YAML files with variable placeholders: a variable file plus a Helm chart yields an application running in the cluster. This makes managing a fleet of similar applications a breeze.

Luckily, most teams were using our shared mvn() pipeline function. This meant we could add SonarQube easily and without changing the code repositories. It also encouraged people to migrate from the “project” style configuration to the “pipeline” style.
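The article doesn’t show how mvn() does this internally, but a sketch of the idea, using the public withSonarQubeEnv and waitForQualityGate steps from the SonarQube Scanner plugin for Jenkins, could look like this (the server name, goals, and timeout are assumptions):

```groovy
// Sketch: a shared mvn() helper that bolts SonarQube analysis onto every build.
def mvn(String goals) {
    // Injects the team's SonarQube URL and credentials into the environment.
    withSonarQubeEnv('team-sonarqube') {     // assumed server name
        sh "mvn -B ${goals} sonar:sonar"     // build, test, and analyze
    }
    timeout(time: 10, unit: 'MINUTES') {
        // Waits for the analysis report, then fails the build if the
        // project doesn't pass its quality gate.
        def gate = waitForQualityGate()
        if (gate.status != 'OK') {
            error "Quality gate failed: ${gate.status}"
        }
    }
}
```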

At this stage, no “under-the-desk” Jenkins was left!

Borat with two thumbs up

Takeaways

  • Jenkins, particularly Jenkins Pipelines, can scale to thousands of repositories and their specifics while providing sane and safe defaults to our engineers.
  • SonarQube and Jenkins pairing is a must-have. Code quality and CI usually play well together.
  • Making onboarding easy is critical to a high adoption rate: teams had no specific action to take to get their code into continuous integration; everything was done for them, and they could benefit from the features for free.
  • Leave Kubernetes administration to professionals: don’t build your own cluster if you don’t have the resources to keep it up-to-date and the knowledge to operate a cluster of machines. Kubernetes doesn’t replace sysadmins; if anything, you need more sysadmins. We now know far more than we would like about Kubernetes’ internals.
  • Kubernetes is an incredible platform to deploy applications on; we could have used a more straightforward approach like Docker Compose + pre-provisioned agents, but in the end, we’re happy to have learned how to deploy applications with Kubernetes.

Stay tuned for Part 2

In Part 2, I will dive into the intricacies of maintaining a fleet of instances and automating their creation and upgrades.

Update 26/06: Read Part 2 here:
