Scaling Microservices with Message Queues, Spring Boot and Kubernetes

November 2019

Scaling SpringBoot with Message Queues and Kubernetes

When you design and build applications at scale, you deal with two significant challenges: scalability and robustness.

You should design your service so that even if it is subject to intermittent heavy loads, it continues to operate reliably.

Take the Apple Store as an example.

Every year millions of Apple customers preregister to buy a new iPhone.

That's millions of people all buying an item at the same time.

If you were to picture the Apple store's traffic as requests per second over time, this is what the graph could look like:

Now imagine you're tasked with the challenge of building such application.

You're building a store where users can buy their favourite items.

You build a microservice to render the web pages and serving the static assets.

You also build a backend REST API to process the incoming requests.

You want the two components to be separated because with the same REST API you could serve the website and mobile apps.

Today turned out to be the big day, and your store goes live.

You decide to scale the application to four instances for the front-end and four instances for the backend because you predict the website to be busier than usual.

You start receiving more and more traffic.

The front-end services are handling the traffic fine.

You notice that the backend that is connected to the database is struggling to keep up with the number of transactions.

No worries, you can scale the number of replicas to 8 for the backend.

You're receiving even more traffic, and the backend can't cope with it.

Some of the services start dropping connections.

Angry customers get in touch with your customer service.

And now you're drowning in traffic.

Your backend can't cope with it, and it drops plenty of connections.

You just lost a ton of money, and your customers are unhappy.

Your application is not designed to be robust and highly available:

the front-end and the backend are tightly coupled — in fact it can't process applications without the backend
the front-end and backend have to scale in concert — if there aren't enough backends, you could drown in traffic
if the backend is unavailable, you can't process incoming transactions.

And lost transactions are lost revenues.

You could redesign your architecture to decouple the front-end and the backend with a queue.

The front-end posts messages to the queue, while the backend processes the pending messages one at a time.

The new architecture has some obvious benefits:

if the backend is unavailable, the queue acts as a buffer
if the front-end is producing more messages than what the backend can handle, those messages are buffered in the queue
you can scale the backend independently of the front-end — i.e. you could have hundreds of front-end services and a single instance of the backend

Great, but how do you build such application?

How do you design a service that can handle hundreds of thousands of requests?

And how do you deploy an application that scales dynamically?

Before diving into the details of deployment and scaling, let's focus on the application.

Coding a Spring application

The service has three components: the front-end, the backend and a message broker.

The front-end is a simple Spring Boot web app with the Thymeleaf templating engine.

The backend is a worker consuming messages from a queue.

And since Spring Boot has excellent integration with JMS, you could use that to send and receive asynchronous messages.

You can find a sample project with a front-end and backend application connected to JMS at learnk8s/spring-boot-k8s-hpa.

Please note that the application is written in Java 10 to leverage the improved Docker container integration.

There's a single code base, and you can configure the project to run either as the front-end or backend.

You should know that the app has:

a homepage where you can buy items
an admin panel where you can inspect the number of messages in the queue
a /health endpoint to signal when the application is ready to receive traffic
a /submit endpoint that receives submissions from the form and creates messages in the queue
a /metrics endpoint to expose the number of pending messages in the queue (more on this later)

The application can function in two modes:

As frontend, the application renders the web page where people can buy items.

As a worker, the application waits for messages in the queue and processes them.

Please note that in the sample project the processing is simulated by waiting for five seconds with a Thread.sleep(5000).

You can configure the application in either mode, by changing the values in your application.yaml.

Dry-run the application

By default, the application starts as a frontend and worker.

You can run the application and, as long as you have an ActiveMQ instance running locally, you should be able to buy items and having those processed by the system.

If you inspect the logs, you should see the worker processing items.

It worked!

Writing Spring Boot applications is easy.

A more interesting subject is learning how to connect Spring Boot to a message broker.

Sending and receiving messages with JMS

Spring JMS (Java Message Service) is a powerful mechanism to send and receive messages using standard protocols.

If you used the JDBC API in the past, you should find the JMS API familiar since it works similarly.

The most popular message broker that you can consume with JMS is ActiveMQ — an open source messaging server.

With those two components, you can publish messages to a queue (ActiveMQ) using a familiar interface (JMS) and use the same interface to receive messages.

And even better, Spring Boot has excellent integration with JMS so you can get up to speed in no time.

In fact, the following short class encapsulate the logic used to interact with the queue:

QueueService.java

@Component
public class QueueService implements MessageListener {
  private static final Logger LOGGER = LoggerFactory.getLogger(QueueService.class);

  @Autowired
  private JmsTemplate jmsTemplate;

  public void send(String destination, String message) {
    LOGGER.info("sending message='{}' to destination='{}'", message, destination);
    jmsTemplate.convertAndSend(destination, message);
  }

  @Override
  public void onMessage(Message message) {
    if (message instanceof ActiveMQTextMessage) {
      ActiveMQTextMessage textMessage = (ActiveMQTextMessage) message;
      try {
        LOGGER.info("Processing task " + textMessage.getText());
        Thread.sleep(5000);
        LOGGER.info("Completed task " + textMessage.getText());
      } catch (InterruptedException e) {
        e.printStackTrace();
      } catch (JMSException e) {
        e.printStackTrace();
      }
    } else {
      LOGGER.error("Message is not a text message " + message.toString());
    }
  }
}

You can use the send method to publish messages to a named queue.

Also, Spring Boot will execute the onMessage method for every incoming message.

The last piece of the puzzle is instructing Spring Boot to use the class.

You can process messages in the background by registering the listener in the Spring Boot application like so:

SpringBootApplication.java

@SpringBootApplication
@EnableJms
public class SpringBootApplication implements JmsListenerConfigurer {
  @Autowired
  private QueueService queueService;

  public static void main(String[] args) {
    SpringApplication.run(SpringBootApplication.class, args);
  }

  @Override
  public void configureJmsListeners(JmsListenerEndpointRegistrar registrar) {
    SimpleJmsListenerEndpoint endpoint = new SimpleJmsListenerEndpoint();
    endpoint.setId("myId");
    endpoint.setDestination("queueName");
    endpoint.setMessageListener(queueService);
    registrar.registerEndpoint(endpoint);
  }
}

Where the id is a unique identifier for the consumer and destination is the name of the queue.

You can read the source code in full for the Spring queue service from the project on Github.

Notice how you were able to code a reliable queue in less than 40 lines of code.

You got to love Spring Boot.

All the time you save in deploying you can focus on coding

You verified the application works, and it's finally time to deploy it.

At this point, you could start your VPS, install Tomcat, spend some time crafting custom scripts to test, build, package and deploy the application.

Or you could write a description of what you wish to have: one message broker and two application deployed with a load balancer.

Orchestrators such as Kubernetes can read your wishlist and provision the right infrastructure.

Since less time spent in the infrastructure means more time coding, you'll deploy the application to Kubernetes this time.

But before you start, you need a Kubernetes cluster.

You could signup for a Google Cloud Platform or Azure and use the cloud provider Kubernetes offering.

Or you could try Kubernetes locally before you move your application to the cloud.

minikube is a local Kubernetes cluster packaged as a virtual machine.

It's great if you're on Windows, Linux and Mac as it takes five minutes to create a cluster.

You should also install kubectl, the client to connect to your cluster.

You can find the instructions on how to install minikube and kubectl from the official documentation.

If you're running on Windows, you should check out our detailed guide on how to install Kubernetes and Docker.

You should start a cluster with 8GB of RAM and some extra configuration:

bash

minikube start \
  --memory 8096 \
  --extra-config=controller-manager.horizontal-pod-autoscaler-upscale-delay=1m \
  --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-delay=2m \
  --extra-config=controller-manager.horizontal-pod-autoscaler-sync-period=10s

Please note that if you're using a pre-existing minikube instance, you can resize the VM by destroying it an recreating it. Just adding the --memory 8096 won't have any effect.

You should verify that the installation was successful with:

bash

kubectl get all

You should see a few resources listed as a table.

The cluster is ready, perhaps you should start deploying now?

Not yet.

You have to pack your stuff first.

What's better than an uber-jar? Containers

Applications deployed to Kubernetes have to be packaged as containers.

After all, Kubernetes is a container orchestrator, so it isn't capable of running your jar natively.

Containers are similar to fat jars: they contain all the dependencies necessary to run your application.

Even the JVM is part of the container.

So they're technically an even fatter fat-jar.

A popular technology to package applications as containers is Docker.

While being the most popular, Docker is not the only technology capable of running containers. Other popular options include rkt and lxd.

If you don't have Docker installed, you can follow the instructions on the official Docker website.

Usually, you build your containers and push them to a registry.

It's similar to publishing jars to Artifactory or Nexus.

But in this particular case, you will work locally and skip the registry part.

In fact, you will create the container image directly in minikube.

First, connect your Docker client to minikube by following the instruction printed by this command:

bash

minikube docker-env

Please note that if you switch terminal, you need to reconnect to the Docker daemon inside minikube. You should follow the same instructions every time you use a different terminal.

and from the root of the project build the container image with:

bash

docker build -t spring-k8s-hpa .

You can verify that the image was built and is ready to run with:

bash

docker images | grep spring

Great!

The cluster is ready, you packaged your application, perhaps you're ready to deploy now?

Yes, you can finally ask Kubernetes to deploy the applications.

Deploying your application to Kubernetes

Your application has three components:

the Spring Boot application that renders the frontend
ActiveMQ as a message broker
the Spring Boot backend that processes transactions

You should deploy the three component separately.

For each of them you should create:

A Deployment object that describes what container is deployed and its configuration
A Service object that acts as a load balancer for all the instances of the application created by the Deployment

Each instance of your application in a deployment is called a Pod.

Deploy ActiveMQ

Let's start with ActiveMQ.

You should create a activemq-deployment.yaml file with the following content:

activemq-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue
spec:
  replicas: 1
  selector:
    matchLabels:
      app: queue
  template:
    metadata:
      labels:
        app: queue
    spec:
      containers:
      - name: web
        image: webcenter/activemq:5.14.3
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 61616
        resources:
          limits:
            memory: 512Mi

The template is verbose but straightforward to read:

you asked for an activemq container from the official registry named webcenter/activemq
the container exposes the message broker on port 61616
there're 512MB of memory allocated for the container
you asked for a single replica — a single instance of your application

Create a activemq-service.yaml file with the following content:

activemq-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: queue
spec:
  ports:
    - port: 61616
      targetPort: 61616
  selector:
    app: queue

Luckily this template is even shorter!

The yaml reads:

you created a load balancer that exposes port 61616
the incoming traffic is distributed to all Pods (see deployment above) that has a label of type app: queue
the targetPort is the port exposed by the Pods

You can create the resources with:

bash

kubectl create -f activemq-deployment.yaml
kubectl create -f activemq-service.yaml

You can verify that one instance of the database is running with:

bash

kubectl get pods -l=app=queue

Deploy the front-end

Create a fe-deployment.yaml file with the following content:

fe-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: spring-boot-hpa
        imagePullPolicy: IfNotPresent
        env:
        - name: ACTIVEMQ_BROKER_URL
          value: "tcp://queue:61616"
        - name: STORE_ENABLED
          value: "true"
        - name: WORKER_ENABLED
          value: "false"
        ports:
          - containerPort: 8080
        readinessProbe:
          initialDelaySeconds: 5
          periodSeconds: 5
          httpGet:
            path: /health
            port: 8080
        resources:
          limits:
            memory: 512Mi

The Deployment looks a lot like the previous one.

There're some new fields, though:

there's a section where you can inject environment variables
there's the liveness probe that tells you when the application is ready to accept traffic

Create a fe-service.yaml file with the following content:

fe-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  ports:
    - nodePort: 32000
      port: 80
      targetPort: 8080
  selector:
    app: frontend
  type: NodePort

You can create the resources with:

bash

kubectl create -f fe-deployment.yaml
kubectl create -f fe-service.yaml

You can verify that one instance of the front-end application is running with:

bash

kubectl get pods -l=app=fe

Deploy the backend

Create a backend-deployment.yaml file with the following content:

backend-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: backend
        image: spring-boot-hpa
        imagePullPolicy: IfNotPresent
        env:
        - name: ACTIVEMQ_BROKER_URL
          value: "tcp://queue:61616"
        - name: STORE_ENABLED
          value: "false"
        - name: WORKER_ENABLED
          value: "true"
        ports:
          - containerPort: 8080
        readinessProbe:
          initialDelaySeconds: 5
          periodSeconds: 5
          httpGet:
            path: /health
            port: 8080
        resources:
          limits:
            memory: 256Mi

Create a backend-service.yaml file with the following content:

backend-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  ports:
    - nodePort: 31000
      port: 80
      targetPort: 8080
  selector:
    app: backend
  type: NodePort

You can create the resources with:

bash

kubectl create -f backend-deployment.yaml
kubectl create -f backend-service.yaml

You can verify that one instance of the backend is running with:

bash

kubectl get pods -l=app=backend

Deployment completed.

Does it really work, though?

You can visit the application in your browser with the following command:

bash

minikube service backend

and

bash

minikube service frontend

If it works, you should try to buy some items!

Is the worker processing transactions?

Yes, given enough time, the worker will process all of the pending messages.

Congratulations!

You just deployed the application to Kubernetes!

Scaling manually to meet increasing demand

A single worker may not be able to handle a large number of messages.

In fact, it can only handle one message at the time.

If you decide to buy thousands of items, it will take hours before the queue is cleared.

At this point you have two options:

you can manually scale up and down
you can create autoscaling rules to scale up or down automatically

Let's start with the basics first.

You can scale the backend to three instances with:

bash

kubectl scale --replicas=5 deployment/backend

You can verify that Kubernetes created five more instances with:

bash

kubectl get pods

And the application can process five times more messages.

Once the workers drained the queue, you can scale down with:

bash

kubectl scale --replicas=1 deployment/backend

Manually scaling up and down is great — if you know when the most traffic hits your service.

If you don't, setting up an autoscaler allows the application to scale automatically without manual intervention.

You only need to define a few rules.

Exposing application metrics

How does Kubernetes know when to scale your application?

Simple, you have to tell it.

The autoscaler works by monitoring metrics.

Only then it can increase or decrease the instances of your application.

So you could expose the length of the queue as a metric and ask the autoscaler to watch that value.

With the autoscaler enabled, the more pending messages in the queue, the more instances of your application Kubernetes will create.

So how do you expose those metrics?

The application has a /metrics endpoint to expose the number of messages in the queue.

If you try to visit that page you'll notice the following content:

/metrics

# HELP messages Number of messages in the queue
# TYPE messages gauge
messages 0

The application doesn't expose the metrics as a JSON format.

The format is plain text and is the standard for exposing Prometheus metrics.

Don't worry about memorising the format.

Most of the time you will use one of the Prometheus client libraries.

Consuming application metrics in Kubernetes

You're almost ready to autoscale; you should install the metrics server first.

In fact, Kubernetes will not ingest metrics from your application by default.

You should enable the Custom Metrics API if you wish to do so.

To install the Custom Metrics API, you also need Prometheus — a time series database.

All the files needed to install the Custom Metrics API are conveniently packaged in learnk8s/spring-boot-k8s-hpa.

You should download the content of that repository and change the current directory to be in the monitoring folder of that project.

bash

cd spring-boot-k8s-hpa/monitoring

From there you can create the Custom Metrics API with:

bash

kubectl create -f ./metrics-server
kubectl create -f ./namespaces.yaml
kubectl create -f ./prometheus
kubectl create -f ./custom-metrics-api

You should wait until the following command returns a list of custom metrics:

bash

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

Mission accomplished!

You're ready to consume metrics.

In fact, you should already find a custom metric for the number of messages in the queue:

bash

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/messages" | jq .

Congratulations, you have an application exposing metrics and a metric server consuming them.

You can finally enable the autoscaler!

Autoscaling deployments in Kubernetes

Kubernetes has an object called Horizontal Pod Autoscaler that is used to monitor deployments and scale the number of Pods up and down.

You will need one of those to scale your instances automatically.

You should create a hpa.yaml file with the following content:

hpa.yaml

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: spring-boot-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: messages
      targetAverageValue: 10

The file is cryptic, let me translate it for you:

Kubernetes watches the deployment specified in scaleTargetRef. In this case, it's the worker.
You're using the messages metric to scale your Pods. Kubernetes will trigger the autoscaling when there're more than ten messages in the queue.
As a minimum, the deployment should have two Pods. Ten Pods is the upper limit.

You can create the resource with:

bash

kubectl create -f hpa.yaml

After you submitted the autoscaler, you should notice that the number of replicas for the backend is two:

bash

kubectl get pods

It makes sense since you asked the autoscaler always to have at least two replicas running.

You can inspect the conditions that triggered the autoscaler and the events generated as a consequence with:

bash

kubectl describe hpa

The autoscaler suggests it was able to scale the Pods to 2 and it's ready to monitor the deployment.

Exciting stuff, but does it work?

Load testing

There's only one way to know if it works: creating loads of messages in the queue.

Head over to the front-end application and start adding a lot of messages.

As you add messages, monitor the status of the Horizontal Pod Autoscaler with:

bash

kubectl describe hpa

The number of Pods goes up from 2 to 4, then 8 and finally 10.

The application scales with the number of messages!

Hurrah!

You just deployed a fully scalable application that scales based on the number of pending messages on a queue.

On a side note, the algorithm for scaling is the following:

MAX(CURRENT_REPLICAS_LENGTH * 2, 4)

The documentation doesn't help a lot when it comes to explaining the algorithm. You can find the details in the code.

Also, every scale-up is re-evaluated every minute, whereas any scale down every two minutes.

All of the above are settings that can be tuned.

You're not done yet, though.

What's better than autoscaling instances? Autoscaling clusters

Scaling Pods across nodes works fabulously.

But what if you don't have enough capacity in the cluster to scale your Pods?

If you reach peak capacity in the cluster, Kubernetes will leave the Pods in a pending state and wait for more resources to be available.

It would be great if you could use an autoscaler similar to the Horizontal Pod Autoscaler, but for Nodes.

Good news!

You can have a cluster autoscaler that adds more nodes to your Kubernetes cluster as you need more resources.

The cluster autoscaler comes in different shapes and sizes.

And it's also cloud provider specific.

Please note that you won't be able to test the autoscaler with minikube since it is single node by definition.

You can find more information about the cluster autoscaler and the cloud provider implementation on Github.

Summary

Designing applications at scale require careful planning and testing.

Queue based architecture is an excellent design pattern to decouple your microservices and ensure they can be scaled and deployed independently.

And while you can roll out your deployment scripts, it's easier to leverage a container orchestrator such as Kubernetes to deploy and scale your applications automatically.

Thanks to Nathan Cashmore, and Andy Griffiths for their feedback!

That's all folks!

If you enjoyed this article, you might find the following articles interesting:

3 simple tricks for smaller Docker images and learn how to build and deploy Docker images quicker.
Kubernetes Chaos Engineering: Lessons Learned — Part 1 what happens when things go wrong in Kubernetes? Can Kubernetes recover from failure and self-heal?