Put your Kubernetes resources to sleep during off-hours using Keda

Element
Jul 15, 2023

In today’s digital landscape, organizations rely heavily on container orchestration platforms like Kubernetes to manage their workloads efficiently. Kubernetes enables scalability, flexibility, and high availability for applications. However, there are times when it is beneficial to power off Kubernetes workloads during off-hours, mainly for cost optimization, security, or maintenance.

The off-hours challenge can be solved from multiple perspectives, but the two main ones that come to mind are:

  1. Powering off the Kubernetes nodes
  2. Powering off the Kubernetes applications

The former (1) might be a good solution for some, because it simply guarantees that the nodes are powered off; nothing is more cost-effective or secure than that. However, it sacrifices the dynamic, application-level nature of Kubernetes workloads.

In this article, I will explore two ways to execute the latter (2).

The How

KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed.

The “Event” in this case is either time itself, or a sleep request. KEDA augments the functionality of the native Kubernetes Horizontal Pod Autoscaler (HPA) by managing it for you. But as you might be aware, the HPA cannot scale workloads to 0. KEDA, however, can: it scales the target workload between zero and one replica itself, and lets the HPA it manages handle scaling from one replica upwards.

In order for KEDA to scale your workloads to 0 during off-hours, we’re going to explore two methodologies. The first is simply time-based; the second lets you control the sleep schedule of your precious workloads with more granularity.

The Simple Solution — Cron Scaler

The main KEDA CRD is called ScaledObject. A ScaledObject defines how many replicas a given workload (Deployment, StatefulSet, etc.) should have at a specific time. As of writing this article, the Cron scaler has issues when specifying what the off-hours are. It can, however, dictate what the on-hours are, so that your workloads know when to be awake, and otherwise sleep.

Let’s assume you have a simple deployment, and you kubectl apply -f deployment.yaml it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleepy-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleepy-workload
  template:
    metadata:
      labels:
        app: sleepy-workload
    spec:
      containers:
        - name: busybox
          image: busybox
          command:
            - sleep
            - "3600"

Now, let’s add a ScaledObject KEDA CRD and attach it to our sleepy workload (valid timezone names are listed in the IANA Time Zone Database). Let’s assume you want this workload alive from 9 to 5, New York time.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sleepy-workload-scaler
spec:
  scaleTargetRef:
    # Mandatory. Must be in the same namespace as the ScaledObject
    name: sleepy-workload
  # Polling interval of when to check scaling events
  pollingInterval: 10
  # Makes the workload sleep immediately when told, instead of reading a bed-time story for X seconds first
  cooldownPeriod: 0
  # The workload is ASLEEP by DEFAULT; otherwise, it's awake.
  minReplicaCount: 0
  triggers:
    - type: cron
      metadata:
        # Acceptable values come from the IANA Time Zone Database.
        timezone: America/New_York
        # At 09:00 on every day-of-week from Monday through Friday
        start: 0 9 * * 1-5
        # At 17:00 on every day-of-week from Monday through Friday
        end: 0 17 * * 1-5
        # i.e. the replica count for this workload during on-hours
        desiredReplicas: "2"

When you apply this to your namespace, you’ll see your sleepy-workload sleep and wake up on schedule. This solution is simple and elegant.

(KEDA resolves the replica count across all triggers with a MAX function. If you add another scaler to the triggers array, the cron trigger effectively acts as a minimum replica count during on-hours, and any other trigger can only scale above it.)
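For instance, a hypothetical combination of the cron trigger with KEDA's CPU scaler might look like this: during on-hours the cron trigger guarantees at least 2 replicas, while CPU load can raise the count further.

```yaml
triggers:
  # Guarantees 2 replicas during on-hours
  # (and, together with minReplicaCount: 0, zero replicas outside them)
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 9 * * 1-5
      end: 0 17 * * 1-5
      desiredReplicas: "2"
  # Can scale beyond 2 replicas when CPU utilization climbs
  - type: cpu
    metricType: Utilization
    metadata:
      value: "80"
```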

The Extensive Solution — Custom Metrics API

What if you wanted an external system to dictate when and how workloads sleep, and wanted your workloads to be aware of that state and act accordingly? That is where the Metrics API scaler comes into play.

This scaler lets you define an external endpoint that KEDA queries to determine how many replicas the workload should have.

In this use case, I used AWS DynamoDB, AWS Lambda, and Jenkins as the automation server and cron scheduler. You could, however, swap any of these for any other database, API server, and automation scheduler you choose.

The Cron scaler is not part of this solution. Here, we implement our business logic behind a custom API endpoint that the KEDA metrics-api scaler points at:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sleepy-workload-scaler
spec:
  scaleTargetRef:
    # Mandatory. Must be in the same namespace as the ScaledObject
    name: sleepy-workload
  # Polling interval of when to check scaling events
  pollingInterval: 10
  # Makes the workload sleep immediately when told, instead of reading a bed-time story for X seconds first
  cooldownPeriod: 0
  # The workload is ASLEEP by DEFAULT; otherwise, it's awake.
  minReplicaCount: 0
  triggers:
    - type: metrics-api
      metricType: Value
      metadata:
        # Your on-hours target replica count
        targetValue: "2"
        # Your custom API endpoint
        url: "https://off-hours-api.my.domain/sleep?workload=sleepy-workload&replicas=2"
        # Which key in the JSON response holds the replica count
        valueLocation: "replicaCount"

In this example, our sleepy-workload scaler points at an external URL and passes two query parameters:

  1. workload = the workload’s name (you could swap workload for namespace to turn off an entire namespace, having all of its workloads point to the same endpoint with the same query value, their common namespace).
  2. replicas = if I am not asleep, how many replicas should I have?

The metrics scaler then expects the following JSON response to the GET request if the workload is awake:

{
  "replicaCount": 2
}

And if the workload is considered asleep:

{
  "replicaCount": 0
}
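Before wiring in any cloud services, this contract can be sketched in a few provider-agnostic lines of Python. The `is_asleep` lookup below is a hypothetical stand-in for whatever state store you use:

```python
import json

def is_asleep(workload):
    # Hypothetical stand-in for a real state lookup
    # (DynamoDB, Redis, a ConfigMap annotation, ...).
    return workload in {"sleepy-workload"}

def sleep_endpoint(workload, replicas):
    """Build the JSON body the KEDA metrics-api scaler expects."""
    count = 0 if is_asleep(workload) else int(replicas)
    return json.dumps({"replicaCount": count})

print(sleep_endpoint("sleepy-workload", "2"))  # {"replicaCount": 0}
print(sleep_endpoint("awake-workload", "2"))   # {"replicaCount": 2}
```

Note the `int()` cast: query-string parameters arrive as strings, and KEDA needs a number at `valueLocation`, not `"2"`.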

Let’s assume we have a DynamoDB table called state-of-my-workloads, keyed by workload, where each item holds a Boolean sleep attribute.

We now have everything we need to let a workload be awoken and put to sleep. Let’s write a simple Python Lambda that answers the precise question KEDA is asking. You could implement and deploy any API server you’re familiar with that suits your technology stack and use case:

import json

def lambda_handler(event, context):
    print("Event Is: " + str(event))
    if event.get("httpMethod") == "GET":
        # Retrieve query parameters from the event
        query_params = event.get("queryStringParameters")

        if event.get("resource") == "/sleep":
            workload = query_params.get("workload", "Unknown")
            # Query parameters arrive as strings; KEDA expects a number
            replicas = int(query_params.get("replicas", 0))

            res = {
                "statusCode": 200,
                "headers": {"Content-Type": "application/json"},
                "body": json.dumps(
                    {"replicaCount": (0 if get_sleep_value(workload) else replicas)}
                ),
            }
            return res

get_sleep_value() can be implemented as follows if, let’s say, you keep the state of your workloads/namespaces in AWS DynamoDB:

import boto3

def get_sleep_value(workload):
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("state-of-my-workloads")
    response = table.get_item(Key={"workload": workload})
    item = response.get("Item")
    print(item)
    if item is not None:
        sleep_value = item.get("sleep")
        if sleep_value is not None:
            return sleep_value
    return False

This function returns the Boolean sleep status of your workload; the main handler then returns either 0 or the replica count provided by the replicas query parameter.

Now you can use any automation solution to flip the Boolean sleep value in your workload’s state table, or connect it to a self-service portal where people turn their workloads on and off on demand.
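As a sketch, the flip itself is a single DynamoDB update. The set_sleep helper below is hypothetical (not part of the Lambda above); it takes the table object as a parameter so any boto3 Table resource, or a test double, can be passed in:

```python
def set_sleep(table, workload, asleep):
    """Mark a workload as asleep (True) or awake (False) in the state table."""
    table.update_item(
        Key={"workload": workload},
        # "#s" aliases the attribute name, in case "sleep" collides
        # with a DynamoDB reserved word
        UpdateExpression="SET #s = :s",
        ExpressionAttributeNames={"#s": "sleep"},
        ExpressionAttributeValues={":s": asleep},
    )

# Usage with a real table (assumes AWS credentials are configured):
#   import boto3
#   table = boto3.resource("dynamodb").Table("state-of-my-workloads")
#   set_sleep(table, "sleepy-workload", True)   # off-hours begin
#   set_sleep(table, "sleepy-workload", False)  # back to work
```

A Jenkins cron job calling this at 17:00 and 09:00 reproduces the Cron scaler's behavior, while the portal path lets anyone override the schedule on demand.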

Hope you liked this solution! Leave a comment if you have any questions.
