Learn Kubernetes Weekly issue 132

OpenAI's Incident and Mitigation, policies saved us a thousand headaches, We're leaving Kubernetes, Reducing Pod Startup Time for Java

21 May 2025

This newsletter is brought to you by Dagger — build software engineering workflows and environments with Dagger.

  1. An In-Depth Analysis of the OpenAI’s Incident and Mitigation Strategies

    xiaoqing

    This post analyzes the root cause and mitigation strategies of OpenAI’s Dec 2024 outage, where a misconfigured telemetry service triggered API overload across clusters, crashing Kubernetes control planes and disrupting DNS.

  2. Agents in your software factory

    How to build software engineering workflows — like code reviews and builds — with LLMs inside.

    sponsored

  3. Taming the wild west of research computing: how policies saved us a thousand headaches

    Alessandro Pomponio

    By leveraging Kyverno, Kueue, and Argo CD, IBM Research transformed chaotic GPU resource sharing into a policy-driven, fair computing environment—solving GPU hogging, scheduling conflicts, and administrative overhead in research computing.

  4. We're leaving Kubernetes

    This case study highlights the challenges of unpredictable resource usage, complex networking, and workload isolation in multi-tenant Kubernetes platforms.

  5. Resource management in Kubernetes

    Matthieu Treussart

    This guide shows how to right-size Kubernetes pod resources (CPU, memory, ephemeral storage) using real Prometheus metrics, Go runtime env tuning (GOMAXPROCS, GOMEMLIMIT), and node-level capacity planning.

  6. Reducing Pod Startup Time for Java Application on EKS

    Balu

    This article explains how to reduce pod cold-start time for Java apps on EKS.

    It covers in-place JVM boot optimization, image prefetching via AWS EventBridge+SSM, and paused low-priority pods to keep nodes warm before real autoscale events.
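
As a rough illustration of the Go runtime tuning mentioned in item 5, the sketch below derives `GOMAXPROCS` and `GOMEMLIMIT` values from a container's CPU and memory limits. The function name, the 10% memory headroom, and the sample limits are assumptions made for this example, not taken from the linked guide.

```python
import math

MIB = 1024 * 1024


def go_runtime_env(cpu_limit_millicores: int,
                   memory_limit_bytes: int,
                   headroom_percent: int = 10) -> dict:
    """Suggest Go runtime env vars for a container (hypothetical helper)."""
    # Round the CPU limit up to a whole number of threads, never below 1.
    gomaxprocs = max(1, math.ceil(cpu_limit_millicores / 1000))
    # Keep some headroom under the container memory limit (assumed 10%)
    # so the Go garbage collector reacts before the kernel OOM killer does.
    usable_bytes = memory_limit_bytes * (100 - headroom_percent) // 100
    gomemlimit_mib = usable_bytes // MIB
    return {
        "GOMAXPROCS": str(gomaxprocs),
        "GOMEMLIMIT": f"{gomemlimit_mib}MiB",
    }


if __name__ == "__main__":
    # Example: a container limited to 1.5 CPU and 512 MiB of memory.
    print(go_runtime_env(1500, 512 * MIB))
```

The resulting values would typically be set as environment variables on the container; the right amount of headroom depends on the workload.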

Articles worth checking out:

Build your modern software factory

Define software delivery workflows and dev environments with reusable components — including LLMs — and run them anywhere.

Built by the creators of Docker.

Learn more

  1. Linux container from Scratch

    Michal Pitr

    This article guides you through using terminal commands to build a Linux container from the ground up.

  2. Mastering Compute Efficiency: Dynamic GPU Partitioning Strategies for Kubernetes-Based ML Systems

    Yash Mehra

    This article explores three GPU sharing techniques—Time Slicing, Multi-Instance GPU (MIG), and Multi-Process Service (MPS)—to enhance GPU utilization in Kubernetes-managed machine learning workloads.

  3. Standardizing App Delivery with Flux and Generic Helm Charts

    Stefan Prodan

    This tutorial explains how Flux and Generic Helm Charts standardize Kubernetes app delivery using reusable tech-specific charts, automated OCI deployments, and Kustomize for environment customization.

  4. flux2-multi-tenancy: Automated Tenant Onboarding with Flux and Kyverno

    fluxcd

    flux2-multi-tenancy provides GitOps templates and Kyverno policies to automate tenant onboarding.

    It provisions namespaces, RBAC, and policy controls in Kubernetes using pull requests, enabling secure multi-tenant cluster management from Git.

  5. Rewriting Docker image registries with Kyverno

    Oleksandr Ponomarov

    This article shows how to use Kyverno policies and Helm to rewrite container image registry URLs at admission for all pod container types.

    This image mutation enables namespace-controlled migration to new registries without editing every manifest.
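
To make the registry rewrite in item 5 concrete, here is a minimal sketch of the transformation such a mutation applies to a pod spec. It is plain Python rather than a Kyverno policy, and the registry hostnames and helper names are hypothetical.

```python
def rewrite_registry(image: str, old_registry: str, new_registry: str) -> str:
    """Swap the registry prefix of an image reference, if it matches."""
    prefix = old_registry.rstrip("/") + "/"
    if image.startswith(prefix):
        return new_registry.rstrip("/") + "/" + image[len(prefix):]
    # Images from other registries are left untouched.
    return image


def rewrite_pod_images(pod_spec: dict, old_registry: str, new_registry: str) -> dict:
    """Rewrite images for every container type in a pod spec (in place)."""
    for field in ("containers", "initContainers", "ephemeralContainers"):
        for container in pod_spec.get(field, []):
            container["image"] = rewrite_registry(
                container["image"], old_registry, new_registry
            )
    return pod_spec


if __name__ == "__main__":
    spec = {
        "initContainers": [{"name": "init", "image": "old.example.com/tools/init:1.0"}],
        "containers": [{"name": "app", "image": "old.example.com/team/app:2.3"}],
    }
    print(rewrite_pod_images(spec, "old.example.com", "new.example.com"))
```

An admission-time policy performs the same kind of substitution on each incoming pod, which is why individual manifests do not need to be edited.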

More tutorials:

    • Site Reliability Engineer with CoW DAO

    • Salary: €90K to €120K a year

    • Location: remote from Europe

    • Tech stack: Kubernetes, AWS, Flux, Docker, Go, Python, Rust, PostgreSQL, Elasticsearch, Pulumi

    • Data Engineer with Chartbeat

    • Salary: $128K to $147K a year

    • Location: remote from the United States

    • Tech stack: Kubernetes, Python, PostgreSQL, Snowflake, Kafka

    • Software Engineer with Crusoe

    • Salary: $245K to $290K a year

    • Location: based in the office (and remote from home) in San Francisco, CA, USA

    • Tech stack: Kubernetes, Go, Java, Rust, C++, C, Ceph, Terraform, Ansible, Puppet

    • Platform Engineer with Lyft

    • Salary: CA$108K to CA$135K a year

    • Location: based in the office (and remote from home) in Toronto, ON, CA

    • Tech stack: Kubernetes, AWS, Docker, Go, Python, Kafka, Terraform, CloudFormation, Ansible, Puppet

    • Test Automation Engineer with Palo Alto Networks

    • Salary: $104K to $185.5K a year

    • Location: based in the office in Santa Clara, CA, USA

    • Tech stack: Kubernetes, AWS, Azure, GCP, Docker, Python, JavaScript, GitLab

Discover more Kubernetes jobs on Kube Careers →

  1. Coroot: Observability Platform

    Babacar Mbaye Faye

    Coroot is an eBPF-powered observability tool that maps service dependencies, request paths, errors, and latency in real time without code changes or sidecars.

  2. Dagger: runtime for composable workflows

    Dagger is an open-source runtime for composable workflows.

    It's perfect for systems with many moving parts and a strong need for repeatability, modularity, observability and cross-platform support.

    sponsored

  3. Khronoscope: Time Travel for Troubleshooting and Debugging

    hoyle1974

    Khronoscope snapshots your cluster's resource states in-memory and lets you inspect changes over time with VCR-like controls.

    Without persistent storage or agent overhead, you can view logs, rewind crashes, and trace dependencies across namespaces.

  4. Kilo: WireGuard network overlay

    Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes.

  5. Kraken registry

    Kraken is a P2P-powered Docker registry that focuses on scalability and availability.

    It is designed for Docker image management, replication, and distribution in a hybrid cloud environment.

Other interesting projects:

Upcoming Kubernetes events

  1. May

    21

    Cloud Native, the Hard Way: Mistakes from Our VM to Kubernetes Journey

    In-person meetup organized by Cloud Native Vilnius.

    • Location: Vilnius, LT

    • This is a free event.

  2. May

    22

    Kubernetes Community Days Seoul 2025

    In-person conference organized by KCD South Korea.

    • Location: Seoul, KR

    • This event requires an entrance fee.

  3. May

    22

    On-Prem Kubernetes at Scale with metal-stack.io & AI Workloads on Kubernetes

    Online & in-person meetup organized by Cloud Native Night Munich.

    • Location: München, DE and virtual

    • This is a free event.

  4. May

    23

    Kubernetes Community Days Istanbul 2025

    In-person conference organized by KCD Istanbul.

    • Location: İstanbul, TR

    • This event requires an entrance fee.

  5. May

    24

    Mission O11y Possible: Panic at the Pod

    In-person meetup organized by Cloud Native Noida.

    • Location: Noida, IN

    • This is a free event.

  6. Jun

    26

    Advanced Kubernetes course

    Online workshop organized by Learnk8s.

    • This is a virtual event.

    • This event requires an entrance fee.

Discover more Kubernetes events on Kube Events →

Kubernetes Call for Papers

  1. expired

    Kubernetes Community Washington DC 2025

    The Call for Papers was open until 26 May 2025 (UTC). More info →
    • Location: Washington, D.C., USA

    • In-person conference organized by KCD Washington DC.

    • The conference starts on 16 September 2025.

    • Apply here
  2. 1

    days

    Cloud Native Days Austria

    The Call for Papers is open until 31 May 2025 (UTC). More info →
    • Location: Vienna, AT

    • In-person conference organized by CNDA Austria.

    • The conference starts on 8 October 2025.

    • Apply here
  3. expired

    KubeCon + CloudNativeCon North America 2025

    The Call for Papers was open until 28 May 2025 (UTC). More info →
    • Location: Atlanta, GA, USA

    • In-person conference organized by Linux Foundation.

    • The conference starts on 10 November 2025.

    • Apply here
  4. 16

    days

    Cloud Native Denmark 2025

    The Call for Papers is open until 16 June 2025 (UTC). More info →
    • Location: Aarhus, DK

    • In-person conference organized by CND.

    • The conference starts on 17 April 2025.

    • Apply here
  5. 31

    days

    Kubernetes Community Days Porto 2025

    The Call for Papers is open until 30 June 2025 (UTC). More info →
    • Location: Porto, PT

    • In-person conference organized by KCD Porto.

    • The conference starts on 4 November 2025.

    • Apply here
  6. 16

    days

    Kubernetes Community Days Warsaw 2025

    The Call for Papers is open until 16 June 2025 (UTC). More info →
    • Location: Warsaw, PL

    • In-person conference organized by KCD Warsaw.

    • The conference starts on 9 October 2025.

    • Apply here
  7. 65

    days

    Texas Linux Festival 2025

    The Call for Papers is open until 3 August 2025 (UTC). More info →
    • Location: Austin, TX, USA

    • In-person conference organized by TXLF.

    • The conference starts on 4 October 2025.

    • Apply here
  8. 16

    days

    Devopsdays Tel Aviv

    The Call for Papers is open until 15 June 2025 (UTC). More info →
    • Location: Tel Aviv, IL

    • In-person conference organized by Devopsdays.

    • The conference starts on 11 December 2025.

    • Apply here
  9. 66

    days

    Open Source Summit Japan 2025

    The Call for Papers is open until 4 August 2025 (UTC). More info →
    • Location: Tokyo, JP

    • In-person conference organized by Linux Foundation.

    • The conference starts on 10 December 2025.

    • Apply here

Until next time!

— Dan

Subscribe and, every Wednesday, receive the latest Kubernetes news!
