Learn Kubernetes Weekly issue 132
21 May 2025
This newsletter is brought to you by Dagger — build software engineering workflows and environments with Dagger.
An In-Depth Analysis of the OpenAI’s Incident and Mitigation Strategies
xiaoqing
This post analyzes the root cause and mitigation strategies of OpenAI’s Dec 2024 outage, where a misconfigured telemetry service triggered API overload across clusters, crashing Kubernetes control planes and disrupting DNS.
Agents in your software factory
How to build software engineering workflows — like code reviews and builds — with LLMs inside.
sponsored
Taming the wild west of research computing: how policies saved us a thousand headaches
Alessandro Pomponio
By leveraging Kyverno, Kueue, and Argo CD, IBM Research transformed chaotic GPU resource sharing into a policy-driven, fair computing environment—solving GPU hogging, scheduling conflicts, and administrative overhead in research computing.
This case study highlights the challenges of unpredictable resource usage, complex networking, and workload isolation in multi-tenant Kubernetes platforms.
Resource management in Kubernetes
Matthieu Treussart
This guide shows how to right-size Kubernetes pod resources (CPU, memory, ephemeral storage) using real Prometheus metrics, Go runtime env tuning (GOMAXPROCS, GOMEMLIMIT), and node-level capacity planning.
Reducing Pod Startup Time for Java Application on EKS
Balu
This article explains how to reduce pod cold-start time for Java apps on EKS.
It covers in-place JVM boot optimization, image prefetching via AWS EventBridge+SSM, and paused low-priority pods to keep nodes warm before real autoscale events.
Build your modern software factory
Define software delivery workflows and dev environments with reusable components — including LLMs — and run them anywhere.
Built by the creators of Docker.
Michal Pitr
This article guides you through using terminal commands to build a Linux container from the ground up.
Mastering Compute Efficiency: Dynamic GPU Partitioning Strategies for Kubernetes-Based ML Systems
Yash Mehra
This article explores three GPU sharing techniques—Time Slicing, Multi-Instance GPU (MIG), and Multi-Process Service (MPS)—to enhance GPU utilization in Kubernetes-managed machine learning workloads.
Standardizing App Delivery with Flux and Generic Helm Charts
Stefan Prodan
This tutorial explains how Flux and Generic Helm Charts standardize Kubernetes app delivery using reusable tech-specific charts, automated OCI deployments, and Kustomize for environment customization.
flux2-multi-tenancy: Automated Tenant Onboarding with Flux and Kyverno
fluxcd
flux2-multi-tenancy provides GitOps templates and Kyverno policies to automate tenant onboarding.
It provisions namespaces, RBAC, and policy controls in Kubernetes using pull requests, enabling secure multi-tenant cluster management from Git.
Rewriting Docker image registries with Kyverno
Oleksandr Ponomarov
This article shows how to use Kyverno policies and Helm to rewrite container image registry URLs at admission for all pod container types.
This image mutation enables namespace-controlled migration to new registries without editing every manifest.
Site Reliability Engineer with CoW DAO
Salary: €90K to €120K a year
Location: remote from Europe
Tech stack: Kubernetes, AWS, Flux, Docker, Go, Python, Rust, PostgreSQL, Elasticsearch, Pulumi
Data Engineer with Chartbeat
Salary: $128K to $147K a year
Location: remote from the United States
Tech stack: Kubernetes, Python, PostgreSQL, Snowflake, Kafka
Software Engineer with Crusoe
Salary: $245K to $290K a year
Location: hybrid (office and remote) in San Francisco, CA, USA
Tech stack: Kubernetes, Go, Java, Rust, C++, C, Ceph, Terraform, Ansible, Puppet
Platform Engineer with Lyft
Salary: CA$108K to CA$135K a year
Location: hybrid (office and remote) in Toronto, ON, CA
Tech stack: Kubernetes, AWS, Docker, Go, Python, Kafka, Terraform, CloudFormation, Ansible, Puppet
Test Automation Engineer with Palo Alto Networks
Salary: $104K to $185.5K a year
Location: based in the office in Santa Clara, CA, USA
Tech stack: Kubernetes, AWS, Azure, GCP, Docker, Python, JavaScript, GitLab
Discover more Kubernetes jobs on Kube Careers →
Coroot: Observability Platform
Babacar Mbaye Faye
Coroot is an eBPF-powered observability tool that maps service dependencies, request paths, errors, and latency in real time without code changes or sidecars.
Dagger: runtime for composable workflows
Dagger is an open-source runtime for composable workflows.
It's perfect for systems with many moving parts and a strong need for repeatability, modularity, observability and cross-platform support.
sponsored
Khronoscope: Time Travel for Troubleshooting and Debugging
hoyle1974
Khronoscope snapshots your cluster's resource states in-memory and lets you inspect changes over time with VCR-like controls.
Without persistent storage or agent overhead, you can view logs, rewind crashes, and trace dependencies across namespaces.
Kilo: WireGuard network overlay
Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes.
Kraken is a P2P-powered Docker registry that focuses on scalability and availability.
It is designed for Docker image management, replication, and distribution in a hybrid cloud environment.
May 21
Cloud Native, the Hard Way: Mistakes from Our VM to Kubernetes Journey
In-person meetup organized by Cloud Native Vilnius.
Location: Vilnius, LT
This is a free event.
May 22
Kubernetes Community Days Seoul 2025
In-person conference organized by KCD South Korea.
Location: Seoul, KR
This event requires an entrance fee.
May 22
On-Prem Kubernetes at Scale with metal-stack.io & AI Workloads on Kubernetes
Online & in-person meetup organized by Cloud Native Night Munich.
Location: München, DE and virtual
This is a free event.
May 23
Kubernetes Community Days Istanbul 2025
In-person conference organized by KCD Istanbul.
Location: İstanbul, TR
This event requires an entrance fee.
May 24
Mission O11y Possible: Panic at the Pod
In-person meetup organized by Cloud Native Noida.
Location: Noida, IN
This is a free event.
Jun 26
Online workshop organized by Learnk8s.
This is a virtual event.
This event requires an entrance fee.
Discover more Kubernetes events on Kube Events →
Kubernetes Community Days Washington DC 2025
Location: Washington, D.C., USA
In-person conference organized by KCD Washington DC.
The conference starts on the 16 September 2025.
Location: Vienna, AT
In-person conference organized by CNDA Austria.
The conference starts on the 8 October 2025.
KubeCon + CloudNativeCon North America 2025
Location: Atlanta, GA, USA
In-person conference organized by Linux Foundation.
The conference starts on the 10 November 2025.
Location: Aarhus, DK
In-person conference organized by CND.
The conference starts on the 17 April 2025.
Kubernetes Community Days Porto 2025
Location: Porto, PT
In-person conference organized by KCD Porto.
The conference starts on the 4 November 2025.
Kubernetes Community Days Warsaw 2025
Location: Warsaw, PL
In-person conference organized by KCD Warsaw.
The conference starts on the 9 October 2025.
Location: Austin, TX, USA
In-person conference organized by TXLF.
The conference starts on the 4 October 2025.
Location: Tel Aviv, IL
In-person conference organized by Devopsdays.
The conference starts on the 11 December 2025.
Location: Tokyo, JP
In-person conference organized by Linux Foundation.
The conference starts on the 10 December 2025.
Until next time!
— Dan
Subscribe and, every Wednesday, receive the latest Kubernetes news!