Necessary Culture Change with GitOps

Don’t Underestimate the Role of Culture in a Successful GitOps Implementation

Artem Lajko
ITNEXT


Note: This blog post delves into just a fragment of my professional realm, highlighting the dynamics between GitOps and Kubernetes. I frequently observe a neglect of culture and an oversight of potential within this scope. The focus here is on the most minimal aspect possible.

Introduction

When the concept of DevOps emerged, the spotlight was on the significant shift in culture, with the tools taking a backseat. However, with GitOps, it seems the narrative has flipped; the focus is predominantly on the tools rather than the cultural transformation and the barriers GitOps helps to overcome.

Let’s examine the following picture:

To be fair, this objective can also be realized through CI/CD pipelines; Argo CD isn’t a necessity at this juncture.

Now, let’s turn our attention to another picture:

As we scale, maintaining and synchronizing operations with CI/CD becomes challenging.

Before we dive deeper, let’s clarify some aspects that GitOps changes. I aim to share insights and hope to convey the perspective I’ve gained from working with GitOps and various teams and how the roles can be redefined.

Principles of Immutable Infrastructure

Note: The insights shared here reflect my personal perspective on establishing an effective immutable infrastructure setup.

Immutable infrastructure is a setup in which servers are never updated once they are deployed; if a change is needed, new instances replace them. We already use this approach in Kubernetes Deployments, where pods are replaced rather than patched, and cloud providers such as AWS apply it when they replace VMs and let the new instances obtain their configuration via cloud-init. This contrasts with the traditional approach of continually modifying servers and offers benefits like:

  • Consistency and Reliability: Servers remain in a known state, reducing unexpected issues.
  • Security: Servers that stay unchanged after deployment reduce the attack surface.
  • Simplified Management: Predictable infrastructure state eases troubleshooting.
  • Scalability: Efficiently manages demand with consistent performance.
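
To make the replace-instead-of-modify idea concrete, here is a minimal Python sketch. The Server type and the roll_out helper are hypothetical names I made up for illustration; they do not correspond to any real cloud or Kubernetes API.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Server:
    """Hypothetical, immutable description of a deployed server."""
    name: str
    image: str
    generation: int = 1


def roll_out(old: Server, new_image: str) -> Server:
    """Return a fresh instance with the new image; the old one stays untouched."""
    return replace(old, image=new_image, generation=old.generation + 1)


v1 = Server(name="web", image="shop:1.0")
v2 = roll_out(v1, "shop:1.1")

# An in-place change is rejected, which is the point of immutability.
try:
    v1.image = "shop:hotfix"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The old instance stays in a known state until it is discarded, which is what makes rollbacks and troubleshooting predictable.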

For a production environment, particularly with Kubernetes, consider these practices:

  • Read-Only Kubernetes Cluster Access: Treat the cluster as a managed service, restricting access to read-only to prevent manual changes.
  • GitOps for Resource Management: Manage all resources, from creation to deletion, through GitOps for traceability and consistency.
  • Namespace Management: Avoid letting applications create namespaces to prevent conflicts and management issues.
  • Utilize Tools like PR-Generator: Enhance testing and deployment in a GitOps workflow, ensuring resources are appropriately managed.

These practices ensure centralized, version-controlled management, traceable changes, clean resource management, and a standardized workflow across projects.
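
As a rough sketch of the read-only idea, the following Python snippet builds a Kubernetes RBAC ClusterRole that only grants the get, list, and watch verbs. The role name, the chosen resources, and the helper functions are my own assumptions for illustration; a real setup would need a resource list that fits your cluster and a binding to the right users or groups.

```python
import json
from typing import Dict, List

# Verbs that only read state; anything else (create, update, patch, delete) mutates it.
READ_ONLY_VERBS = ["get", "list", "watch"]


def read_only_cluster_role(name: str, resources: List[str]) -> Dict:
    """Build a ClusterRole manifest that grants read-only access to the given resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # "" is the core API group (pods, services); "apps" covers deployments.
                "apiGroups": ["", "apps"],
                "resources": resources,
                "verbs": list(READ_ONLY_VERBS),
            }
        ],
    }


def is_read_only(role: Dict) -> bool:
    """Check that no rule in the role grants a mutating verb."""
    return all(set(rule["verbs"]) <= set(READ_ONLY_VERBS) for rule in role["rules"])


# Hypothetical role name and resource selection.
role = read_only_cluster_role("human-read-only", ["pods", "services", "deployments"])

# kubectl accepts JSON manifests as well as YAML, so this could be committed to Git.
manifest = json.dumps(role, indent=2)
```

Keeping such a role in the same Git repository as everything else means that even access rights go through the GitOps workflow.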

Overcoming Culture Barriers

Note: Regrettably, Weaveworks has removed their blogs, rendering my previous links inactive, which I find quite disappointing. It appears that Weaveworks is now shutting down :(

Weaveworks emphasizes that GitOps, especially through Weave GitOps, promotes a strong team culture with clear communication, coordination, and collaboration. GitOps clarifies roles and responsibilities, allowing team autonomy while using Kubernetes namespaces to manage workloads. Through GitOps, security admins can also set and enforce policies. This operational culture of GitOps, with its emphasis on precision and accountability, differs from DevOps’ aspirational goals. The GitOps model encourages teams to use the same tools for varied use cases, which fosters autonomy but demands excellent inter-team communication and clearly defined responsibilities. That this approach works in real projects shows how GitOps can foster a collaborative culture in practice. I will demonstrate this with an example drawn from a combination of my projects.

New Opportunities Across Different Roles

Note: In the following discussion, I won’t cover aspects such as the importance of all teams understanding that Git is the source of truth, among others, as there are already other insightful articles on these topics.

Let’s take a closer look at the following image:

The GitOps concept allows for a scalable orchestration platform that can represent the diverse roles and their specific needs within an organization. The image shown provides a glimpse into the variety of roles possible.

Viewing this image as a representation of cross-project collaboration within a company, the Platform Team, for example, could leverage Argo CD to supply developers with necessary platform components like Ingress Controllers and Cert-Managers, ensuring easy maintenance.

Similarly, the Security Team could utilize tools such as Kyverno for deploying policies with audit or enforcement capabilities across clusters, thereby maximizing the company’s security efforts.
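
The difference between audit and enforcement can be sketched in a few lines of Python. This is not Kyverno’s API or policy syntax; the evaluate function and the mode names are assumptions of mine, and only the pod fields (securityContext.privileged and allowPrivilegeEscalation) are real Kubernetes fields.

```python
from typing import Dict, List, Tuple


def privileged_containers(pod: Dict) -> List[str]:
    """Return the names of containers that ask for elevated privileges."""
    names = []
    for container in pod.get("spec", {}).get("containers", []):
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") or ctx.get("allowPrivilegeEscalation"):
            names.append(container["name"])
    return names


def evaluate(pod: Dict, mode: str) -> Tuple[bool, List[str]]:
    """Return (admitted, violations); 'audit' only reports, 'enforce' rejects."""
    if mode not in ("audit", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    violations = privileged_containers(pod)
    admitted = mode == "audit" or not violations
    return admitted, violations


# Hypothetical pod with one privileged container.
agent_pod = {
    "metadata": {"name": "node-agent"},
    "spec": {
        "containers": [
            {"name": "agent", "securityContext": {"privileged": True}},
            {"name": "sidecar"},
        ]
    },
}

audit_result = evaluate(agent_pod, "audit")
enforce_result = evaluate(agent_pod, "enforce")
```

Running a policy in audit mode first shows which workloads would be rejected, which gives the other teams something concrete to discuss before enforcement is switched on.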

Developers, for their part, can deploy their applications across various clusters using the GitOps model and offer tailored solutions, such as user-defined portals, to different customers.

With everyone working across the clusters, employing the GitOps methodology while pursuing distinct objectives, it becomes crucial to establish rules, foster communication, etc., to prevent any negative impact on each other’s work. The teams operate on a common ground.

Let’s examine an example for clearer insight

In the project, there’s a Platform team, several Developer teams, and a Security team. The Security team’s role is to ensure adherence not only to the organization’s security guidelines, but also to IT governance and compliance requirements. Therefore, it feels responsible for using all possible means to protect the company and enforces policies throughout the GitOps process.

Security Team: enforces a policy that no application can run with elevated privileges, and some applications, both in the platform context and those self-developed, stop working. This results in application downtime and software failure.

Platform Team: carries out a Kubernetes upgrade from 1.24 to 1.25 because support for the previous version has ended. As a result, the third-party tools used by the Security team no longer work, and some of the developers’ applications stop functioning as well. The change did not take into account that Kubernetes 1.25 removed PodSecurityPolicy in favor of Pod Security Admission, which enforces the Pod Security Standards.
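
Kubernetes 1.25 stopped serving several beta APIs, PodSecurityPolicy among them. Because all manifests live in Git, a check like the following could flag affected resources before the upgrade. It is a minimal sketch: the list of removed APIs is deliberately partial, the manifests are assumed to be already parsed into dictionaries, and the function name and sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Partial list of (apiVersion, kind) pairs that Kubernetes 1.25 no longer serves.
REMOVED_IN_1_25: Set[Tuple[str, str]] = {
    ("policy/v1beta1", "PodSecurityPolicy"),
    ("policy/v1beta1", "PodDisruptionBudget"),
    ("batch/v1beta1", "CronJob"),
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"),
}


def blocked_by_upgrade(manifests: List[Dict]) -> List[str]:
    """Return 'Kind/name' for every manifest that uses a removed API version."""
    hits = []
    for manifest in manifests:
        key = (manifest.get("apiVersion", ""), manifest.get("kind", ""))
        if key in REMOVED_IN_1_25:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            hits.append(f"{key[1]}/{name}")
    return hits


# Hypothetical repository content.
repo = [
    {"apiVersion": "policy/v1beta1", "kind": "PodSecurityPolicy",
     "metadata": {"name": "restricted"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "shop"}},
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "report"}},
]

findings = blocked_by_upgrade(repo)
```

A list like this tells the Platform team which other teams to talk to before the upgrade, not just what will break.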

Developers: use their autonomy and open a NodePort on a node with an external IP address for testing purposes. The application becomes externally accessible. The debugging works, but testing takes longer. Fortunately, the application uses Log4j Version 2.10 for logging, simplifying the debugging process.
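
Assuming the Service is defined in Git like every other resource, an externally reachable Service can be spotted with a simple check such as the one below. The function name and the sample manifests are hypothetical; the Service types NodePort and LoadBalancer are the real Kubernetes values, and ClusterIP is the default when no type is set.

```python
from typing import Dict, List

# Service types that make a workload reachable from outside the cluster network.
EXTERNAL_TYPES = {"NodePort", "LoadBalancer"}


def externally_exposed(manifests: List[Dict]) -> List[str]:
    """Return the names of Services whose type exposes them externally."""
    exposed = []
    for manifest in manifests:
        if manifest.get("kind") != "Service":
            continue
        service_type = manifest.get("spec", {}).get("type", "ClusterIP")
        if service_type in EXTERNAL_TYPES:
            exposed.append(manifest.get("metadata", {}).get("name", "<unnamed>"))
    return exposed


# Hypothetical manifests from a developer repository.
dev_repo = [
    {"kind": "Service", "metadata": {"name": "shop-debug"},
     "spec": {"type": "NodePort", "ports": [{"port": 8080, "nodePort": 30080}]}},
    {"kind": "Service", "metadata": {"name": "shop-internal"},
     "spec": {"ports": [{"port": 8080}]}},
    {"kind": "Deployment", "metadata": {"name": "shop"}},
]

exposed_services = externally_exposed(dev_repo)
```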

Change Consequences: Understanding the Impact on Business

These actions can lead to several consequences, including customer frustration, financial losses for the company by the minute, and a tarnished company reputation, among others. While these scenarios are hypothetical and my projects were unaffected by the Log4j incident, the key takeaway is not the specific events, but the recognition that such incidents can happen and are rational from each team’s perspective.

Change Explained: Through Our Teams’ Eyes

Recent challenges highlight that the perspectives of the Security, Platform, and Developer teams, though different, all aimed to serve the company’s best interest. The Security and Platform teams prioritized accountability and made sure the company was adequately protected, but they did not connect their actions to potential service disruptions or customer issues. They acted responsibly, yet a lack of simple communication and transparency about their actions meant that an opportunity to prevent misunderstandings was missed.

On the other side, the developers, driven by the urgency of meeting a deadline, took responsibility for advancing the project to avoid delays and customer frustration. Their approach, focused on the company’s best interest, unfortunately skipped informing the other teams. It rested on the assumption that independent action was the most efficient path forward, and it overlooked the importance of collaboration.

Each team, in its own way, demonstrated accountability and a commitment to keeping the company adequately protected and competitive. However, the challenges underscored the critical need for simple communication, transparency, and collaboration to align all actions with the broader objectives of the company, so that every decision and action is clearly in the company’s best interest.

Wrapping Up

In response to these challenges, we established a guild that included representatives from each team, facilitating regular meetings to discuss upcoming changes and their potential impacts. This initiative improved communication and collaboration across teams, promoting a deeper understanding of different perspectives, needs, and justifications, thereby enhancing overall transparency.

Although not every decision underwent thorough discussion due to time constraints or urgent matters like critical security vulnerabilities, the approach led to significant improvements. Increased transparency allowed for a better grasp of change impacts, minimizing negative consequences. Furthermore, empathy among team members grew, leading to a noticeable decline in the blame culture. This shift towards more open and empathetic collaboration marked a positive change in how teams interacted and addressed challenges together.

Contact Information

If you have questions, would like a friendly chat, or just want to connect so you don’t miss any topics, please don’t use the comment function on Medium; feel free to add me to your LinkedIn network instead!
