What is GitOps and why is it important?
This is the 2nd blog in Chris Jones’s series on scaling Kubernetes for applications, GitOps, and increasing developer productivity. You can read part 1 here. Part 3 and part 4 of the series are also available.
GitOps is the best practice for operating cloud-native applications at scale. Almost as soon as you get your first Kubernetes cluster running, you stumble upon a problem: how do you keep every deployed application consistent with its required operational state? Users typically evolve through the same process: first limiting access to all clusters, then using kubectl only with associated change requests, and eventually packaging applications in Helm charts. Many progress to using Kustomize to apply environment-specific customizations. Then, with five clusters running, 100 happy developers, and production running smoothly, an application update brings the whole system down. The cause? A change made in development to the application’s configuration (the manifest) that wasn’t captured in the Helm chart and, due to missing tests, wasn’t caught until it went live.
GitOps helps mitigate outages by applying strict “as-code” principles to the objects that are deployed to a Kubernetes cluster (manifests, Helm charts and policies) and by automating their deployment. GitOps relies on the fact that Kubernetes is a declarative system, and it externalizes the system’s source of truth to Git. By codifying and automating the lifecycle, GitOps removes manual deployment steps and reduces the likelihood of changes being missed.
What is Declarative Management?
Declarative Management is an operational paradigm whereby a system converges to a state that is defined (declared) and continuously attempts to maintain that declared state. The user is neither involved in nor concerned with how the state is achieved, only that the system reaches the desired state. The objects that make up an operational system should be defined as human-readable code. The resulting system is built from the sum of these objects and directly reflects their defined state. The defined state for an object is the source of truth, not the object itself. If the source of truth changes, the related environment changes. Change the object directly and the system reverts the change to stay consistent with the declared state.
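The idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an object’s state is a flat mapping of field names to values; `reconcile` is a hypothetical helper, not part of Kubernetes or any GitOps tool. The caller only declares the target state, and the function works out the actions (the “how”), including reverting a field that was changed directly on the live object.

```python
from typing import Dict, List, Tuple

# Hypothetical, flattened object state: field name -> value.
State = Dict[str, str]


def reconcile(declared: State, live: State) -> Tuple[State, List[str]]:
    """Converge `live` towards `declared` and return (new state, actions taken)."""
    actions: List[str] = []
    converged = dict(live)
    for key, value in sorted(declared.items()):
        if key not in converged:
            actions.append(f"add {key}={value}")
        elif converged[key] != value:
            # A direct change to the object is reverted to the declared value.
            actions.append(f"revert {key}: {converged[key]} -> {value}")
        else:
            continue  # field already matches the declared state
        converged[key] = value
    # Fields that were never declared are removed.
    for key in sorted(set(converged) - set(declared)):
        actions.append(f"remove {key}")
        del converged[key]
    return converged, actions


declared = {"image": "web:1.4", "replicas": "3"}
live = {"image": "web:1.4", "replicas": "5"}  # someone scaled the object by hand
state, actions = reconcile(declared, live)
print(actions)  # ['revert replicas: 5 -> 3']
```

The user never writes the “revert” step; it falls out of comparing the declared state with what is running.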
What is GitOps?
GitOps is an industry term used to describe the practice of managing objects, such as an application or policy, from human-readable code stored in a Git repository (GitHub, GitLab, Bitbucket, etc.). It is an evolution in the practice of operating a declarative system. Most commonly, a system that runs declaratively holds within itself the state it must maintain, and the objects that define it are the definitive source of truth.
Utilizing a running system as the source of truth can prove contentious, especially when attempts are made to validate the current state as accurate. Running a declarative system does little to reduce the likelihood of someone asking, “But is the system running as we need it to?” and discovering it is, in fact, not. If the system is the source of truth, then a user has little choice but to accept its state as true and accurate, when the opposite may be true. GitOps externalizes the source of truth, extracting it from the system, storing it as code in an accepted repository – a Git Repository.
This resolves, and indeed entirely removes, the question “But is the system running as we need it to?” The system is either identical to what is in the Git Repository, or it is not. The system still operates declaratively and holds the state it must operate from, but the source of truth is external and easily queried. Moving the source of truth solves one significant issue and simultaneously introduces a second: synchronization.
To resolve the synchronization issue, GitOps mandates that the system’s actual running state is continuously evaluated against the external source of truth. If ‘drift’, or a difference, is found, the running declared state is updated to reflect what is found in the external source of truth.
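A single synchronization pass can be sketched as follows, assuming manifests are held as plain dictionaries keyed by a hypothetical object name. The Git copy is treated as read-only; the cluster copy is created, updated, or pruned until the two are identical, and running the pass again on an in-sync system does nothing.

```python
import copy
from typing import Any, Dict, List

# Hypothetical shape: object name -> manifest, for both Git and the cluster.
Manifests = Dict[str, Dict[str, Any]]


def sync(git: Manifests, cluster: Manifests) -> Dict[str, List[str]]:
    """Make `cluster` match `git` in place and report what changed.

    Git is the external source of truth, so it is only read, never written.
    """
    report: Dict[str, List[str]] = {"created": [], "updated": [], "pruned": []}
    for name, manifest in git.items():
        if name not in cluster:
            report["created"].append(name)
        elif cluster[name] != manifest:
            report["updated"].append(name)  # drift: live object differs from Git
        else:
            continue  # already in sync
        cluster[name] = copy.deepcopy(manifest)  # never share objects with Git's copy
    # Objects in the cluster that Git no longer declares are removed.
    for name in [n for n in cluster if n not in git]:
        report["pruned"].append(name)
        del cluster[name]
    return report


git = {"web": {"replicas": 3}, "api": {"replicas": 2}}
cluster = {"web": {"replicas": 5}, "debug-pod": {"image": "busybox"}}
print(sync(git, cluster))
# {'created': ['api'], 'updated': ['web'], 'pruned': ['debug-pod']}
print(sync(git, cluster))
# {'created': [], 'updated': [], 'pruned': []}
```

Pruning undeclared objects is an assumption made here to keep the cluster identical to Git; in real tools it is often a configurable option.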
What is Drift?
Drift is the variance between an object running in a live system and the source of truth that defines that object in Git. It could be as simple as a changed label, or as complicated as a whole additional set of role-based access controls.
Drift occurs when a managed object is no longer identical to its source. Drift shouldn’t happen in a 100% GitOps environment, as all changes must be made to the source and synchronized to the system.
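Detecting drift amounts to comparing the Git copy of an object with the live copy, field by field. The following is a minimal Python sketch, assuming both copies have already been parsed into nested dictionaries; `find_drift` is a hypothetical helper. Real tools also have to discount fields that the cluster legitimately fills in (such as `status`), which this sketch does not attempt.

```python
from typing import Any, List


def find_drift(source: Any, live: Any, path: str = "") -> List[str]:
    """Return the dotted paths at which `live` differs from `source` (the Git copy)."""
    if isinstance(source, dict) and isinstance(live, dict):
        drift: List[str] = []
        for key in sorted(set(source) | set(live)):
            child = f"{path}.{key}" if path else key
            if key not in live:
                drift.append(f"missing: {child}")  # declared in Git, absent live
            elif key not in source:
                drift.append(f"extra: {child}")  # added live, not declared in Git
            else:
                drift.extend(find_drift(source[key], live[key], child))
        return drift
    # Leaf values (or mismatched types) are compared directly.
    return [] if source == live else [f"changed: {path}"]


source = {"metadata": {"labels": {"app": "web", "tier": "frontend"}},
          "spec": {"replicas": 3}}
live = {"metadata": {"labels": {"app": "web", "tier": "backend"},
                     "annotations": {"owner": "dev"}},
        "spec": {"replicas": 3}}
print(find_drift(source, live))
# ['extra: metadata.annotations', 'changed: metadata.labels.tier']
```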
However, changes need to be developed and tested in lower environments, creating scenarios where an object is managed by Git but intentionally has drift. Additionally, environments where changes are encouraged, such as a developer’s own cluster, may be a crucial part of the engineering process. The developer needs to be able to make changes so they can work independently without hindrance, but an administrator may also want to see how the developer’s environment differs from the definition of the system stored in Git.
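One way to accommodate intentional drift is to always report it but only revert it where the environment is meant to be enforced. The sketch below assumes a hypothetical `DriftPolicy` per environment and takes a list of drifted field paths as input; it is not the configuration model of Arlon or of any particular GitOps tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DriftPolicy:
    """Hypothetical per-environment policy for handling drift."""
    enforce: bool  # revert drift automatically in this environment?
    ignored_paths: Set[str] = field(default_factory=set)  # drift accepted on purpose


def plan(drifted_paths: List[str], policy: DriftPolicy) -> Dict[str, List[str]]:
    """Decide which drift to show and which to revert under `policy`."""
    relevant = [p for p in drifted_paths if p not in policy.ignored_paths]
    return {
        "report": relevant,  # administrators can always see the difference
        "revert": relevant if policy.enforce else [],  # only enforced environments change
    }


drift = ["spec.replicas", "metadata.labels.debug"]
developer = DriftPolicy(enforce=False)
production = DriftPolicy(enforce=True, ignored_paths={"spec.replicas"})
print(plan(drift, developer))
# {'report': ['spec.replicas', 'metadata.labels.debug'], 'revert': []}
print(plan(drift, production))
# {'report': ['metadata.labels.debug'], 'revert': ['metadata.labels.debug']}
```

The developer’s cluster keeps its changes while still surfacing them; in the hypothetical production policy, `spec.replicas` is ignored on the assumption that something else, such as an autoscaler, is expected to change it.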
GitOps and Declarative Management for Kubernetes
In practice, and specifically for Kubernetes, a declarative GitOps practice involves three distinct elements:
- Repository: The repository holds the source of truth of the system. This could be the system as a whole, or part of it. It can include elements such as policies that define how the cluster should run, tools that extend how the cluster operates, and end-user/business applications.
- Kubernetes Cluster: This is the core of the system. (Kubernetes natively operates as a declarative platform, although a user can change the system through an imperative action – “change ‘x’ to ‘y’”).
- Continuous Delivery Mechanism: This is the tool that ensures the system maintains an up-to-date copy of the source of truth from the Git Repository. It is also typically responsible for assessing the comparative state (Git vs. K8s cluster) and, when drift is found, invoking actions to reconcile it.
A typical user workflow includes:
- Creating a repository within which all objects will be stored.
- Defining the structure within the repository to ensure all operational objectives are met. The number of objects to be defined and the total number of systems to be operated create inflection points that need to be planned for. Further, a multi-tenant environment will introduce additional complexities, as will designing for user self-service.
- Creating the objects in the Git repository. This in itself involves multiple steps and, as a best practice, should include a review of the objects by a second party.
- Kubernetes Cluster Creation. (Herein lies a question, an opportunity, and a problem: can a cluster itself be defined as code and be created to operate declaratively via GitOps?)
- Deployment of the Continuous Delivery Mechanism. The best practice is to use a tool that can itself be deployed from a state declared and stored in the repository, so that it deploys itself through GitOps: inception.
- System deployment happens after the Continuous Delivery Mechanism is deployed and running. The Continuous Delivery Mechanism is configured to deploy one or more of the objects within the repository, and the structure of the repository has a non-trivial effect on what is deployed and how it runs. Typically, the tool clones the objects from the repository and then takes actions to create the system.
- Continuous delivery. This is the final and ongoing action of the Continuous Delivery Mechanism.
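The repository structure decides what each cluster receives. A common pattern, similar in spirit to Kustomize bases and overlays, keeps one base definition per application plus a small overlay per environment. The sketch below assumes a hypothetical layout held in memory as a dictionary of already-parsed files and uses a simple recursive merge; it does not reproduce Kustomize’s actual patch semantics.

```python
import copy
from typing import Any, Dict


def merge(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `overlay` onto `base` without modifying either."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)  # overlay value wins
    return result


# Hypothetical repository layout, flattened into path -> parsed manifest.
REPO = {
    "apps/web/base.yaml": {"spec": {"replicas": 1, "image": "web:1.4"}},
    "apps/web/overlays/dev.yaml": {},
    "apps/web/overlays/prod.yaml": {"spec": {"replicas": 3}},
}


def render(app: str, env: str) -> Dict[str, Any]:
    """Build the manifest a given environment should run for `app`."""
    base = REPO[f"apps/{app}/base.yaml"]
    overlay = REPO[f"apps/{app}/overlays/{env}.yaml"]
    return merge(base, overlay)


print(render("web", "prod"))  # {'spec': {'replicas': 3, 'image': 'web:1.4'}}
```

Because every environment starts from the same base, a change made only in a development copy cannot silently diverge from what production receives, which is the failure described at the start of this post.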
An entire system or object is deployed and references the code in Git as the source of truth for how it should operate. To make changes to the environment or object, the user updates the code stored in Git, and these changes are then applied to the environment or application by the Continuous Delivery Mechanism.
It is critical to understand that the process of detecting a change to the object in the Git Repository and then applying that change to the running environment or application is executed by the Continuous Delivery Mechanism, so that it is automated and requires no human intervention.
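The automated part of this loop can be sketched in Python. The sketch assumes a hypothetical `fetch_head` callable that returns the latest merged commit SHA with its manifests, and an `apply` callable that pushes manifests to the cluster; nothing in it waits for a person. Real tools typically also re-check for drift on a schedule even when no new commit has arrived, which is omitted here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Manifests = Dict[str, dict]
Snapshot = Tuple[str, Manifests]  # (commit SHA, manifests at that commit)


class DeliveryAgent:
    """Hypothetical continuous delivery agent that reacts to new commits."""

    def __init__(self, fetch_head: Callable[[], Snapshot],
                 apply: Callable[[Manifests], None]) -> None:
        self.fetch_head = fetch_head
        self.apply = apply
        self.last_applied: Optional[str] = None

    def poll(self) -> bool:
        """Apply the head commit if it has not been applied yet."""
        sha, manifests = self.fetch_head()
        if sha == self.last_applied:
            return False  # nothing new has been merged
        self.apply(manifests)
        self.last_applied = sha
        return True


# Usage with an in-memory stand-in for the repository (SHAs are made up).
history: List[Snapshot] = [("a1b2c3", {"web": {"replicas": 3}})]
applied: List[Manifests] = []
agent = DeliveryAgent(lambda: history[-1], applied.append)
print(agent.poll())  # True: first commit is applied
print(agent.poll())  # False: no change in Git
history.append(("d4e5f6", {"web": {"replicas": 4}}))  # a change is merged
print(agent.poll())  # True: the new commit is applied automatically
```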
Reducing, or completely removing, human error is at the heart of GitOps. Storing more configuration and metadata as code within a Git Repository means that it is versioned, reviewed and auditable. For the related object running within the system to change, the code in the Git Repository must go through a review process and a final merge step that updates the source of truth. This process is out of the scope of Arlon, but it is critical to understand that GitOps is predicated on these steps and the related process.
Users are encouraged to invest in a process that, as a best practice, implements changes into lower environments such as development or test, and only after validation and approval promotes the change into production.
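A promotion rule of this kind can be expressed as a small check. The sketch assumes a hypothetical three-stage chain (`dev`, `test`, `prod`) and a `Release` record tracking where a version has been validated and whether it has been approved; how validation and approval are actually recorded is left open.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical promotion chain, lowest environment first.
ENVIRONMENTS = ["dev", "test", "prod"]


@dataclass
class Release:
    version: str
    validated_in: List[str] = field(default_factory=list)
    approved: bool = False  # e.g. sign-off recorded during review (assumed)


def can_promote(release: Release, target: str) -> bool:
    """Allow `target` only once every lower environment has validated the release."""
    lower = ENVIRONMENTS[:ENVIRONMENTS.index(target)]
    if not all(env in release.validated_in for env in lower):
        return False
    # Production additionally requires an explicit approval.
    return release.approved if target == "prod" else True


release = Release("1.5.0")
print(can_promote(release, "dev"))   # True: nothing below dev
print(can_promote(release, "prod"))  # False: not validated in dev/test
release.validated_in = ["dev", "test"]
print(can_promote(release, "prod"))  # False: not yet approved
release.approved = True
print(can_promote(release, "prod"))  # True
```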
Figure: Basic GitOps Flow