Cloud-native applications are often architected as a constellation of distributed microservices running in containers. Increasingly, these containerized applications run on Kubernetes, which has become the de facto standard for container orchestration.
One outcome that most companies adopting a microservices architecture don’t fully appreciate until they are well down the path is microservices sprawl. Like the suburbs around a city, the number of small services that are deployed seems to expand exponentially.
This growth makes it hard to enforce and standardize things like routing between multiple services and versions, authentication and authorization, encryption, and load balancing within a Kubernetes cluster.
Building on a service mesh helps resolve some of these issues, and more. Just as containers abstract the operating system away from the application, a service mesh abstracts away how communication between services is handled.
What is Service Mesh
While Service Mesh technology has been around prior to Kubernetes, the proliferation of microservices that are built on Kubernetes has contributed to the growing interest in Service Mesh solutions.
The thing that is most crucial to understand about microservices is that they are heavily reliant on the network.
Service Mesh manages the network traffic between services. It does so far more gracefully and scalably than the alternative, which is a lot of manual, error-prone work and an operational burden that is not sustainable in the long run.
In general, a service mesh is layered on top of your Kubernetes infrastructure and makes communication between services over the network safe and reliable.
Think about service mesh like a routing and tracking service for a package shipped in the mail: it keeps track of the routing rules and dynamically directs the traffic and package route to accelerate delivery and ensure receipt.
Service mesh allows you to separate the business logic of the application from observability, and network and security policies. It allows you to connect, secure, and monitor your microservices.
- Connect: Service Mesh enables services to discover and talk to each other. It enables intelligent routing to control the flow of traffic and API calls between services and endpoints. This also enables advanced deployment strategies such as blue/green, canary, and rolling upgrades.
- Secure: Service Mesh allows you to secure communication between services. It can enforce policies to allow or deny communication. For example, you can configure a policy to deny access to production services from a client service running in a development environment.
- Monitor: Service Mesh enables observability of your distributed microservices system. Service Mesh often integrates out of the box with monitoring and tracing tools (such as Prometheus and Jaeger in the case of Kubernetes) so that you can discover and visualize dependencies between services, traffic flow, API latencies, and traces.
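The intelligent routing described under "Connect" often comes down to splitting traffic between versions by weight. The following is a minimal Python sketch of that idea, not the configuration format of any particular mesh; the function `pick_version` and the service names `reviews-v1` and `reviews-v2` are hypothetical.

```python
import random
from collections import Counter


def pick_version(weights, rng):
    """Pick a service version in proportion to its traffic weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical canary split: roughly 90% of calls go to v1, 10% to v2.
canary_weights = {"reviews-v1": 90, "reviews-v2": 10}

rng = random.Random(42)  # seeded so repeated runs make the same choices
counts = Counter(pick_version(canary_weights, rng) for _ in range(10_000))
print(counts)
```

Shifting more traffic to the new version is then just a matter of changing the weights, which is why a mesh can support canary and blue/green rollouts without touching application code.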
These key capabilities provide operational control and observability into the behavior of the entire network of distributed microservices that make up a complex cloud-native application.
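To make the "Secure" capability concrete, here is a small Python sketch of an allow/deny check like the development-to-production example above. It only illustrates the decision a mesh makes for each call; the `Workload` type, the `DENY_RULES` set, and the workload names are assumptions made for this example, and real meshes express such rules in their own policy resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    environment: str  # e.g. "development" or "production"


# Hypothetical deny rules: (source environment, destination environment).
DENY_RULES = {("development", "production")}


def is_allowed(source: Workload, destination: Workload) -> bool:
    """Return False when a deny rule matches the pair of environments."""
    return (source.environment, destination.environment) not in DENY_RULES


dev_client = Workload("checkout-test", "development")
prod_client = Workload("checkout", "production")
prod_orders = Workload("orders", "production")

print(is_allowed(dev_client, prod_orders))   # development -> production is denied
print(is_allowed(prod_client, prod_orders))  # production -> production is allowed
```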
Service Mesh is critical when you’re dealing with web-scale or hyper-scale microservices workloads (think Netflix, Amazon, etc.). But, as we’ll see below, there’s plenty that you can already get out of service mesh now – while you’re still growing – as a framework to support massive scale in the future.
Service Mesh Options for Kubernetes:
There are three leading contenders in the Kubernetes ecosystem for Service Mesh. All of these solutions are open source. Each has its own benefits and drawbacks, but using any of them will put your DevOps teams in a better position to thrive as they develop and maintain more and more microservices.
Consul is a full-featured service management framework, and the addition of Connect in v1.2 gives it secure service-to-service communication on top of its existing service discovery, which makes it a full Service Mesh. Consul is part of HashiCorp’s suite of infrastructure management products; it started as a service discovery and configuration tool and has grown to support multiple data center and container management platforms, including Kubernetes.
On Kubernetes, Consul Connect runs an agent on every node as a DaemonSet. The agent communicates with the Envoy sidecar proxies that handle routing and forwarding of traffic.
Architecture diagrams and more product information are available at Consul.io.
Istio is a Kubernetes-native solution that was initially released by Google, IBM, and Lyft, and a large number of major technology companies have chosen to back it as their service mesh of choice. Google, IBM, and Microsoft rely on Istio as the default service mesh that is offered in their respective Kubernetes cloud services. A fully managed Istio service for hybrid environments will soon be available from the Platform9 Managed Kubernetes service.
Istio was the first to include additional features that developers really wanted, like deep-dive analytics.
Istio separates its data plane from its control plane. The data plane consists of sidecar proxies that cache configuration, so they do not need to go back to the control plane for every call. The control plane components run as pods in the same Kubernetes cluster, which allows for better resilience if a single pod fails in any part of the service mesh.
Architecture diagrams and more product information are available at Istio.io.
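The split between a control plane that pushes configuration and sidecars that serve calls from a local cache can be illustrated with a toy Python model. This is a sketch of the pattern only; it does not represent the API Istio actually uses between its control plane and Envoy. The classes `ControlPlane` and `SidecarProxy`, their methods, and the address are hypothetical.

```python
class SidecarProxy:
    """Data plane: answers routing lookups from a locally cached table."""

    def __init__(self):
        self.cached_routes = {}
        self.config_updates = 0

    def apply_config(self, routes):
        # Called by the control plane whenever configuration changes.
        self.cached_routes = routes
        self.config_updates += 1

    def resolve(self, service):
        # Per-call path: reads the local cache, never the control plane.
        return self.cached_routes[service]


class ControlPlane:
    """Holds the source-of-truth routing table and pushes copies to sidecars."""

    def __init__(self):
        self.routes = {}
        self.sidecars = []

    def register(self, sidecar):
        self.sidecars.append(sidecar)
        sidecar.apply_config(dict(self.routes))

    def set_route(self, service, endpoint):
        self.routes[service] = endpoint
        for sidecar in self.sidecars:
            sidecar.apply_config(dict(self.routes))


control = ControlPlane()
proxy = SidecarProxy()
control.register(proxy)
control.set_route("orders", "10.0.0.12:8080")

for _ in range(1000):
    proxy.resolve("orders")  # 1000 calls, none of which contact the control plane

print(proxy.config_updates)  # 2: one push on registration, one on the route change
```

Because each sidecar holds its own copy of the table, a short control plane outage in this model delays configuration changes but does not block calls that are already routable.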
Linkerd is arguably the second most popular service mesh on Kubernetes and, since its rewrite in v2, its architecture closely mirrors Istio’s, with an initial focus on simplicity rather than flexibility. This focus, along with v2 being a Kubernetes-only solution, results in fewer moving pieces, which means that Linkerd has less complexity overall. Linkerd v1.x is still supported and works with more container platforms than just Kubernetes, but new features (like blue/green deployments) are focused primarily on v2.
Linkerd is unique in that it is part of the Cloud Native Computing Foundation (CNCF), which is the organization responsible for Kubernetes. No other service mesh is backed by an independent foundation.
Architecture diagrams and additional product information are available at Linkerd.io.
Comparison of Istio, Linkerd, and Consul Connect for Kubernetes Service Mesh
| Feature | Istio | Linkerd | Consul Connect |
|---|---|---|---|
| Supported Workloads: Does it support both VM-based applications and Kubernetes? | | | |
| Workloads | Kubernetes + VMs | Kubernetes only | Kubernetes + VMs |
| Architecture: The solution’s architecture has implications on operation overhead. | | | |
| Single point of failure | No, uses sidecar per pod | No | No. But added complexity managing HA due to having to install the Consul server and its quorum operations, etc., vs. using the native K8s master primitives. |
| Sidecar Proxy | Yes (Envoy) | Yes | Yes (Envoy) |
| Secure Communication: All services support mutual TLS encryption (mTLS), and native certificate management so that you can rotate certificates or revoke them if they are compromised. | | | |
| Authentication and Authorization | Yes | Yes | Yes |
| Chaos Monkey-style Testing: Traffic management features allow you to introduce delays or failures to some of the requests in order to improve the resiliency of your system and harden your operations. | | | |
| Testing | Yes, you can configure services to delay or outright fail a certain percentage of requests | Limited | No |
| Observability: In order to identify and troubleshoot incidents, you need distributed monitoring and tracing. | | | |
| Monitoring | Yes, with Prometheus | Yes, with Prometheus | Yes, with Prometheus |
| Deployment | Install via Helm and Operator | Helm | Helm |
| Operations Complexity: How difficult is it to install, configure and operate? | | | |
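The Chaos Monkey-style testing row above refers to delaying or failing a percentage of requests. A rough Python sketch of that behavior follows; it is an illustration under assumptions, not how any of the three meshes implements fault injection. The names `with_faults` and `InjectedFault` are hypothetical, and the sleep function is passed in so the example records delays rather than actually waiting.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real response when a failure is injected."""


def with_faults(handler, abort_percent, delay_percent, delay_seconds, rng, sleep):
    """Wrap a handler so a share of requests is failed or delayed."""

    def wrapped(request):
        if rng.random() * 100 < abort_percent:
            raise InjectedFault("injected failure")
        if rng.random() * 100 < delay_percent:
            sleep(delay_seconds)  # slow this request down before handling it
        return handler(request)

    return wrapped


delays = []  # records injected delays instead of really sleeping
flaky = with_faults(
    lambda request: "ok",
    abort_percent=10,
    delay_percent=20,
    delay_seconds=2.0,
    rng=random.Random(7),
    sleep=delays.append,
)

failures = 0
for _ in range(1000):
    try:
        flaky("GET /orders")
    except InjectedFault:
        failures += 1

print(failures, len(delays))
```

Outside of a test you would pass `time.sleep` as the `sleep` argument. Keeping the percentages as parameters mirrors the way the table describes the feature: a certain share of requests is affected, the rest pass through untouched.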
Any of these service meshes will solve your basic needs. The choice comes down to whether you want more than the basics.
Istio has the most features and flexibility of any of these three service meshes by far, but remember that flexibility means complexity, so your team needs to be ready for that.
For a minimalistic approach supporting just Kubernetes, Linkerd may be the best choice. If you want to support a heterogeneous environment that includes both Kubernetes and VMs and do not need the complexity of Istio, then Consul would probably be your best bet.
Migrating between service mesh solutions
Note that adopting a service mesh is not as intrusive a transformation as moving from monolithic applications to microservices, or from VMs to Kubernetes-based applications. Since most meshes use the sidecar model, most services don’t know that they are running in a mesh. However, replacing one service mesh with another is complex, particularly when you want to standardize on the service mesh as a solution to scale across all your services.
So it’s important to choose wisely! Start with one or more sample projects and see which solution you prefer.
Istio is quickly becoming the standard for service mesh on Kubernetes. It is the most mature, but also the most complex to deploy. For a managed experience of consuming Istio at scale, stay tuned for when we announce our Managed Istio solution, as part of our Kubernetes managed apps!
Common use cases to take advantage of Service Mesh today
From an Operations point of view, Service Mesh is useful for any type of microservices architecture since it helps you control traffic, security, permissions, and observability.
Once you have a Kubernetes infrastructure and a microservices architecture, consider the use cases below in order to take advantage of Service Mesh in your organization today, regardless of the scale of your applications.
By getting your feet wet with these, you can start standardizing on Service Mesh in your system design to lay the building blocks and the critical components for large-scale operations in the future.
- Improving observability into distributed services: service-level visibility, tracing, and monitoring are key capabilities of service mesh that dramatically improve visibility as well as your ability to troubleshoot and mitigate incidents. For example, if one service in the architecture becomes a bottleneck, the common way to handle it is through retries, but those can worsen the bottleneck as timed-out requests pile up. With service mesh, you can easily break the circuit to failed services to disable non-functioning replicas and keep the API responsive.
- Blue/green deployments: with the ability to control traffic, service mesh allows you to implement blue/green deployments and safely roll out new versions of your applications without risking service interruption. First, you expose only a small subset of users to the new version (a canary-style step), validate it, and then release it to all instances in production.
- Chaos Monkey-style testing in production scenarios: with the ability to inject delays and faults, you can improve the robustness of your deployments.
- ‘Bridge’ / enabler for modernizing legacy applications: If you’re in the throes of modernizing your existing applications to Kubernetes-based microservices, you can use service mesh as a ‘bridge’ while you’re decomposing your apps. You can register your existing applications as ‘services’ in the Istio service catalog and then start migrating them gradually to Kubernetes without changing the mode of communication between services, much like a DNS router. This use case is similar to using Service Directory.
- API Gateway: If you’re bought into the vision of service mesh and want to start the rollout, but don’t yet have Kubernetes applications up and running, you can already have your Operations team start learning the ropes of using service mesh by deploying it simply to measure your API usage.
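The circuit breaking mentioned in the first use case can be sketched in a few lines of Python. This is a simplified illustration of the pattern, assuming a fixed failure threshold and cool-down period; a mesh applies the same idea in the sidecar proxy rather than in application code. The `CircuitBreaker` class, its parameters, and `broken_replica` are hypothetical names.

```python
class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold, reset_after, clock):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injected so time can be controlled
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: allow one trial call; one more failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


now = [0.0]  # fake clock value, advanced by hand
breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0, clock=lambda: now[0])


def broken_replica():
    raise ConnectionError("replica unavailable")


outcomes = []
for _ in range(5):
    try:
        breaker.call(broken_replica)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

print(outcomes)  # ['failed', 'failed', 'failed', 'rejected', 'rejected']
```

After three failures the breaker stops sending requests to the broken replica and rejects calls immediately, which is what keeps callers responsive instead of stacking up retries. In real use the clock would be something like `time.monotonic`.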
In its most mature implementation, Service Mesh becomes the dashboard for your microservices architecture. It’s the place for troubleshooting issues, enforcing traffic policies and rate limits, and testing new code. It’s your hub for monitoring, tracing, and controlling the interactions between all services: how they are connected, how they perform, and how they are secured.