Stateful applications with Kubernetes
When containers became mainstream, they were designed to support ephemeral – stateless – workloads. Since then, a lot of effort has gone into supporting stateful applications in the container ecosystem, with much of that focus directed at better support in core Kubernetes. Stateful applications – and the data they contain – are extremely common in most organizations and are vital to the business. Being able to support data-driven applications with Kubernetes enables more organizations to take advantage of containers for modernizing their legacy apps as well as for supporting additional mission-critical use cases – which are often stateful.
This post is intended as a crash course on the basics required to get started running any stateful application in Kubernetes.
Kubernetes Storage Constructs:
Stateful applications require, at minimum, persistent storage. Let’s first examine the Kubernetes storage constructs to understand how you would persist data in Kubernetes. The most basic distinction to start with is between ephemeral Volumes, which live and die with the pod, and Persistent Volumes.
Kubernetes Volumes
Volumes are the basic unit of storage in Kubernetes. A Volume is storage that’s attached to – and dependent on – the pod and its lifecycle. A Volume by itself has no persistence and is mostly used for storing temporary, local data that doesn’t need to exist outside the pod’s lifecycle. Once the pod is destroyed, its local volume is also released.
Volumes can mount NFS, Ceph, GlusterFS, AWS block storage, Azure or Google disks, Git repositories, Secrets, ConfigMaps, hostPath, and more. In these cases the pod does not create or destroy the storage; it simply attaches the volume to whatever mount points are defined in the pod specification.
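For instance, here is a minimal sketch of a pod attaching a pre-existing NFS export. The server address and export path are placeholders; the NFS share must already exist, since Kubernetes only mounts it:
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data        # where the share appears inside the container
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      server: nfs.example.com # placeholder NFS server address
      path: /exports/data     # placeholder export path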
An exception to this attach-only behavior is a volume type called emptyDir. emptyDir is a special case where the pod creates its own temporary storage and mounts it into the containers in the pod so they can all share files back and forth. The shared storage is deleted forever when the pod is removed from the node.
As an example, below is a very simple pod specification with two containers sharing an emptyDir volume on different mount points:
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-pod       # pod names must be lowercase
spec:
  containers:
  - image: nginx
    name: ed-nginx
    volumeMounts:
    - mountPath: /a
      name: ed-volume
  - image: redis
    name: ed-redis
    volumeMounts:
    - mountPath: /b
      name: ed-volume
  volumes:
  - name: ed-volume
    emptyDir: {}
Kubernetes Persistent Volumes
Now that we’ve identified what a ‘regular’ volume is in Kubernetes, it is easy to see some of its limitations around portability, persistence, and scalability. Stateful applications require that the data used or generated by the app is persisted, retained, backed up, and accessible outside of the particular hosts that run the application. This is where Persistent Volumes (PVs) come into play.
Where basic volumes are essentially unmanaged, a Persistent Volume is managed by the cluster. Persistent volumes remain available outside of the pod lifecycle and can be claimed by other pods. Their data can be retained and backed up.
When creating a PV, the administrator tells the Kubernetes cluster which storage filesystem to provision, and with which configuration – including size, volume IDs, names, access modes, and other specifications. The reusable parts of that configuration – the provisioner and its parameters – are specified in a StorageClass.
PVs are resources in a cluster. PersistentVolumeClaims (PVCs) are requests for those resources, made with a specific StorageClass for the desired configuration.
The Kubernetes control plane continuously watches for new PVCs. When a new PVC is identified, it finds a matching PV and binds the two together. The bound volume can then be mounted into a pod.
With that, each pod is created with the required storage (and its config and environment variables), and each replica has the same storage type attached and mounted. These pods can then scale with a StatefulSet (more on that later) so that new pods joining the distributed application get the same kind of storage attached.
Creating a Persistent Volume in Kubernetes:
The steps involved in creating a persistent volume and attaching it to a container in a pod are:
- Create a StorageClass which defines the type of storage that will be used. This could be AWS EBS, Portworx, etc. If a provisioner is defined (for example when using a managed Kubernetes cloud service, or if your distribution ships a default provisioner), Kubernetes will connect to the storage provider and allocate a new persistent volume whenever a claim is made. Otherwise, the cluster administrator needs to manually add new persistent volumes before each claim can be bound. Sample StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-high-io
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "1"
  snap_interval: "120"
  io_priority: "high"
Sample PersistentVolume (PV) – for manual creation:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: app-fourty-two-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
PVs can also be created dynamically. To learn more about dynamic volumes, CSI and how to hack on your storage configuration in Kubernetes, see this deep-dive Kubernetes Storage how-to article.
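As a quick illustration of dynamic provisioning, here is a minimal sketch of a CSI-backed StorageClass. It assumes the AWS EBS CSI driver (ebs.csi.aws.com) is installed in the cluster; with it in place, each new claim provisions a fresh EBS volume automatically:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-dynamic
provisioner: ebs.csi.aws.com             # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                              # EBS volume type
volumeBindingMode: WaitForFirstConsumer  # delay binding until a pod is scheduled
allowVolumeExpansion: true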
- Create a PersistentVolumeClaim (PVC), which has the cluster set aside storage to be used by your application in its pod specifications. Sample PersistentVolumeClaim:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: app-fourty-two-pv-claim
spec:
  storageClassName: px-high-io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
- Define the volume you want to use in the deployment (pod, StatefulSet, etc.). Sample pod using a PersistentVolumeClaim:
apiVersion: v1
kind: Pod
metadata:
  name: mysql-app
spec:
  containers:
  - name: mysql
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "rootpasswd"
    volumeMounts:
    - mountPath: /var/lib/mysql
      name: data
      subPath: mysql
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-fourty-two-pv-claim
Deployments with StatefulSet
Deploying a stateful application into Kubernetes can now leverage a specific object called a StatefulSet. A StatefulSet is essentially a Kubernetes deployment object with unique characteristics specifically for stateful applications. Like ‘regular’ Deployments or ReplicaSets, a StatefulSet manages pods that are based on a given container spec. But unlike a regular Deployment, it gives each pod a stable, sticky identity and lets you control the order and dependencies of deployment, scaling, and updates.
StatefulSet Deployments provide:
- Stable, unique network identifiers: Each pod in a StatefulSet is given a hostname based on the application name and its ordinal index. For example, web-0, web-1, web-2, and web-3 for a StatefulSet named “web” that has 4 instances running.
- Stable, persistent storage: Every pod in the StatefulSet is given its own persistent volume based on the defined storage class, or the default one if none is defined. Deleting or scaling down pods will not automatically delete the volumes associated with them, so the data persists. To purge unneeded resources, you can scale the StatefulSet down to 0 first and then delete the unused persistent volume claims.
- Ordered, graceful deployment and scaling: Pods in the StatefulSet are created and brought online in order, from 0 to N-1, and are shut down in reverse order to ensure a reliable and repeatable deployment and runtime. The StatefulSet will not scale further until all the required pods are running, so if one dies, it recreates that pod before attempting to add additional instances to meet the scaling criteria.
- Ordered, automated rolling updates: StatefulSets can handle upgrades in a rolling manner: pods are shut down and rebuilt one at a time, in reverse ordinal order, continuing until all the old versions have been shut down and cleaned up. Persistent volumes are reused, so each instance’s data survives the upgrade.
Whether you deploy a Kubernetes application using a regular Deployment with a ReplicaSet or using a StatefulSet, you expose the application as a Kubernetes Service so other applications can interact with it. Session affinity is achieved by enabling “sticky sessions,” allowing clients to return to the same instance as often as possible, which helps with performance – especially for stateful applications with caching.
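For example, here is a minimal Service sketch with client-IP session affinity enabled; the names, labels, and ports are placeholder assumptions:
apiVersion: v1
kind: Service
metadata:
  name: web                  # placeholder service name
spec:
  selector:
    app: web                 # placeholder pod label
  ports:
  - port: 80
    targetPort: 8080
  sessionAffinity: ClientIP  # route a given client IP to the same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # affinity window (3 hours is the default)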
Below is a sample StatefulSet for a Cassandra database with multiple instances, each with its own persistent volume. (This includes the storage class, but the StatefulSet still needs to be exposed by a service; a sketch of one follows the manifest.)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "K8Demo"
        - name: CASSANDRA_DC
          value: "DC1-K8Demo"
        - name: CASSANDRA_RACK
          value: "Rack1-K8Demo"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # These volume mounts are persistent. They act like inline claims,
        # but the name must match exactly one of the volumeClaimTemplates below.
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  # The controller converts these templates into PersistentVolumeClaims,
  # one per pod, mounted at the paths referenced above.
  # Do not use this in production until it is backed by an SSD persistent
  # disk (for example an SSD GCEPersistentDisk).
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 1Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: k8s.io/minikube-hostpath
parameters:
  type: pd-ssd
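The StatefulSet above sets serviceName: cassandra, so it expects a headless Service with that name to exist. A minimal sketch, mirroring the upstream Kubernetes Cassandra example:
apiVersion: v1
kind: Service
metadata:
  name: cassandra            # must match serviceName in the StatefulSet
  labels:
    app: cassandra
spec:
  clusterIP: None            # headless: each pod gets a stable DNS record
  ports:
  - port: 9042
    name: cql
  selector:
    app: cassandra
With this Service in place, each pod gets a DNS name like cassandra-0.cassandra.default.svc.cluster.local – exactly the address used in CASSANDRA_SEEDS above.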
Operators can help
While Operators are not strictly necessary, they are more robust than a plain Deployment or StatefulSet and can help run stateful apps on Kubernetes with features like application-level HA management, backups, and restore.
You can use existing Operators or develop your own. The Operator package includes all the configuration needed to deploy and manage the application from a Kubernetes point of view – from the StatefulSet to be used, to any required storage, rollout strategies, persistence and affinity configuration, and more. Kubernetes then relies on the Operator to validate instances of the application against the specification, ensuring it runs the same way across all clusters it is deployed in.
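With an Operator installed, deploying the application typically reduces to declaring a custom resource, which the Operator reconciles into StatefulSets, volumes, and backup jobs. The following is a purely hypothetical example – the apiVersion, kind, and fields depend entirely on the Operator you install:
apiVersion: example.com/v1        # hypothetical API group
kind: PostgresCluster             # hypothetical custom resource kind
metadata:
  name: app-db
spec:
  replicas: 3
  storage:
    storageClassName: px-high-io  # reuses the StorageClass defined earlier
    size: 10Gi
  backup:
    schedule: "0 2 * * *"         # hypothetical nightly backup at 2am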
Example Stateful App Scenarios:
Let’s look at two common scenarios for stateful applications on Kubernetes: apps powered by a NoSQL/sharded database, and apps using a relational database for their backend.
In both cases, we’d use PVs and PVCs to have Kubernetes provision and manage the persistent storage.
Cassandra or other NoSQL/sharded databases
In these cases, the database is designed for fault tolerance and easy scaling. For example, with Cassandra you typically already have 3 copies of the data, and all the nodes are equal (no master/slave designation). If one node fails, the other nodes still accept data and the application doesn’t need to be aware of any DB availability issues. With NoSQL databases, a best practice is to not create too many replicas (keep it at 3) to accelerate start-up time when a node fails and a new replica is automatically created.
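Since such a cluster tolerates losing only a minority of its replicas at once, a PodDisruptionBudget is a useful companion. This sketch, reusing the labels from the Cassandra example above, ensures voluntary disruptions (node drains, upgrades) never take down more than one replica at a time:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cassandra-pdb
spec:
  maxUnavailable: 1        # never voluntarily evict more than one replica
  selector:
    matchLabels:
      app: cassandra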
A PostgreSQL database that is backing a business application
PostgreSQL, like most relational databases, typically runs as a single instance, so there is no database cluster maintaining extra copies of the data. When running a relational database in Kubernetes, keep it as small as possible so there is less in-flight state to recover. That way, if a pod dies and comes back up on a different node, start-up is faster because fewer in-flight transactions need to be restored from the binary logs.
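A minimal sketch of such a single-instance PostgreSQL StatefulSet might look like the following; the image tag, password handling, and sizes are illustrative assumptions (use a Secret for credentials in practice, and create a matching headless Service named postgres):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres       # assumes a matching headless Service exists
  replicas: 1                 # single instance, as discussed above
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: "rootpasswd" # illustrative only; use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: pgdata
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi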
Container-based storage solutions that work natively with Kubernetes and offer built-in replication and abstraction across environments are also helpful. The storage class in Kubernetes could point to anything from EBS block storage to an NFS share for this usage; or, when performance matters, to an enterprise-class storage solution or a physical SAN over Fibre Channel. Container-friendly software-defined storage like Ceph, GlusterFS, or Portworx can co-exist in the same Kubernetes cluster but would be hosted on nodes with extra storage capacity in the form of dedicated solid-state drives.
Conclusion
Stateful applications are one of the most common types of applications being containerized and moved to Kubernetes-managed environments. With advancements in Kubernetes storage constructs and operations, you can now support data-driven applications on Kubernetes as well.
For more information on the Kubernetes components mentioned, check out the latest documentation on kubernetes.io.