How Are Rolling Cluster Upgrades Performed With Managed Kubernetes

Automated Rolling Upgrades of Kubernetes Clusters

Managed Kubernetes supports fully automated rolling upgrades of Kubernetes clusters. Upgrades are performed across major versions of Kubernetes, so you can keep using the latest Kubernetes features without the hassle of upgrading the cluster yourself.

We upgrade the nodes in a cluster one at a time, ensuring that the most recently upgraded node is healthy before upgrading the next. This is called a rolling upgrade, and it has the following benefits:

Your applications will not experience downtime during the cluster upgrade, as long as they tolerate the failure of a single node.
Your cluster users and your “Kubernetes-native” applications (i.e., ones that talk to the Kubernetes API server) will be able to use the API while worker nodes are being upgraded. In addition, if you have created a multi-master, highly available Kubernetes cluster, your API server will also remain available while the master nodes are upgraded. In a single-master cluster, the API server will experience momentary downtime while the master node is being upgraded.
All nodes in your cluster will remain compatible during the cluster upgrade, despite running different versions of Kubernetes as the upgrade proceeds.
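
The one-node-at-a-time behavior described above can be sketched as a simple loop. This is only an illustration of the idea, not the service's actual implementation: the Node class, the rolling_upgrade function, the is_healthy callback, and the node names and version strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    schedulable: bool = True


def rolling_upgrade(
    nodes: List[Node],
    target_version: str,
    is_healthy: Callable[[Node], bool],
) -> List[str]:
    """Upgrade nodes one at a time and stop at the first unhealthy node.

    Returns the names of the nodes that were upgraded and found healthy.
    """
    upgraded = []
    for node in nodes:
        node.schedulable = False       # no new Pods are scheduled on the node
        # (evacuating the node's existing Pods would happen here)
        node.version = target_version  # stand-in for the real upgrade work
        if not is_healthy(node):
            # Halt, so the remaining nodes keep serving on the old version.
            break
        node.schedulable = True        # the node rejoins the scheduling pool
        upgraded.append(node.name)
    return upgraded


# Hypothetical three-node cluster in which worker-2 fails its health check.
nodes = [Node("worker-1", "1.27"), Node("worker-2", "1.27"), Node("worker-3", "1.27")]
done = rolling_upgrade(nodes, "1.28", is_healthy=lambda n: n.name != "worker-2")
print(done)  # ['worker-1']
```

Because the loop stops as soon as a node fails its health check, worker-3 is never touched and keeps running the old version.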

Important Notes & Warnings

To upgrade a node, we first tell Kubernetes not to schedule any further Pods on the node. We then evacuate (drain, in Kubernetes parlance) the existing Pods from the node. Evacuating a node will remove unmanaged Pods (see explanation below) and permanently erase data in emptyDir Volumes (see explanation below).

Managed vs Unmanaged Pods

Pods are the atomic unit of work in Kubernetes. Cluster users deploy Pods that fall into two categories:

Managed Pods

These are Pods that are managed by a ReplicationController, ReplicaSet, Job, or DaemonSet. Containers in Pods managed by a DaemonSet are stopped during the upgrade, but the Pods remain on the node. Pods managed by other controllers are rescheduled by Kubernetes to other nodes as long as resources are available.

Unmanaged Pods

These are Pods that are not managed by any Kubernetes controller. These Pods will be removed from the node during the upgrade and will not be rescheduled by Kubernetes. The same is true if the node fails: Kubernetes will not reschedule these Pods. For that reason, unmanaged Pods should not be used in production, though they are useful for experimenting and debugging.
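
One way to tell the two categories apart is the metadata.ownerReferences field, which Kubernetes sets on Pods created by a controller. The sketch below applies the rules from this section to Pod manifests represented as plain Python dictionaries. The drain_outcome function, its return labels, and the example Pod names are hypothetical and only illustrate the classification.

```python
from typing import Any, Dict


def drain_outcome(pod: Dict[str, Any]) -> str:
    """Classify what happens to a Pod when its node is evacuated."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    kinds = {owner.get("kind") for owner in owners}
    if not kinds:
        # Unmanaged Pod: removed and not rescheduled.
        return "removed"
    if "DaemonSet" in kinds:
        # Containers are stopped, but the Pod remains on the node.
        return "stays-on-node"
    # ReplicationController, ReplicaSet, Job, and other controllers:
    # rescheduled elsewhere, as long as resources are available.
    return "rescheduled"


# Hypothetical Pod manifests, trimmed to the fields the function reads.
web = {"metadata": {"name": "web-5d9f", "ownerReferences": [{"kind": "ReplicaSet"}]}}
logger = {"metadata": {"name": "logger-x2", "ownerReferences": [{"kind": "DaemonSet"}]}}
debug = {"metadata": {"name": "debug-shell"}}

for pod in (web, logger, debug):
    print(pod["metadata"]["name"], drain_outcome(pod))
```

Running a check like this against the Pods in a cluster before an upgrade would show which ones will not come back afterwards.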

emptyDir Volumes

Pods have access to storage through Kubernetes Volumes. If your Pod uses an emptyDir Volume, be warned that all data stored in this Volume will be erased when the Pod is removed from the node. This warning applies to any unmanaged Pod as well as to all Pods managed by a ReplicationController, ReplicaSet, or Job. Note that, if the node fails, the data on this Volume may be unrecoverable. For that reason, emptyDir Volumes should not be used in production.
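
In a Pod manifest, an emptyDir Volume appears as an entry under spec.volumes that has an emptyDir key. A minimal sketch of how such Volumes could be spotted ahead of an upgrade follows, again using plain dictionaries. The emptydir_volumes function, the example Pod, and the Volume and claim names are hypothetical.

```python
from typing import Any, Dict, List


def emptydir_volumes(pod: Dict[str, Any]) -> List[str]:
    """Return the names of the emptyDir Volumes declared by a Pod."""
    volumes = pod.get("spec", {}).get("volumes") or []
    return [v.get("name", "") for v in volumes if "emptyDir" in v]


# Hypothetical Pod with one emptyDir Volume and one claim-backed Volume.
pod = {
    "metadata": {"name": "report-builder"},
    "spec": {
        "volumes": [
            {"name": "scratch", "emptyDir": {}},
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-pvc"}},
        ]
    },
}

print(emptydir_volumes(pod))  # ['scratch']
```

Any Volume name this returns holds data that will be erased when the Pod is removed from its node.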

Conclusion

Our cluster upgrades enable you to take advantage of the latest Kubernetes improvements and new features while ensuring that many applications experience no downtime and that all cluster users experience minimal downtime of the Kubernetes API. Upgrading a Kubernetes node means evacuating Pods from the node. As a consequence, unmanaged Pods are removed and data in emptyDir Volumes is erased.

Following the recommendations above to avoid unmanaged Pods and emptyDir Volumes will make your applications more robust to node failures and also avoid downtime and data loss during cluster upgrades.