Kubernetes Upgrade: The Definitive Guide to Do-It-Yourself

Overview

This article will cover:

Kubernetes Upgrade Paths
Upgrading Kubernetes: A Step-by-Step Guide
Etcd Upgrade Paths
Upgrading etcd

Operating an enterprise Kubernetes deployment is difficult. That is due in no small part to the fact that Kubernetes is not just one tool but a collection of a dozen-odd components that provide functionality ranging from application deployments and upgrades, to logging and monitoring, to persistent data storage.

Kubernetes is one of the most active projects on GitHub to date, having amassed more than 80k commits and 550 releases. The process of installing an HA Kubernetes cluster on-premises or in the cloud is well documented and, in most cases, does not require many steps. Tools like Kops or Kubespray can automate parts of this process.

Every so often, though, we need to upgrade the cluster to keep up with the latest security features and bug fixes, and to benefit from the new features that are released on an ongoing basis. This is especially important when we are running a very outdated version (for example, v1.9) or when we want to automate the process and always stay on the latest supported version.

In general, when operating an HA Kubernetes cluster, the upgrade process involves two separate tasks that must not overlap or be performed simultaneously: upgrading the Kubernetes cluster and, if needed, upgrading the etcd cluster, which is the distributed key-value backing store of Kubernetes. Let's see how we can perform those tasks with minimal disruption.

Kubernetes Upgrade Paths

Note that this upgrade process is specifically for Kubernetes clusters installed manually in the cloud or on-premises. It does not cover managed Kubernetes environments (like our own, where upgrades are handled automatically by the platform) or Kubernetes services on public clouds (such as Amazon EKS or Azure Kubernetes Service), which have their own upgrade processes.

For the purposes of this tutorial, we assume that healthy 3-node Kubernetes and etcd clusters have been provisioned. I've set mine up using six DigitalOcean Droplets, plus one more for the worker node.

Let’s say that we have the following Kubernetes master nodes all running v1.13:

Name	Address	Hostname
kube-1	10.0.11.1	kube-1.example.com
kube-2	10.0.11.2	kube-2.example.com
kube-3	10.0.11.3	kube-3.example.com

Also, we have one worker node running v1.13:

Name	Address	Hostname
worker	10.0.12.1	worker.example.com

The process of upgrading the Kubernetes master nodes is documented on the Kubernetes documentation site, which lists the currently supported upgrade paths.

Only one of these paths is documented specifically for HA clusters, but we can reuse its steps for the other upgrade paths. In this example, we are going to upgrade an HA cluster from v1.13 to v1.14. Skipping a minor version (for example, upgrading directly from v1.13 to v1.15) is not recommended.
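The one-minor-version rule is easy to encode as a pre-flight check before touching any node. The following is a minimal POSIX shell sketch; the function names major_of, minor_of and is_single_minor_step are hypothetical (they are not part of kubeadm), and the check only compares major and minor numbers:

```shell
# Hypothetical helpers for the "one minor version at a time" rule.
# Versions may look like v1.13.0, 1.13.0 or 1.13.
major_of() { local v=${1#v}; echo "${v%%.*}"; }
minor_of() { local v=${1#v}; v=${v#*.}; echo "${v%%.*}"; }

# Succeed when TO is the same minor release as FROM (a patch upgrade)
# or exactly one minor release above it; fail otherwise.
is_single_minor_step() {
  local from=$1 to=$2 step
  [ "$(major_of "$from")" = "$(major_of "$to")" ] || return 1
  step=$(( $(minor_of "$to") - $(minor_of "$from") ))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}
```

For example, is_single_minor_step v1.13.0 v1.15.0 || echo "upgrade to v1.14 first" prints the warning, because v1.15 skips a minor release, while v1.13.0 to v1.14.0 passes.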

Before we start, we should always check the release notes of the version we intend to upgrade to, in case they mention breaking changes.

Upgrading Kubernetes: A Step-by-Step Guide

Let’s follow the upgrade steps now:

1. Log in to the first node and upgrade the kubeadm tool only:

```
$ ssh admin@10.0.11.1
$ apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=1.14.0-00 && apt-mark hold kubeadm
```

We run apt-mark unhold before the install and apt-mark hold after it because the Kubernetes packages are kept on hold at a pinned version. Without the hold, apt would be free to move kubeadm and related components such as kubelet to the latest available version (v1.15 at the time of writing) during an install or a routine upgrade, which is not what we want. Marking a package as held back prevents it from being automatically installed, upgraded, or removed.

2. Verify the upgrade plan:

```
$ kubeadm upgrade plan
...
COMPONENT            CURRENT   AVAILABLE
API Server           v1.13.0   v1.14.0
Controller Manager   v1.13.0   v1.14.0
Scheduler            v1.13.0   v1.14.0
Kube Proxy           v1.13.0   v1.14.0
...
```

3. Apply the upgrade plan:

```
$ kubeadm upgrade apply v1.14.0
```

4. Upgrade kubelet and restart the service:

```
$ apt-mark unhold kubelet && apt-get update && apt-get install -y kubelet=1.14.0-00 && apt-mark hold kubelet
$ systemctl restart kubelet
```

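5. Upgrade the other master nodes. On each of them, first upgrade kubeadm as in step 1, then run kubeadm upgrade node experimental-control-plane (instead of kubeadm upgrade apply), and finally upgrade kubelet as in step 4:

```
$ ssh admin@10.0.11.2
$ kubeadm upgrade node experimental-control-plane
$ ssh admin@10.0.11.3
$ kubeadm upgrade node experimental-control-plane
```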

6. Upgrade kubectl on all master nodes:

```
$ apt-mark unhold kubectl && apt-get update && apt-get install -y kubectl=1.14.0-00 && apt-mark hold kubectl
```

7. Upgrade kubeadm on the first worker node:

```
$ ssh worker@10.0.12.1
$ apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=1.14.0-00 && apt-mark hold kubeadm
```

8. Log in to a master node and drain the first worker node:

```
$ ssh admin@10.0.11.1
$ kubectl drain worker --ignore-daemonsets
```

9. Upgrade the kubelet configuration on the worker node:

```
$ ssh worker@10.0.12.1
$ kubeadm upgrade node config --kubelet-version v1.14.0
```

10. Upgrade kubelet on the worker node and restart the service:

```
$ apt-mark unhold kubelet && apt-get update && apt-get install -y kubelet=1.14.0-00 && apt-mark hold kubelet
$ systemctl restart kubelet
```

11. Restore the worker node by uncordoning it:

```
$ ssh admin@10.0.11.1
$ kubectl uncordon worker
```

12. Repeat steps 7-11 for the rest of the worker nodes.

13. Verify the health of the cluster:

```
$ kubectl get nodes
```
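Several of the steps above can be scripted as small, reviewable helpers. The sketch below is hypothetical (none of these function names come from kubeadm or kubectl): pinned_install_cmd prints, rather than runs, the unhold/install/hold command used in steps 1, 4, 6, 7 and 10; plan_target_version extracts the target version from the plan table in step 2, assuming the column layout shown there; and all_nodes_at_version checks the kubectl get nodes output from step 13, assuming the default NAME STATUS ROLES AGE VERSION columns.

```shell
# Hypothetical helpers for the kubeadm upgrade steps (illustrative names).

# Print the unhold/install/hold command for one package, using the Debian
# package version format from the steps above (for example 1.14.0-00).
pinned_install_cmd() {
  local pkg=$1 ver=$2
  echo "apt-mark unhold $pkg && apt-get update && apt-get install -y $pkg=$ver-00 && apt-mark hold $pkg"
}

# Read "kubeadm upgrade plan" output on stdin and print the AVAILABLE
# version from the "API Server" row of the COMPONENT/CURRENT/AVAILABLE table.
plan_target_version() {
  awk '$1 == "API" && $2 == "Server" { print $4; exit }'
}

# Read "kubectl get nodes" output on stdin and succeed only if every node
# is Ready (not cordoned) and reports the expected kubelet version.
all_nodes_at_version() {
  awk -v want="$1" '
    NR == 1 { next }                      # skip the header row
    { total++ }
    $2 != "Ready" || $5 != want { bad++ }
    END { if (total > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, kubectl get nodes | all_nodes_at_version v1.14.0 fails while a drained node is still cordoned, because its STATUS reads Ready,SchedulingDisabled. Printing the apt command instead of running it keeps a human in the loop; once reviewed, it can be run with sh -c "$(pinned_install_cmd kubeadm 1.14.0)".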

Etcd Upgrade Paths

As you may already know, etcd is the distributed key-value backing store for Kubernetes, and it's essentially the source of truth for the cluster. When we run an HA Kubernetes cluster, we also want to run an HA etcd cluster so that we have a fallback in case some nodes fail.
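A fallback needs at least three members because of quorum: etcd only commits a write when a majority of its voting members agree. The short worked example below (the function names are illustrative) computes the majority size and how many members can fail without losing quorum:

```shell
# Quorum arithmetic for an etcd cluster with N voting members (illustrative).
# A write needs a majority, floor(N/2) + 1 members; the cluster tolerates
# the loss of N minus that majority.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }
etcd_fault_tolerance() { echo $(( $1 - $(etcd_quorum "$1") )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

A 3-member cluster keeps quorum with one member down, which is why members can be upgraded one at a time; a fourth member adds no extra tolerance, which is why odd cluster sizes are the usual choice.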

Typically, we would have a minimum of 3 etcd nodes running the latest supported version. The process of upgrading the etcd nodes is documented in the etcd repo, which also lists the supported upgrade paths.

When planning for etcd upgrades, you should always follow this plan:

- Check which version you are using. For example:
```
$ ETCDCTL_API=3 ./etcdctl endpoint status
```
- Do not jump more than one minor version. For example, do not upgrade from 3.3 to 3.5. Instead, go from 3.3 to 3.4, and then from 3.4 to 3.5.
- Use the bundled Kubernetes etcd image. The Kubernetes team bundles a custom etcd image that contains etcd and etcdctl binaries for multiple etcd versions, as well as a migration utility for upgrading and downgrading etcd. This can help you automate the process of migrating and upgrading etcd instances.
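The one-minor-version rule for etcd can likewise be turned into a planning helper. This is a hypothetical sketch (etcd_minor_path is not an etcd tool) that lists each intermediate 3.x release; it deliberately refuses the 2.x to 3.x jump, since that path involves the major API change discussed next.

```shell
# Hypothetical helper: print the chain of 3.x minor releases to pass through,
# e.g. "3.3 -> 3.4 -> 3.5". Patch numbers and a leading "v" are ignored.
etcd_minor_path() {
  local from=${1#v} to=${2#v} m last path
  if [ "${from%%.*}" != 3 ] || [ "${to%%.*}" != 3 ]; then
    echo "only 3.x -> 3.x paths are handled here" >&2
    return 1
  fi
  m=${from#*.}; m=${m%%.*}
  last=${to#*.}; last=${last%%.*}
  if [ "$m" -gt "$last" ]; then
    echo "downgrades are not planned by this helper" >&2
    return 1
  fi
  path="3.$m"
  while [ "$m" -lt "$last" ]; do
    m=$((m + 1))
    path="$path -> 3.$m"
  done
  echo "$path"
}
```

For example, etcd_minor_path 3.3 3.5 prints 3.3 -> 3.4 -> 3.5, which is the order in which the rolling procedure below would be repeated.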

Among the etcd upgrade paths, the most important one is from 2.3 to 3.0, because it introduces a major API change that is documented in the etcd repo. You should also note that:

- etcd v3 is able to handle requests for both v2 and v3 data. For example, we can use the ETCDCTL_API environment variable to select the API version; note that the v2 command set differs from the v3 one (for example, cluster-health instead of endpoint health):
```
$ ETCDCTL_API=2 ./etcdctl cluster-health
```
- Running etcd v3 against a v2 data directory doesn't automatically upgrade the data directory to the v3 format.
- Using the v2 API against etcd v3 only updates the v2 data stored in etcd.

You may also wonder which versions of Kubernetes have support for each etcd version. There is a small section in the documentation which says:

Kubernetes v1.0: supports etcd2 only
Kubernetes v1.5.1: etcd3 support added, new clusters still default to etcd
Kubernetes v1.6.0: new clusters created with kube-up.sh default to etcd3, and kube-apiserver defaults to etcd3
Kubernetes v1.9.0: deprecation of etcd2 storage backend announced
Kubernetes v1.13.0: etcd2 storage backend removed, kube-apiserver will refuse to start with –storage-backend=etcd2, with the message etcd2 is no longer a supported storage backend

So, based on that information, if you are running Kubernetes v1.12.0 with etcd2, you are required to upgrade etcd to v3 (and move to the etcd3 storage backend) before you upgrade Kubernetes to v1.13.0, because --storage-backend=etcd2 is no longer supported there. On Kubernetes v1.12 and below, both the etcd2 and etcd3 backends are supported.
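The compatibility list above reduces to a simple pre-upgrade guard. This minimal sketch hard-codes only what the list states (etcd2 works up to v1.12 and is rejected from v1.13 on); the function names, and passing the current backend in as an argument, are assumptions for illustration:

```shell
# Hypothetical guard based on the Kubernetes/etcd compatibility list above.
# Succeed if the given Kubernetes version still accepts the etcd2 backend.
k8s_supports_etcd2() {
  local v=${1#v} minor
  minor=${v#*.}; minor=${minor%%.*}
  [ "${v%%.*}" -eq 1 ] && [ "$minor" -le 12 ]
}

# Fail if upgrading to TARGET would leave kube-apiserver on a removed backend.
# $1 = target Kubernetes version, $2 = current backend (etcd2 or etcd3).
check_backend_for_target() {
  if [ "$2" = etcd2 ] && ! k8s_supports_etcd2 "$1"; then
    echo "migrate from etcd2 to etcd3 before upgrading to $1" >&2
    return 1
  fi
  return 0
}
```

For example, check_backend_for_target v1.13.0 etcd2 fails with a reminder to migrate first, while the same target with etcd3 passes.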

Before every step, we should perform basic maintenance procedures such as periodic snapshots and periodic smoke rollbacks.

Let's say we have the following etcd cluster nodes:

Name	Address	Hostname
etcd-1	10.0.1.1	etcd-1.example.com
etcd-2	10.0.1.2	etcd-2.example.com
etcd-3	10.0.1.3	etcd-3.example.com

Make sure to check the health of the cluster before you start:

```
$ ./etcdctl cluster-health
member 6e3bd23ae5f1eae2 is healthy: got healthy result from http://10.0.1.1:2379
member 924e2e83f93f2565 is healthy: got healthy result from http://10.0.1.2:2379
member 8211f1d0a64f3269 is healthy: got healthy result from http://10.0.1.3:2379
cluster is healthy
```
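Rather than eyeballing that output, the pre-flight check can be scripted. A minimal sketch, assuming the v2 cluster-health output format shown above; cluster_health_ok is a hypothetical name, and it requires both the final "cluster is healthy" verdict and the expected number of healthy members:

```shell
# Hypothetical check: read "etcdctl cluster-health" output on stdin and
# succeed only if the verdict is "cluster is healthy" and exactly EXPECTED
# members reported "is healthy".
cluster_health_ok() {
  awk -v expected="$1" '
    /^member .* is healthy/ { healthy++ }
    /^cluster is healthy$/  { verdict = 1 }
    END { if (verdict && healthy == expected) exit 0; exit 1 }
  '
}
```

For example, ./etcdctl cluster-health | cluster_health_ok 3 || echo "do not start the upgrade" stops the procedure if any member is missing or unhealthy.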

Upgrading etcd

Based on the above considerations, a typical etcd upgrade procedure consists of the following steps:

1. Log in to the first node and stop the existing etcd process:

```
$ ssh 10.0.1.1
$ kill $(pgrep -x etcd)
```

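2. Back up the etcd data directory to provide a downgrade path in case of errors. Replace the %data_dir%-style placeholders with your own paths; the bracketed flags are optional and only needed when the WAL is stored in a separate directory:

```
$ ./etcdctl backup \
      --data-dir %data_dir% \
      [--wal-dir %wal_dir%] \
      --backup-dir %backup_data_dir% \
      [--backup-wal-dir %backup_wal_dir%]
```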

3. Download the new binary from the etcd releases page and start the etcd server using the same configuration:

```
ETCD_VER=v3.3.15
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /usr/local/etcd && mkdir -p /usr/local/etcd

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/etcd --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

# confirm that the new binaries report the expected version
/usr/local/etcd/etcd --version
ETCDCTL_API=3 /usr/local/etcd/etcdctl version

# start etcd server; reuse all flags of the existing member (including its
# data directory), only a few of them are shown here
/usr/local/etcd/etcd -name etcd-1 -listen-peer-urls http://10.0.1.1:2380 -listen-client-urls http://10.0.1.1:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.0.1.1:2379,http://127.0.0.1:2379
```

4. Repeat steps 1 to 3 on each of the other members, one member at a time.

5. Verify that the cluster is healthy:

```
$ ETCDCTL_API=3 ./etcdctl --endpoints=10.0.1.1:2379,10.0.1.2:2379,10.0.1.3:2379 endpoint health
10.0.1.1:2379 is healthy: successfully committed proposal: took =
10.0.1.2:2379 is healthy: successfully committed proposal: took =
10.0.1.3:2379 is healthy: successfully committed proposal: took =
```
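Each of the steps above produces a result worth checking before moving on to the next member. The helpers below are a hypothetical sketch: backup_looks_complete assumes the backup kept the default member/snap and member/wal layout (that is, --backup-wal-dir was not used), etcd_reported_version assumes the "etcd Version:" line printed by etcd --version, and endpoints_healthy assumes the "is healthy:" wording of etcdctl endpoint health shown in step 5.

```shell
# Hypothetical post-step checks for the etcd upgrade (illustrative names).

# Step 2: succeed if the backup directory has the member/snap and
# member/wal subdirectories of a regular etcd data directory.
backup_looks_complete() {
  [ -d "$1/member/snap" ] && [ -d "$1/member/wal" ]
}

# Step 3: read "etcd --version" output on stdin and print the version
# from its "etcd Version: X.Y.Z" line.
etcd_reported_version() {
  awk '/^etcd Version:/ { print $3; exit }'
}

# Step 5: read "etcdctl endpoint health" output on stdin and succeed only
# if exactly EXPECTED endpoints report "is healthy".
endpoints_healthy() {
  awk -v expected="$1" '
    / is healthy: / { healthy++ }
    END { if (healthy == expected) exit 0; exit 1 }
  '
}
```

For example, /usr/local/etcd/etcd --version | etcd_reported_version should print 3.3.15 for ETCD_VER=v3.3.15, and ETCDCTL_API=3 ./etcdctl --endpoints=10.0.1.1:2379,10.0.1.2:2379,10.0.1.3:2379 endpoint health 2>&1 | endpoints_healthy 3 can gate the move to the next member; the 2>&1 is there in case unhealthy results are written to stderr.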

Note: If you are having issues connecting to the cluster, you may need to provide TLS client certificates; for example (these are the etcdctl v3 flag names):

```
$ ETCDCTL_API=3 ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health
```

For convenience, you can export the equivalent environment variables instead, which etcdctl v3 reads in place of those flags:

```
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
```
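When TLS is enabled, the client URLs passed to etcdctl should use the https scheme. A small hypothetical helper can build the comma-separated value that the --endpoints flag expects from the member addresses; it assumes the default client port 2379:

```shell
# Hypothetical helper: join member addresses into an --endpoints value,
# e.g. https://10.0.1.1:2379,https://10.0.1.2:2379 (client port 2379 assumed).
etcd_endpoints() {
  local out="" host
  for host in "$@"; do
    out="${out:+$out,}https://$host:2379"
  done
  echo "$out"
}
```

For example: ETCDCTL_API=3 ./etcdctl --endpoints="$(etcd_endpoints 10.0.1.1 10.0.1.2 10.0.1.3)" endpoint health.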

Final thoughts

In this article, we walked through step-by-step instructions for upgrading both Kubernetes and etcd clusters. These are important maintenance procedures in the day-to-day operations of a typical business environment, and anyone who works with HA Kubernetes deployments should become familiar with them.

However, if you favor operational velocity and fewer maintenance tasks, you can consider using a fully managed Kubernetes solution that automates both deployments and Day 2 operations, including zero-touch upgrades.

Learn more about Platform9 Managed Kubernetes.

Try our Sandbox to experience remote Kubernetes upgrades – with no operational overhead or service downtime.

Author

Platform9

Platform9 is a leader in simplifying enterprise private clouds. Our flagship product, Private Cloud Director, turns existing infrastructure into a full-featured private cloud. Enterprise IT teams can manage VMs and containers with familiar GUI tools and automated APIs in a private, secure environment.
