Production considerations for Multi-Master Kubernetes

Kubernetes has been around for five years and, at this point, has become a stable platform that is commonly used throughout development and production environments to run applications. Over that time, the base project has become more capable: it is now easier to build and deploy a reliable cluster on your own, and then add the components you want so it meets your individual requirements.

In this blog post you’ll learn about production considerations for your enterprise Kubernetes environment, including clustering high-availability etcd, creating a new etcd database cluster, and more.

Production Considerations

In development, things like having a single etcd node and a single-master control plane aren’t a problem: when they are down, nothing that is already running actually stops; you just can’t manage the cluster until they are back. The worst case is that a critical bug causes an application to crash while they are down, and it can’t be fully brought back online until the master server or etcd host is restored.

Beyond the high availability consideration for the masters, when looking at deploying a home-built Kubernetes cluster into production, there are numerous other items to consider.

  • Monitoring all the metrics available within a Kubernetes cluster for health and usage patterns, using a proven tool like Prometheus
  • A graphical console (like the Kubernetes Dashboard), with observability of those monitoring metrics through a tool like Grafana
  • Watching and analyzing log files by centralizing them in a system like Elasticsearch
  • Security patching, from the operating system layer all the way up through the Kubernetes cluster
  • Application binary scanning with an open source tool like Clair, or from one of many commercial vendors (Aqua Security, Palo Alto Networks, Trend Micro, Synopsys, etc.)
  • Security integration between RBAC and external authentication and authorization services
  • Load balancers and their publicly-facing SSL certificates; Let’s Encrypt is the low-cost entry point for independent SSL certificates
  • Networking layer – Is the default wide-open policy enough, or do you need multi-tenancy or even network policy support? There are many vendors that can provide tools, from the purely open source Open vSwitch, to the open source based Tigera Calico, to the purely commercial VMware NSX-T (see the policy sketch after this list)
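
If you do decide you need network policy support, note that the policy objects themselves are standard Kubernetes resources, regardless of which networking layer enforces them. Below is a minimal sketch of a default-deny ingress policy for the default namespace; it only takes effect if your chosen CNI actually supports NetworkPolicy.

# Deny all ingress traffic to pods in the default namespace unless another policy allows it
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF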

Clustering High-Availability etcd

New etcd Database Cluster

When creating a new etcd database cluster, it is as easy as specifying all the instances on the command line on each node as you start the cluster, and letting the instances figure it out. If you can, this is always the best way to start. Three nodes are the recommended minimum; five nodes are recommended for most reasonably-sized production clusters (1000+ pods).

For clusters using static discovery, each node needs to know the IPs of the other nodes. Following are the exact commands to run on the three nodes in a new cluster; these commands also enable in-transit encryption of all communication channels.

First up is creating the certificates that will be used across the nodes. This can be done on any Mac or Linux host with openssl installed. In this case the hostnames are infra0, infra1, and infra2.

Create the Certificate Authority (and private key):



openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 10000 -out ca.pem -subj "/CN=etcd-ca"

Create the configuration used to create the client certificate:


# vi openssl.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name

[req_distinguished_name]

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names

[ ssl_client ]
extendedKeyUsage = clientAuth, serverAuth
basicConstraints = CA:FALSE
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
subjectAltName = @alt_names

[ v3_ca ]
basicConstraints = CA:TRUE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
authorityKeyIdentifier=keyid:always,issuer

[alt_names]
DNS.1 = localhost
DNS.2 = infra0
DNS.3 = infra1
DNS.4 = infra2
IP.1 = 127.0.0.1
IP.2 = 10.1.2.11
IP.3 = 10.1.2.12
IP.4 = 10.1.2.13

Load the configuration into the environment, then create the client key and certificate signing request, and sign it with the CA:



CONFIG=openssl.conf
openssl genrsa -out infra-key.pem 2048
openssl req -new -key infra-key.pem -out infra.csr \
  -subj "/CN=etcd-01" -config ${CONFIG}
openssl x509 -req -in infra.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out infra.pem -days 3650 \
  -extensions ssl_client -extfile ${CONFIG}
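
Before distributing the certificate, you can optionally confirm that it chains back to the CA and carries the expected subject alternative names:

openssl verify -CAfile ca.pem infra.pem
openssl x509 -in infra.pem -noout -text | grep -A1 "Subject Alternative Name"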

And finally, copy the certificates to each node that will host etcd, into a new directory named /etc/ssl/etcd/ssl/. These commands copy the files and preserve permissions; the example targets the first node, so repeat it for the other two (or use the loop shown after it).


ssh root@10.1.2.11 mkdir -p /etc/ssl/etcd/ssl/
scp -p ca.pem ca-key.pem infra-key.pem infra.pem infra.csr root@10.1.2.11:/etc/ssl/etcd/ssl/
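
If you prefer not to repeat those two commands by hand, a short loop over the three node IPs from this example does the same thing (a minimal sketch that assumes root SSH access to each host):

for ip in 10.1.2.11 10.1.2.12 10.1.2.13; do
  ssh root@${ip} mkdir -p /etc/ssl/etcd/ssl/
  scp -p ca.pem ca-key.pem infra-key.pem infra.pem infra.csr root@${ip}:/etc/ssl/etcd/ssl/
done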

On each node, install etcd, export a few environment variables, and run the same commands to create the configuration file; the service itself is started in a later step. This example is based on a CentOS host, but other platforms have a very similar configuration setup:

Node 1:


sudo yum -y install etcd

export NODE1_IP=10.1.2.11
export NODE2_IP=10.1.2.12
export NODE3_IP=10.1.2.13
export INTERNAL_IP=${NODE1_IP}
export ETCD_NAME=infra0
cat <<EOF | sudo tee /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/p9"
ETCD_LISTEN_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_LISTEN_PEER_URLS="https://${INTERNAL_IP}:2380"
ETCD_NAME="${ETCD_NAME}"
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250

#[Clustering]
ETCD_ADVERTISE_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${INTERNAL_IP}:2380
ETCD_INITIAL_CLUSTER="infra0=https://${NODE1_IP}:2380,infra1=https://${NODE2_IP}:2380,infra2=https://${NODE3_IP}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-0"
ETCD_INITIAL_CLUSTER_STATE="new"

#[Proxy]
ETCD_PROXY=off

#[Security]
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true
EOF

Node 2:



sudo yum -y install etcd

export NODE1_IP=10.1.2.11
export NODE2_IP=10.1.2.12
export NODE3_IP=10.1.2.13
export INTERNAL_IP=${NODE2_IP}
export ETCD_NAME=infra1
cat <<EOF | sudo tee /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/p9"
ETCD_LISTEN_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_LISTEN_PEER_URLS="https://${INTERNAL_IP}:2380"
ETCD_NAME="${ETCD_NAME}"
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250

#[Clustering]
ETCD_ADVERTISE_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${INTERNAL_IP}:2380
ETCD_INITIAL_CLUSTER="infra0=https://${NODE1_IP}:2380,infra1=https://${NODE2_IP}:2380,infra2=https://${NODE3_IP}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-0"
ETCD_INITIAL_CLUSTER_STATE="new"

#[Proxy]
ETCD_PROXY=off

#[Security]
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true
EOF

Node 3:



sudo yum -y install etcd

export NODE1_IP=10.1.2.11
export NODE2_IP=10.1.2.12
export NODE3_IP=10.1.2.13
export INTERNAL_IP=${NODE3_IP}
export ETCD_NAME=infra2
cat <<EOF | sudo tee /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/p9"
ETCD_LISTEN_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_LISTEN_PEER_URLS="https://${INTERNAL_IP}:2380"
ETCD_NAME="${ETCD_NAME}"
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250

#[Clustering]
ETCD_ADVERTISE_CLIENT_URLS="https://${INTERNAL_IP}:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${INTERNAL_IP}:2380
ETCD_INITIAL_CLUSTER="infra0=https://${NODE1_IP}:2380,infra1=https://${NODE2_IP}:2380,infra2=https://${NODE3_IP}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-0"
ETCD_INITIAL_CLUSTER_STATE="new"

#[Proxy]
ETCD_PROXY=off

#[Security]
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/infra.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/infra-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true
EOF

On all the nodes, reload the systemd configuration, then enable and start etcd:

sudo systemctl daemon-reload
sudo systemctl enable --now etcd
# This command sets the cluster state to existing for the next start
sudo sed -i 's/ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/g' \
  /etc/etcd/etcd.conf

And now, verify it’s running:

# etcdctl -C https://10.1.2.11:2379 --ca-file /etc/ssl/etcd/ssl/ca.pem cluster-health
member 21ce7d4f60f4ade0 is healthy: got healthy result from https://10.1.2.11:2379
member 7e5b4986949de455 is healthy: got healthy result from https://10.1.2.12:2379
member 85313efd8bbb270f is healthy: got healthy result from https://10.1.2.13:2379
cluster is healthy

Adding HA to Existing etcd Database

If you have an existing etcd database running, before you can add or remove nodes a quorum – a majority of the members – needs to be active, or the cluster will not allow write operations. So in a database cluster with three instances, two need to be alive for you to make any changes to the database or its runtime configuration. If the cluster has fewer than three instances, then all instances need to be alive to make changes, because of the way the quorum formula – floor(n/2)+1 – works.
So, if you have a single instance or two instances, in the event of a failure no changes can be made until all are back online.
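
As a quick illustration of that formula, here is the quorum size for common member counts (bash integer division gives the floor):

# Prints the minimum number of live members needed for writes
for n in 1 2 3 5 7; do echo "${n} members -> quorum $(( n / 2 + 1 ))"; done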

With a quorum in place, adding new nodes involves configuring additional nodes as per the setup instructions listed above, including updating and distributing any certificates.

Once the new nodes are configured with ETCD_INITIAL_CLUSTER_STATE set to "existing" rather than "new", simply update the existing cluster’s runtime configuration to include the new peer addresses before starting the etcd service on those new nodes. The runtime update is fairly straightforward:

$ etcdctl member add infra3 --peer-urls=https://10.1.2.14:2380
added member 9bf1b35fc7761a23 to cluster

After the address is successfully added, start the etcd service on the new node. If you are adding a new node to replace an existing failed node, always perform the remove operation first, as that makes it easier for the cluster to maintain quorum.
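
For reference, replacing a failed member starts with a removal along these lines; the member ID is a placeholder here, so run member list first to find the real one:

$ etcdctl member list
$ etcdctl member remove <id-of-failed-member>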

Note: Don’t forget to update the configuration files like /etc/etcd/etcd.conf on any existing nodes to reflect the new host, if static discovery is being used as in the example above.
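
For example, with the infra3 node added above, the peer list in /etc/etcd/etcd.conf on every node would grow to:

ETCD_INITIAL_CLUSTER="infra0=https://10.1.2.11:2380,infra1=https://10.1.2.12:2380,infra2=https://10.1.2.13:2380,infra3=https://10.1.2.14:2380"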

Going to a Multi-Master Configuration

The beautiful thing about the Kubernetes control plane deployment is that it is identical on every node. Whether you have one control plane node or 100, all the configuration is stored in the etcd database. So, if you need to add or remove control plane nodes, it is as simple as duplicating the TLS certificates, configuration, and binaries from an existing host to each new host. As long as they can connect to the etcd datastore over the network (TLS certificates are a big part of this), all is well.

The basic steps are to get the binaries:

wget -q --show-progress --https-only --timestamping \
  "https://dl.k8s.io/v1.17.0/kubernetes-server-linux-amd64.tar.gz"
tar xzf kubernetes-server-linux-amd64.tar.gz
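
The server tarball extracts into kubernetes/server/bin/; from there, copy the control plane binaries onto the path (a sketch – adjust the list to the components this host will actually run):

sudo cp kubernetes/server/bin/kube-apiserver \
  kubernetes/server/bin/kube-controller-manager \
  kubernetes/server/bin/kube-scheduler \
  kubernetes/server/bin/kubectl /usr/local/bin/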

Move any certificates, private keys, and the secrets encryption configuration to the runtime folder:

sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem \
service-account-key.pem service-account.pem \
encryption-config.yaml /etc/kubernetes/

Then enable the services using systemd, SysV init, or kubelet initialization parameters. Finally, add any new nodes to the load balancer that balances traffic across all of the API server instances.
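
With systemd, for example, that last step boils down to enabling the unit files copied over from an existing control plane host; the unit names below are the common convention and may differ in your environment:

sudo systemctl daemon-reload
sudo systemctl enable --now kube-apiserver kube-controller-manager kube-scheduler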

Next Steps

Now that your multi-master cluster is in place, it is time to pick all the individual components that are required to satisfy items like your monitoring and security controls. A great place to find all these solutions – including centralized management – is with SaaS providers like Platform9, which can simply and easily provide a very solid base to bring any Kubernetes cluster up to a manageable and maintainable state. At that point it is more about filling in the few missing gaps unique to your environment, instead of building out proven core management capabilities.
