How-To Restore From ETCD Backup on a Multi-Master Cluster

Problem

How do you restore a Multi-Master cluster from an ETCD backup?

Environment

  • Platform9 Managed Kubernetes - All Versions

  • ETCD

  • Docker or Containerd

Procedure

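  1. Access the Master node where you want to perform the restore operation. On nodes that use Docker, copy the etcdctl binary out of the etcd container into the PMK binary directory and add that directory to your PATH:

```shell
# Copy etcdctl from the running etcd container to the PMK binary directory
docker cp etcd:/usr/local/bin/etcdctl /opt/pf9/pf9-kube/bin
# Make etcdctl available in the current shell session
export PATH=$PATH:/opt/pf9/pf9-kube/bin
```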
  2. Copy the snapshot file to the Master node where you will run the restore operation.

    • The ETCD Backup Storage Path and ETCD Backup Interval parameters can be set at cluster creation time and changed later from the Platform9 UI by editing the cluster details. The default ETCD Backup Storage Path is /etc/pf9/etcd-backup.
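The copy can be done with any file-transfer tool. The sketch below assumes SSH access between the nodes and uses `scp`; the helper names `copy_snapshot_to_master` and `check_snapshot` are hypothetical. `etcdctl snapshot status` is a standard etcd v3 command that prints the snapshot's hash, revision, total keys, and size, which is a quick way to confirm the copied file is readable before you start the restore.

```shell
# Hypothetical helpers; adjust hosts, users, and paths to your environment.
# Default PMK ETCD Backup Storage Path (override if you changed it).
BACKUP_DIR="${BACKUP_DIR:-/etc/pf9/etcd-backup}"

# Run on the node that holds the snapshot: copy it to the restore master.
# Assumes SSH access and write permission on the target directory.
copy_snapshot_to_master() {
  local snapshot="$1"   # e.g. "$BACKUP_DIR/<backupfilename>"
  local target="$2"     # e.g. root@<master-node-ip>
  scp "$snapshot" "${target}:${BACKUP_DIR}/"
}

# Run on the restore master: print hash, revision, keys, and size of the
# snapshot to confirm it is a readable etcd v3 snapshot.
check_snapshot() {
  ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
}
```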

  3. Scale down the total number of Master nodes to 1 by detaching the other Master nodes from the cluster using the Platform9 UI. To do this, select the cluster on the "Infrastructure" tab.

  • Verify the etcd member count once the cluster has been scaled down to 1 Master node. For pf9-kube v1.19 and below (etcd v3.3.22), run `etcdctl cluster-health` on the remaining Master node to check etcd cluster health.

  • For pf9-kube v1.20 and above (etcd v3.4.14), run `etcdctl member list` and `etcdctl endpoint status` on the remaining Master node to get etcd member and endpoint information.
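A sketch of both checks, assuming etcdctl is on the PATH from step 1 and that etcd serves clients over TLS at https://127.0.0.1:2379. The certificate paths are placeholders, not confirmed PMK locations; point the three variables at the etcd client CA, certificate, and key on your node. The helper names are hypothetical. Note that the v2 API (used by `cluster-health`) and the v3 API take differently named TLS flags.

```shell
# Assumptions: endpoint and certificate paths below are placeholders;
# replace them with the values used on your Master node.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/path/to/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/path/to/etcd/client.crt}"
ETCD_KEY="${ETCD_KEY:-/path/to/etcd/client.key}"

# pf9-kube v1.19 and below (etcd v3.3.x): v2 API with --ca-file style flags.
etcd_cluster_health_v2() {
  ETCDCTL_API=2 etcdctl --endpoints "$ETCD_ENDPOINT" \
    --ca-file "$ETCD_CACERT" --cert-file "$ETCD_CERT" --key-file "$ETCD_KEY" \
    cluster-health
}

# pf9-kube v1.20 and above (etcd v3.4.x): v3 API wrapper; pass the subcommand.
etcdctl_v3() {
  ETCDCTL_API=3 etcdctl --endpoints "$ETCD_ENDPOINT" \
    --cacert "$ETCD_CACERT" --cert "$ETCD_CERT" --key "$ETCD_KEY" "$@"
}

# Example (v1.20+):
#   etcdctl_v3 member list --write-out=table
#   etcdctl_v3 endpoint status --write-out=table
```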

  4. If required, remove any stale Master nodes from Kubernetes with `kubectl delete node <node-name>`.
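A hypothetical helper for this cleanup. List the nodes first (for example with `kubectl get nodes -o wide`) and pass only the names of the detached Master nodes.

```shell
# Hypothetical helper: delete the Node objects of the given (detached) masters.
# Review the names first with: kubectl get nodes -o wide
delete_stale_masters() {
  local node
  for node in "$@"; do
    kubectl delete node "$node"
  done
}

# Example: delete_stale_masters <node-name-1> <node-name-2>
```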

  5. Stop the PMK stack on the Master node where you will run the restore operation. This stops all Kubernetes-related services on that node.
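A minimal sketch, assuming a nodelet-based PMK release where the stack is controlled by `/opt/pf9/nodelet/nodeletd phases stop`; confirm the binary path and subcommand on your version before running it. The agents are stopped first because the procedure starts pf9-hostagent again only after the cluster is healthy. `stop_pmk_stack` is a hypothetical name.

```shell
# Assumption: the PMK stack is managed by nodelet at this path; verify it.
NODELETD="${NODELETD:-/opt/pf9/nodelet/nodeletd}"

stop_pmk_stack() {
  # Stop the agents first so they do not bring services back during the restore.
  sudo systemctl stop pf9-hostagent pf9-nodeletd
  # Stop the Kubernetes services (including etcd) started by nodelet.
  sudo "$NODELETD" phases stop
}
```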

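  6. Move the existing etcd data directory to a different path on the Master node, so the restore starts from an empty data directory and the old data remains available as a fallback.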
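A sketch that moves the directory aside with a timestamped name instead of deleting it. `etcdctl snapshot restore` refuses to restore into a data directory that already exists, which is why this step is needed. The helper name is hypothetical and the default location `/var/opt/pf9/kube/etcd` is an assumption; verify where etcd keeps its data on your node.

```shell
# Assumption: PMK keeps etcd data under /var/opt/pf9/kube/etcd; verify first.
move_etcd_data_aside() {
  local src="${1:-/var/opt/pf9/kube/etcd}"
  local dest="${src}.bak-$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$src" ]; then
    echo "no etcd directory at $src" >&2
    return 1
  fi
  # Rename rather than delete, and print the new location.
  mv "$src" "$dest" && echo "$dest"
}
```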

  7. Restore the etcd snapshot with `etcdctl snapshot restore`, using the following values:

  • NODE_IP: IP address of the attached Master node, which can also be found in the `kubectl get nodes` output.

  • NODE_UUID: the value of host_id in the file /etc/pf9/host_id.conf on the node.

  • </path/to/backupfilename>: the snapshot file stored under the configured ETCD Backup Storage Path (see step 2).
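A sketch of the restore, assuming that `/etc/pf9/host_id.conf` contains a line of the form `host_id = <uuid>`, that etcd peers communicate over HTTPS on port 2380, and that PMK expects the restored data in `/var/opt/pf9/kube/etcd/data`; verify all three on your node. The helper names are hypothetical, while `--name`, `--initial-cluster`, `--initial-advertise-peer-urls`, and `--data-dir` are standard `etcdctl snapshot restore` flags. Listing a single member in `--initial-cluster` is what produces the one-member cluster checked later in the procedure.

```shell
# Assumptions (verify on your node): host_id.conf has "host_id = <uuid>",
# peers use https on port 2380, and the data dir below is what PMK expects.

# Print NODE_UUID (the host_id value) from the given or default file.
read_node_uuid() {
  awk -F '=' '/^[ \t]*host_id[ \t]*=/ { gsub(/[ \t\r]/, "", $2); print $2 }' \
    "${1:-/etc/pf9/host_id.conf}"
}

# Restore the snapshot into a new single-member cluster named after NODE_UUID.
restore_etcd_snapshot() {
  local snapshot="$1" node_uuid="$2" node_ip="$3"
  local data_dir="${4:-/var/opt/pf9/kube/etcd/data}"
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" \
    --name "$node_uuid" \
    --initial-cluster "${node_uuid}=https://${node_ip}:2380" \
    --initial-advertise-peer-urls "https://${node_ip}:2380" \
    --data-dir "$data_dir"
}

# Example:
#   NODE_UUID=$(read_node_uuid)
#   restore_etcd_snapshot </path/to/backupfilename> "$NODE_UUID" <NODE_IP>
```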

  8. Start the PMK stack on the Master node.
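A minimal sketch, assuming the same nodelet binary path as when the stack was stopped; confirm it on your node. Only the Kubernetes services are started here; pf9-hostagent is started later, once the cluster is healthy. `start_pmk_stack` is a hypothetical name.

```shell
# Assumption: the PMK stack is managed by nodelet at this path; verify it.
NODELETD="${NODELETD:-/opt/pf9/nodelet/nodeletd}"

start_pmk_stack() {
  # Start the Kubernetes services only; pf9-hostagent stays stopped for now.
  sudo "$NODELETD" phases start
}
```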

  9. Verify the etcd member count once the stack is up and running. There should be exactly one member in the etcd cluster.

  • For pf9-kube v1.19 and below (etcd v3.3.22), run `etcdctl cluster-health` on the remaining Master node to check etcd cluster health.

  • For pf9-kube v1.20 and above (etcd v3.4.14), run `etcdctl member list` and `etcdctl endpoint status` on the remaining Master node to get etcd member and endpoint information.
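A sketch that turns the check into a pass/fail test, assuming the v3 API (pf9-kube v1.20 and above) and placeholder certificate paths that you replace with the etcd client CA, certificate, and key on your node. It relies on the default `member list` output format, which prints one comma-separated line per member. The helper name is hypothetical.

```shell
# Assumptions: endpoint and certificate paths below are placeholders.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/path/to/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/path/to/etcd/client.crt}"
ETCD_KEY="${ETCD_KEY:-/path/to/etcd/client.key}"

# Succeed only if `member list` reports exactly one member.
assert_single_etcd_member() {
  local members count
  members=$(ETCDCTL_API=3 etcdctl --endpoints "$ETCD_ENDPOINT" \
    --cacert "$ETCD_CACERT" --cert "$ETCD_CERT" --key "$ETCD_KEY" \
    member list) || return 1
  # Count non-empty lines (one per member in the default output format).
  count=$(printf '%s\n' "$members" | grep -c . || true)
  echo "etcd members: $count"
  [ "$count" -eq 1 ]
}
```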

  10. The previously detached Master nodes are still listed in the `kubectl get nodes` output. They may initially show as "Ready", but they should eventually change to "NotReady". Delete these stale Master nodes with `kubectl delete node <node-name>`.
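To find candidates for deletion, a hypothetical helper can filter `kubectl get nodes` on the STATUS column. It prints every NotReady node, including any worker that happens to be NotReady, so review the list and delete only the detached Master nodes.

```shell
# Hypothetical helper: print names of nodes whose STATUS starts with NotReady
# (this also matches values such as "NotReady,SchedulingDisabled").
list_notready_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /^NotReady/ { print $1 }'
}

# Review the output, then delete only the detached masters, for example:
#   kubectl delete node <node-name>
```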

  11. Once the cluster is healthy, start the pf9-hostagent service on the Master node. pf9-hostagent then starts the pf9-nodeletd service on its own.
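A minimal sketch of starting the agent and checking both services with `systemctl is-active`. Because pf9-hostagent brings up pf9-nodeletd on its own, pf9-nodeletd may report `inactive` for a while before it starts. `start_hostagent` is a hypothetical name.

```shell
# Hypothetical helper: start pf9-hostagent, then report both service states.
start_hostagent() {
  sudo systemctl start pf9-hostagent
  # pf9-nodeletd is started later by pf9-hostagent; re-run this check if needed.
  systemctl is-active pf9-hostagent pf9-nodeletd
}
```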

  12. Make sure there is no etcd data directory on any node that will be used to scale the Master nodes back up.

Warning

This is required if you plan to reuse the Master nodes that were detached from the cluster in step 3.
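A hypothetical check to run on each node before re-attaching it. The default path `/var/opt/pf9/kube/etcd` is an assumption; use the etcd data location you verified on the restored Master node. If the check fails, move the directory aside as was done on the restored Master node.

```shell
# Assumption: etcd data lives under /var/opt/pf9/kube/etcd; verify the path.
check_no_etcd_data() {
  local dir="${1:-/var/opt/pf9/kube/etcd}"
  if [ -e "$dir" ]; then
    echo "stale etcd data at $dir: move or remove it before re-attaching" >&2
    return 1
  fi
  echo "no etcd data at $dir"
}
```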

  13. Scale the Master nodes back up from the Platform9 UI by selecting the cluster on the "Infrastructure" tab.
