While Restoring LTS2-Patch2 On SMCP, Management Plane Cluster Backup-Restore Process Fails.

Problem

While restoring LTS2-patch2 [v-5.6.7-2624593] to SMCP, the restore step fails with the following error:

# airctl restore --backupdir /root/ --config /opt/pf9/airctl/conf/airctl-config.yaml --verbose
...
2023-09-22T14:01:25.353Z        info    restoring mysql
2023-09-22T14:01:25.353Z        info    state file does not contain SSH user
Starting vault... (6m28s)2023-09-22T14:01:25.456Z    debug   found pod percona-db-pxc-db-pxc-0
  ERROR   setting up kplane...
2023-09-22T14:01:34.475Z        error   failed to install kplane components: failed to install helm chart /sbin/helm install kplane-usermgr /opt/pf9/airctl/conf/helm_charts/kplane-components-0.3.4.tgz -f /opt/pf9/airctl/conf/kplane_values.yaml -f /opt/pf9/airctl/conf/secrets.yaml: exit status 1 - Error: INSTALLATION FAILED: execution error at (kplane-components/templates/required.yaml:17:5): consul_fallback_token is required from values.yaml
2023-09-22T14:01:34.475Z        fatal   error: failed to install helm chart /sbin/helm install kplane-usermgr /opt/pf9/airctl/conf/helm_charts/kplane-components-0.3.4.tgz -f /opt/pf9/airctl/conf/kplane_values.yaml -f /opt/pf9/airctl/conf/secrets.yaml: exit status 1 - Error: INSTALLATION FAILED: execution error at (kplane-components/templates/required.yaml:17:5): consul_fallback_token is required from values.yaml
...
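
When triaging a failed restore, it can be useful to confirm that the log carries this specific signature rather than some other Helm failure. The following is a minimal sketch, not part of airctl: the function name `has_consul_token_error` and the idea of saving the verbose output to a text file are assumptions made for illustration. Only the error string and the log layout (timestamp, level, message) are taken from the output above.

```python
# Signature copied from the error in the restore output above.
SIGNATURE = "consul_fallback_token is required from values.yaml"

# Log levels that indicate a failure in the airctl output shown above.
FAILURE_LEVELS = {"error", "fatal"}


def has_consul_token_error(log_text: str) -> bool:
    """Return True if an error or fatal line in the log contains the signature."""
    for line in log_text.splitlines():
        # Expected layout: <timestamp> <level> <message>
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        level, message = parts[1], parts[2]
        if level in FAILURE_LEVELS and SIGNATURE in message:
            return True
    return False
```

A possible use is to redirect the `airctl restore ... --verbose` output to a file, read it, and pass the contents to `has_consul_token_error`. A `True` result suggests the failure matches the known issue described in this article.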

Environment

  • Platform9 Edge Cloud - LTS2-Patch2 [v-5.6.7-2624593].

Cause

This is a known issue. Jira AIR-1218 has been filed to track and resolve it.

Platform9 Engineering team is actively working to fix this issue.

Workaround

As a workaround, please follow the steps mentioned below:

  1. Ensure your existing DU has no issues by running the following command and verifying that the task state is ready:

  2. Run the upgrade operation by following the upgrade guide (upgrade from LTS2-patch#2 to LTS2-patch#4).

Info

The upgrade operation is expected to fail due to a known issue, and this failure can be ignored. Even though it fails, the upgrade fixes the state files that are essential for restoring LTS2 on SMCP. The upgrade from LTS2-patch#2 to LTS2-patch#4 is affected by the removal of an internal component known as decco and by some related codebase changes.

The expected error message is shown below:

  3. After this, follow the restore process for SMCP with the following change:

In step #7, while updating the nodelet-bootstrap.yaml file, also add the kubedu-imgs tar file from LTS2-Patch#2 to the userImages section. A snippet of the YAML file is shown below for reference:

Additional Information

In some cases, especially on systems with limited resources, the container runtime can garbage-collect kubedu images that have not been used yet. This can cause operations such as airctl upgrade or upgrade-hosts to fail with ImagePullBackOff errors.

We can determine whether the images need to be reloaded by listing the images available to the container runtime and confirming that the images needed for du-upgrade or host-upgrade have not been cleaned up.

For reference, some of the images we should look for are quay.io/platform9/k8s-helm-runner and quay.io/platform9/kplane-host-upg.
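
The check for missing images can be sketched as follows, assuming the image listing has already been captured as text. This is an illustrative helper only: the function name `missing_images` and the assumption that the image reference is the first whitespace-separated column of each line are not taken from Platform9 tooling. The two image names are the ones mentioned above, and the real list of required images may be longer.

```python
from typing import Iterable, List

# Image repositories named in this article; the full required set may be larger.
REQUIRED_IMAGES = (
    "quay.io/platform9/k8s-helm-runner",
    "quay.io/platform9/kplane-host-upg",
)


def missing_images(listing_text: str,
                   required: Iterable[str] = REQUIRED_IMAGES) -> List[str]:
    """Return the required image repositories that do not appear in the listing."""
    present = set()
    for line in listing_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Assumption: the first column holds the image reference.
        # Drop a digest suffix such as "@sha256:..." if one is attached.
        repo = fields[0].split("@", 1)[0]
        # Drop a trailing ":tag" but keep a registry port (host:5000/name) intact.
        name, sep, tag = repo.rpartition(":")
        if sep and "/" not in tag:
            repo = name
        present.add(repo)
    return [image for image in required if image not in present]
```

An empty result means none of the listed repositories is missing. A non-empty result names the repositories that would need to be reloaded before running the upgrade or upgrade-hosts operations.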

If we find that images are missing, we can run the following command before the upgrade or upgrade-hosts operations:
