While Restoring LTS2-Patch2 On SMCP, Management Plane Cluster Backup-Restore Process Fails.

Problem

During restoration of LTS2-Patch2 [v-5.6.7-2624593] to SMCP, the restore step fails with the error below:

Backup Restore error

Environment

  • Platform9 Edge Cloud - LTS2-Patch2 [v-5.6.7-2624593].

Cause

This is a known issue. Jira AIR-1218 has been filed to track and resolve it.

The Platform9 Engineering team is actively working on a fix.

Workaround

As a workaround, follow the steps below:

  1. Ensure your existing DU has no issues by running the following command and verifying that the task state is ready:

  2. Download the LTS2-Patch4 [v-5.6.7-2658688] artifacts, following the same steps as for LTS2-Patch2.

  3. Run the upgrade operation following the upgrade guide (upgrade from LTS2-Patch2 to LTS2-Patch4).

The upgrade operation is expected to fail due to a known issue, which can be ignored. Even though it fails, the upgrade fixes the state files that are essential for restoring LTS2 on SMCP. The upgrade from LTS2-Patch2 to LTS2-Patch4 itself is affected by the removal of an internal component known as decco and by some related codebase changes.

The expected error message is shown below:

  4. After this, follow the SMCP restore process with the following change:

In step 7, while updating the nodelet-bootstrap.yaml file, also add the kubedu-imgs tar file from LTS2-Patch2 to the userImages section. A snippet of the YAML file is shown below for reference:

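Returning to step 1 of the workaround, the readiness check can be scripted rather than read by eye. The sketch below is a hypothetical helper, du_is_ready, that reads status text on standard input and succeeds only if a "task state" line reports "ready". The "task state: ready" line format is an assumption about the status command's output and is not confirmed by this article; adjust the pattern to match what your release prints.

```bash
# Hypothetical helper: succeed only if the status text on stdin contains
# a "task state" line whose value is "ready".
# Assumption: the status output has a line of the form "task state: ready".
du_is_ready() {
  grep -Eiq '^[[:space:]]*task state:[[:space:]]*ready[[:space:]]*$'
}
```

To use it, pipe the output of the status command from step 1 into the function and continue with step 2 only if it returns success.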

Additional Information

In some cases, especially on systems with limited resources, the container runtime can garbage-collect kubedu images that have not been used yet. This can cause operations such as airctl upgrade/upgrade-hosts to fail with ImagePullBackOff errors.

We can determine whether the images need to be reloaded by running the command below and making sure that the images needed for du-upgrade or host-upgrade have not been cleaned up.


For reference, some of the images we should look for are quay.io/platform9/k8s-helm-runner and quay.io/platform9/kplane-host-upg.
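The check for cleaned-up images can be sketched as a small shell function. missing_images is a hypothetical helper and is not part of airctl: it reads an image listing on standard input and prints each required image name that does not appear in it. Which command produces the listing depends on the container runtime in use; the crictl call in the usage note is an assumption.

```bash
# Hypothetical helper: print each required image (given as arguments)
# that is absent from the image listing read on stdin.
missing_images() {
  local listing img
  listing=$(cat)
  for img in "$@"; do
    # -F matches the image name literally; -q suppresses grep's own output.
    if ! printf '%s\n' "$listing" | grep -qF -- "$img"; then
      printf '%s\n' "$img"
    fi
  done
}
```

For example, assuming the runtime can be queried with crictl: `crictl images | missing_images quay.io/platform9/k8s-helm-runner quay.io/platform9/kplane-host-upg`. Empty output means both images are still present; any name printed is an image that needs to be reloaded.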

If images are missing, run the following command before the upgrade/upgrade-hosts operations:
