PMK Cluster Upgrade Fails Due to Host in Converge Failed State

Problem

The PMK cluster upgrade task fails because one or more hosts enter the "Converge Failed" state.

Environment

  • Platform9 Managed Kubernetes - v4.5 and higher

  • Ubuntu OS 18.04

Cause

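The apt package manager on the host(s) can fail while removing the pf9-kube role during the cluster upgrade. The log excerpt in the Resolution section shows the typical case: an earlier dpkg operation was interrupted, and dpkg must be reconfigured before the pf9-kube package can be erased.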

Resolution

1. Check the hostagent.log entries on the host corresponding to the time of failure. Verify that the error message is similar to the one shown below:

2021-04-22 00:03:30,497 - pf9_app_db.py ERROR - Erase command failed : sudo /opt/pf9/hostagent/bin/pf9-apt erase pf9-kube. Return code: 1, stdout: , stderr: Failed to erase package: pf9-kubeTraceback (most recent call last):...apt_pkg.Error: E:dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.

2. Run the following command on the host to finish configuring any packages that dpkg left unpacked but unconfigured:

$ sudo dpkg --configure -a

3. Restart the pf9-hostagent service on the host:

$ sudo systemctl restart pf9-hostagent
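
The checks around these steps can be scripted. The following is a minimal sketch, not a Platform9 tool: the helper names has_dpkg_interrupt_error and audit_output_is_clean are hypothetical, and the log path /var/log/pf9/hostagent.log is an assumption that may differ on your host. The first helper looks for the error from step 1 in a log file. The second reports whether the output of `dpkg --audit` is empty, which is expected once no packages remain in a broken state.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default log path are assumptions.

# Assumed location of the host agent log; override HOSTAGENT_LOG if it differs.
HOSTAGENT_LOG="${HOSTAGENT_LOG:-/var/log/pf9/hostagent.log}"

# Returns 0 if the given log file contains the interrupted-dpkg error from step 1.
has_dpkg_interrupt_error() {
    grep -q "dpkg was interrupted, you must manually run" "$1"
}

# Returns 0 if the text on stdin is empty or whitespace only.
# Intended for the output of `dpkg --audit`, which lists problem packages.
audit_output_is_clean() {
    ! grep -q '[^[:space:]]'
}

# Example usage on the affected host:
#   has_dpkg_interrupt_error "$HOSTAGENT_LOG" && echo "interrupted dpkg error found"
#   sudo dpkg --audit | audit_output_is_clean && echo "dpkg reports no broken packages"
```

Running the first check before step 2 confirms that this article applies. Running the second after step 2 gives a quick indication that dpkg finished its pending configuration.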

Note: If the host is still in the "Converge Failed" state after performing the above steps, contact Platform9 Support for further assistance.
