Management Plane Upgrade to Same/Higher Version Fails with Resmgr Service Not Coming Up

Problem

  • While following the Upgrade Process Documentation, the upgrade fails with the error below:

TASK [pf9-configure : Wait for resmgr service] *********************************
 2025-02-03T23:32:44.466-0800 info INFO:pf9deploy.server.util.shell:| Tuesday 04 February 2025  07:02:42 +0000 (0:00:01.849)       0:12:17.577 ******
 2025-02-03T23:32:44.466-0800 info INFO:pf9deploy.server.util.shell:| fatal: [airctl-1.pf9.localnet]: FAILED! => {"changed": false, "elapsed": 1800, "msg": "Timeout when waiting for 127.0.0.1:8083"}
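
The failing task is waiting for a listener on 127.0.0.1:8083, so a quick port probe run inside the duVM can confirm whether anything is accepting connections there. The following is a minimal Python sketch, not part of the product tooling; the helper name port_is_open and the 3-second timeout are illustrative choices.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8083 is the address the failed task was waiting on.
    state = "open" if port_is_open("127.0.0.1", 8083) else "closed or unreachable"
    print(f"127.0.0.1:8083 is {state}")
```

A closed port only confirms the symptom; the service logs are still needed to identify the cause.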

Environment

  • Platform9 Edge Cloud - v5.3.0 and higher

Diagnostic Steps

  • Check whether the logs of the pf9-resmgr service inside the duVM contain the error below (use it as a search string):

Feb 03 09:03:13 airctl-1.pf9.localnet pf9-resmgr[3472]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 4: invalid continuation byte
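
To make the string search concrete, the sketch below filters log text for lines that carry both the pf9-resmgr tag and the UnicodeDecodeError signature. It is a minimal Python example under stated assumptions: the function name find_decode_errors is hypothetical, the second sample line is made up for contrast, and how the log is exported from the duVM is left to the reader.

```python
from typing import Iterable, List

SERVICE_TAG = "pf9-resmgr"
ERROR_SIGNATURE = "UnicodeDecodeError: 'utf-8' codec can't decode byte"


def find_decode_errors(lines: Iterable[str]) -> List[str]:
    """Return pf9-resmgr log lines that contain the UTF-8 decode error."""
    return [
        line.rstrip("\n")
        for line in lines
        if SERVICE_TAG in line and ERROR_SIGNATURE in line
    ]


# First line is the error from this article; second is an invented,
# unrelated line that the filter should skip.
sample_log = [
    "Feb 03 09:03:13 airctl-1.pf9.localnet pf9-resmgr[3472]: "
    "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 4: "
    "invalid continuation byte\n",
    "Feb 03 09:03:14 airctl-1.pf9.localnet pf9-resmgr[3472]: unrelated message\n",
]

for match in find_decode_errors(sample_log):
    print(match)
```

The byte value and position in the message may differ between environments, so the signature deliberately stops before them.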

Resolution

  • Roll back the environment to the latest available backup taken at the start of the upgrade.

  • Identify the bbslave password in the file .airctl/mongo/secrets.json on the DU Host.

  • On the DU Host, run the commands below in sequence.

  • The above command prints all the secrets stored in MongoDB; identify the secret with the following tag: airctl-1-pf9-localnet-rabbit_bbslave_password

  • Confirm that the password stored for this record in MongoDB matches the one in the file .airctl/mongo/secrets.json.

  • If the two passwords differ, update the MongoDB record with the command below:

  • Exit the MongoDB shell.

  • On the DU Host, run the command below:

Additional Information
