Troubleshooting Hostagent Service Failures

Problem

After a same-version upgrade on an LTS1- patch 12.1 [v-5.3.0-2075501] setup, the hostagent service does not come up on the worker node.

pf9-hostagent.service

After node reboot:

pf9-hostagent.service

No new entries are being written to the hostagent log. The earlier hostagent log entries, however, indicate that the hostagent service is unable to fetch the IP addresses associated with the node, because the fetch_ip_address.py script is not returning the expected output:

Hostagent logs

Environment

  • Platform9 Edge Cloud - LTS1- patch 12.1 [v-5.3.0-2075501].

Answer

This is a known issue. The Platform9 Engineering team is investigating to identify the root cause and resolve it. In these scenarios, execution of the fetch_ip_address.py script is observed to fail, so it is recommended to collect the following outputs from the customer environment:

  1. Check whether the IPs are populated when the fetch_ip_address.py script is executed manually:
  2. Compare whether the Python libraries are the same on the working and non-working nodes:
pip list output
  3. Identify the extension (fetch_ip_address.py) named in the error in hostagent.log, and try to execute it manually. The error looks like this:

2023-06-19 04:39:38,136 - session.py ERROR - timeout 120 /opt/pf9/hostagent/extensions/fetch_ip_address.py command failed: Command '['timeout', '120', '/opt/pf9/hostagent/extensions/fetch_ip_address.py']' returned non-zero exit status 1.

In affected node

With the three outputs above, please reach out to the Platform9 Support Team and reference Jira ID AIR-1199, which tracks this issue.
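The library comparison in step 2 can be tedious to do by eye. As a minimal sketch, assuming the `pip list` output from a working node and from the affected node has been saved to two text files, the following Python script reports packages whose versions differ or that exist on only one node. The function names `parse_pip_list` and `diff_packages` are hypothetical helpers introduced for illustration; they are not part of any Platform9 tooling.

```python
import sys


def parse_pip_list(text):
    """Parse `pip list` table output (or `name==version` lines) into {name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Skip the "Package Version" header and the dashed separator row.
        if parts[:2] == ["Package", "Version"] or set(line.strip()) <= {"-", " "}:
            continue
        if "==" in parts[0]:
            name, version = parts[0].split("==", 1)
        elif len(parts) >= 2:
            name, version = parts[0], parts[1]
        else:
            continue
        packages[name.lower()] = version
    return packages


def diff_packages(working, failing):
    """Return {name: (working_version, failing_version)} for every mismatch."""
    diffs = {}
    for name in sorted(set(working) | set(failing)):
        w, f = working.get(name), failing.get(name)
        if w != f:
            diffs[name] = (w, f)
    return diffs


# Usage (assumed file names): python compare_pip.py working.txt failing.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh:
        working = parse_pip_list(fh.read())
    with open(sys.argv[2]) as fh:
        failing = parse_pip_list(fh.read())
    for name, (w, f) in diff_packages(working, failing).items():
        print(f"{name}: working={w} failing={f}")
```

A value of `None` on either side means the package is missing on that node. The resulting list can be attached to the support request alongside the raw `pip list` outputs.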

The Python interpreter used to execute the fetch_ip_address.py script is /opt/pf9/hostagent/bin/python. This interpreter path can be seen in the systemctl status output of the pf9-hostagent service on nodes where the service is active.

Hostagent service in active nodes
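To capture what the extension does when it is run by hand, a small wrapper can record the exit status, standard output, and standard error in one place. The sketch below is an illustration only: the two paths and the 120-second timeout are taken from the log line and interpreter path quoted in this article, while the helper name `run_extension` and the choice to invoke the script as `interpreter script` are assumptions (the hostagent itself appears to call the script directly under `timeout 120`).

```python
import subprocess

# Paths as quoted in this article; adjust if the node differs.
HOSTAGENT_PYTHON = "/opt/pf9/hostagent/bin/python"
EXTENSION = "/opt/pf9/hostagent/extensions/fetch_ip_address.py"


def run_extension(interpreter=HOSTAGENT_PYTHON, script=EXTENSION, timeout=120):
    """Run the script with the given interpreter and collect its result."""
    try:
        proc = subprocess.run(
            [interpreter, script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except OSError as exc:
        # Interpreter missing or not executable.
        return {"returncode": None, "stdout": "", "stderr": str(exc)}
    except subprocess.TimeoutExpired:
        return {"returncode": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


if __name__ == "__main__":
    result = run_extension()
    print("exit status:", result["returncode"])
    print("stdout:", result["stdout"])
    print("stderr:", result["stderr"])
```

A non-zero exit status together with a traceback on standard error corresponds to the "returned non-zero exit status 1" failure seen in hostagent.log, and the traceback is the most useful part to share with support.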

Additional Information

Other hostagent-related issues:

Hostagent Installation Failing Due To Apt Key Being Not Present

Platform9 Related Package Installation Failing Due To Apt Cache Corruption

Pf9-hostagent Failing With Error "No module named 'apt_pkg'"
