Host Aggregates Not Showing Availability Status in UI, Instead Displaying "Hosts are Loading"
Problem
The Host Aggregates of the clusters displayed the status "Hosts are loading" instead of the expected availability status in the Platform9 UI. High Availability (HA) functionality appeared to be affected.
Environment
- Platform9 Managed OpenStack - v5.10.1 and Higher
- Component - pf9-hamgr on the control plane
Cause
The pf9-hamgr service hit the file descriptor limit, which caused:
- Failure to read certificate files required for operation
- Broken connectivity to the Nova API
- The HA Manager service becoming unresponsive
- curl requests to the HA Manager endpoint returning 502 errors
This results in the UI being unable to retrieve and display the Host Aggregate status properly.
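To gauge how close a process is to its file descriptor limit, its open descriptors can be compared against its soft limit. The following is a minimal sketch, assuming a Linux host with procfs; the function names and the PROC_ROOT variable are illustrative and are not part of any Platform9 tooling. The count from /proc/<pid>/fd will usually be lower than the `lsof -p` line count, because lsof also prints a header line and entries such as memory-mapped files.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names); assumes Linux procfs.
# PROC_ROOT can be overridden, e.g. for testing against a fake directory tree.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Number of file descriptors currently open in process $1.
fd_count() {
  ls "${PROC_ROOT}/$1/fd" | wc -l
}

# Soft "Max open files" limit of process $1.
fd_soft_limit() {
  awk '/^Max open files/ {print $4}' "${PROC_ROOT}/$1/limits"
}

# Integer percentage of the soft limit in use by process $1.
# Assumes the limit is numeric (not "unlimited").
fd_usage_percent() {
  local count limit
  count=$(fd_count "$1")
  limit=$(fd_soft_limit "$1")
  echo $(( count * 100 / limit ))
}
```

For example, `fd_usage_percent <hamgr-process-id>` prints the share of the limit in use; a value approaching 100 would be consistent with the Errno 24 failures described above. Reading another user's /proc/<pid>/fd typically requires root.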
The following diagnosis and resolution steps involve commands that must be run on the Platform9 control plane. Please contact Platform9 Support for assistance.
Diagnosis
- From the Management Plane, the /var/log/pf9/hamgr/hamgr.log logs showed:
OSError: [Errno 24] Too many open files: '/etc/pf9/hamgr/certs/ca/ca.cert.pem'
Failed to establish a new connection: [Errno 24] Too many open files'
- curl requests to the HA Manager endpoint returned a 502 error:
$ curl -kv https://<FQDN>/hamgr/v1/ha
< HTTP/1.1 502 Bad Gateway
- The pf9-hamgr-server process was not listening on the expected port, indicating it had stopped responding:
$ netstat -tulpn | grep <hamgr-process-id>
[no output]
- The hamgr process had too many open files:
$ lsof -p <hamgr-process-id> | wc -l
648
Resolution
Restarting the HA Manager service resolves the issue:
# systemctl restart pf9-hamgr-server.service
Post-restart:
- The service resumed listening on its expected port
- Open file descriptors dropped to normal operating levels
- Error logs stopped appearing
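The 502 symptom can also be rechecked after the restart by looking only at the HTTP status code of the HA Manager endpoint. This is a minimal sketch with hypothetical function names; it reuses the https://<FQDN>/hamgr/v1/ha URL and the -k flag from the curl call in this article. The article does not state which status code a healthy endpoint returns, so the check only rules out the known failure cases: 502, and 000, which curl prints when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not Platform9 tooling).

# Print only the HTTP status code for URL $1.
# -k skips TLS verification, -s silences progress output,
# -o /dev/null discards the body, -w prints the status code.
http_status() {
  curl -k -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed unless status code $1 is a known failure symptom:
# 502 (Bad Gateway, as seen in this incident) or 000 (no HTTP response).
hamgr_status_ok() {
  [ "$1" != "502" ] && [ "$1" != "000" ]
}
```

A possible use is `hamgr_status_ok "$(http_status https://<FQDN>/hamgr/v1/ha)" && echo "no 502"`, with <FQDN> replaced by the management plane hostname.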
Validation
From the Management Plane:
After the restart:
- Verify that pf9-hamgr-server is listening on its port:
$ netstat -tulpn | grep <hamgr-process-id>
tcp 0 0 0.0.0.0:[PORT-NUMBER] 0.0.0.0:* LISTEN [HAMGR-PROCESS-ID]/python
- Check the number of open files:
$ lsof -p <hamgr-process-id> | wc -l
110
From the UI:
In the UI, the user can verify the resolution by navigating to the Infrastructure