Broken Libvirtd Is Causing the Ostackhost Service to Fail

Problem

The libvirtd service is failing with errors on the PMO host, causing the Ostackhost service to fail and leaving the host in a disconnected state in the management plane.

Environment

  • Platform9 Managed OpenStack - v5.4 and Higher

  • Nova

  • Libvirt

Cause

The errors observed in the journalctl logs of the libvirtd service are:

$ sudo journalctl -u libvirtd
-- Logs begin at Fri 2024-07-12 09:16:44 GMT, end at Mon 2024-07-15 19:04:19 GMT. --
Jul 15 18:58:58 host310.dj2.its.fqdn.net libvirtd[68936]: 2024-07-15 18:58:58.448+0000: 68936: info : libvirt version: 4.5.0, package: 36.el7_9.5 (CentOS BuildSystem <http://bugs.centos.org>, 2021-04-28-13:32:22, x86-01.bsys.centos.org)
Jul 15 18:58:58 host310.dj2.its.fqdn.net libvirtd[68936]: 2024-07-15 18:58:58.448+0000: 68936: info : hostname: host310.dj2.its.fqdn.net
Jul 15 18:58:58 host310.dj2.its.fqdn.net libvirtd[68936]: 2024-07-15 18:58:58.448+0000: 68936: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

The following errors appear in ostackhost.log:

2024-07-15 18:59:03.291 WARNING nova.virt.libvirt.volume.mount [req-yyyyyyyy None None] host_down called, but we don't think host is up
2024-07-15 18:59:03.842 ERROR oslo_service.service [req-xxxxxx None None] Error starting thread.: nova.exception.HypervisorUnavailable: Connection to the hypervisor is broken on host
2024-07-15 18:59:03.842 TRACE oslo_service.service Traceback (most recent call last):
2024-07-15 18:59:03.842 TRACE oslo_service.service File "/opt/pf9/venv/lib/python3.9/site-packages/nova/virt/libvirt/host.py", line 503, in get_connection
2024-07-15 18:59:03.842 TRACE oslo_service.service conn = self._get_connection()
...
2024-07-15 18:59:03.842 TRACE oslo_service.service File "/opt/pf9/python/lib/python3.9/libvirt.py", line 104, in openAuth
2024-07-15 18:59:03.842 TRACE oslo_service.service if ret is None:raise libvirtError('virConnectOpenAuth() failed')
2024-07-15 18:59:03.842 TRACE oslo_service.service libvirt.libvirtError: error from service: CheckAuthorization: Connection is closed

Resolution

Restart the libvirtd service, and then the pf9-hostagent service, on the affected host.
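The restart order above can be sketched as a small shell helper. This is a minimal sketch, not an official Platform9 script: it assumes the host uses systemd (as the journalctl output above suggests), and the function names and the DRY_RUN variable are hypothetical, added only so the commands can be previewed before they are run.

```shell
# Hypothetical helper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

# Restart libvirtd first, then pf9-hostagent, in the order given in the Resolution.
# The second restart is skipped if the first one fails.
restart_host_services() {
  run systemctl restart libvirtd &&
  run systemctl restart pf9-hostagent
}
```

Running `DRY_RUN=1 restart_host_services` prints the two systemctl commands; running `restart_host_services` without DRY_RUN executes them through sudo.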

Info

  • The libvirtd service shows a running status even when this issue occurs.

  • Restarting libvirtd causes no downtime for the VMs.
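Because the unit status alone does not reveal this issue, an actual connection attempt is a more useful check. The following is a rough sketch under stated assumptions: the function name is hypothetical, the qemu:///system URI is assumed to be the one in use on the host, and the invoking user is assumed to have permission to connect to libvirt. Treat the result as a hint, not as proof that Nova can connect.

```shell
# Hypothetical check: compare the systemd unit state with a real libvirt connection attempt.
check_libvirt_connection() {
  # The unit state may still be "active" while connections fail.
  echo "libvirtd unit state: $(systemctl is-active libvirtd)"

  # Assumption: qemu:///system is the hypervisor URI used on this host.
  if virsh -c qemu:///system list --all >/dev/null 2>&1; then
    echo "libvirt connection OK"
  else
    echo "libvirt connection FAILED"
    return 1
  fi
}
```

Running `check_libvirt_connection` before and after the restart gives a quick indication of whether the connection to libvirt has recovered.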
