Host Unable to Converge With Management Plane

Problem

  • Multiple processes running the vgs command are stuck on the host.

root    32449  0.0  0.0  35468  5020 ?        S    Jan22  0:00 /sbin/vgs
root    32598  0.0  0.0  55804  4212 ?        S    Jan22  0:00 sudo /sbin/vgs
root    32599  0.0  0.0  35468  5144 ?        S    Jan22  0:00 /sbin/vgs
root    32616  0.0  0.0  55804  4092 ?        S    07:15  0:00 sudo /sbin/vgs
root    32617  0.0  0.0  35468  4972 ?        S    07:15  0:00 /sbin/vgs
root    32645  0.0  0.0  55804  4096 ?        S    02:42  0:00 sudo /sbin/vgs
root    32646  0.0  0.0  35468  5068 ?        S    02:42  0:00 /sbin/vgs
root    32650  0.0  0.0  55804  4064 ?        S    Jan22  0:00 sudo /sbin/vgs
root    32651  0.0  0.0  35468  5064 ?        S    Jan22  0:00 /sbin/vgs
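
To see what the stuck processes are blocked on, the kernel wait channel can be inspected. A minimal diagnostic sketch (the exact wchan values vary by kernel; a value related to socket reads or flock suggests the process is waiting on lvmetad or an LVM lock):

# Show PID, state, start time, and kernel wait channel for every vgs process.
$ ps -o pid,stat,lstart,wchan:30,cmd -C vgs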
  • Hostagent is stuck executing the fetch_volumes_present.py extension:

session.py ERROR - timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py command failed: Command '['timeout', '120', '/opt/pf9/hostagent/extensions/fetch_volumes_present.py']' returned non-zero exit status 124
  • This Hostagent extension, /opt/pf9/hostagent/extensions/fetch_volumes_present.py, returns the list of volumes present on the host. Exit status 124 means the timeout command killed the extension because it was still running after 120 seconds.

$ ps aux | grep fetch_volumes_present.py
pf9      16536  0.0  0.0  10276  888 ?        S    03:26  0:00 timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py
pf9      16537  0.0  0.0  38480  6924 ?        S    03:26  0:00 /opt/pf9/hostagent/bin/python /opt/pf9/hostagent/extensions/fetch_volumes_present.py
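
The hang can be reproduced outside of Hostagent by running the extension by hand under the same timeout; GNU timeout exits with status 124 when the command is still running at the time limit, which matches the error above. A verification sketch:

# Run the extension exactly as Hostagent does and inspect the exit status.
$ timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py
$ echo $?
124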
  • The vgs command is stuck waiting for the lvmetad service to provide information for Volume Group [Volume Group ID]:

$ sudo vgs --all -vvv
DEGRADED MODE. Incomplete RAID LVs will be processed.
Setting activation/monitoring to 1
Processing: vgs --all -vvv
system ID:
O_DIRECT will be used
Setting global/locking_type to 1
Setting global/wait_for_locks to 1
File-based locking selected.
Setting global/prioritise_write_locks to 1
Setting global/locking_dir to /run/lock/lvm
Setting global/use_lvmlockd to 0
report/aligned not found in config: defaulting to 1
report/buffered not found in config: defaulting to 1
report/headings not found in config: defaulting to 1
report/separator not found in config: defaulting to
report/prefixes not found in config: defaulting to 0
report/quoted not found in config: defaulting to 1
report/columns_as_rows not found in config: defaulting to 0
report/vgs_sort not found in config: defaulting to vg_name
report/vgs_cols_verbose not found in config: defaulting to vg_name,vg_attr,vg_extent_size,pv_count,lv_count,snap_count,vg_size,vg_free,vg_uuid,vg_profile
Using volume group(s) on command line.
Asking lvmetad for complete list of known VG ids/names
Setting response to OK
Setting response to OK
Setting name to [volume group name]
Setting name to [volume group name]
Setting name to [volume group name]
Metadata cache has no info for vgname: "volume group name"
Locking /run/lock/lvm/V_volume group name RB
_do_flock /run/lock/lvm/V_volume group name:aux WB
_undo_flock /run/lock/lvm/V_volume group name:aux
_do_flock /run/lock/lvm/V_volume group name RB
Metadata cache has no info for vgname: "volume group name"
Metadata cache has no info for vgid "[volume-group-id]"
Asking lvmetad for VG [volume-group-id] (volume group name)
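
The trace ends at the last line above, which is the point where vgs blocks waiting for a reply from lvmetad. The daemon and its activation socket can be checked with their standard systemd unit names:

# Check the health of the LVM metadata daemon and its socket.
$ sudo systemctl status lvm2-lvmetad.service lvm2-lvmetad.socket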

Environment

  • Platform9 Managed OpenStack v3.6.0 and higher

  • pf9-hostagent

Cause

The cause of lvm2-lvmetad.service becoming unresponsive is not known.

Resolution

  1. Restart the lvm2-lvmetad.service.

  2. Confirm the service is active.

  3. After the LVM metadata daemon (the metadata cache service for LVM) has been restarted, LVM commands complete successfully again, as shown in the sketch after this list.
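
A minimal sketch of the three steps on a systemd-based host, using the standard lvm2 unit name:

# 1. Restart the LVM metadata cache daemon.
$ sudo systemctl restart lvm2-lvmetad.service

# 2. Confirm the service is active.
$ sudo systemctl is-active lvm2-lvmetad.service
active

# 3. Verify that LVM commands now complete instead of hanging.
$ sudo vgs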

Additional Information

If lvmetad cannot be restarted immediately, LVM commands can instead read VG metadata directly from the devices, bypassing lvmetad.
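
Any lvm.conf setting can be overridden for a single invocation with --config; disabling use_lvmetad for that one command forces a direct scan of the block devices and ignores the unresponsive cache:

# Read VG metadata straight from disk, bypassing lvmetad for this command.
$ sudo vgs --config 'global {use_lvmetad=0}'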
