Host Unable to Converge With Management Plane

Problem

  • Multiple processes running the vgs command are stuck on the host.

root    32449  0.0  0.0  35468  5020 ?        S    Jan22  0:00 /sbin/vgs
root    32598  0.0  0.0  55804  4212 ?        S    Jan22  0:00 sudo /sbin/vgs
root    32599  0.0  0.0  35468  5144 ?        S    Jan22  0:00 /sbin/vgs
root    32616  0.0  0.0  55804  4092 ?        S    07:15  0:00 sudo /sbin/vgs
root    32617  0.0  0.0  35468  4972 ?        S    07:15  0:00 /sbin/vgs
root    32645  0.0  0.0  55804  4096 ?        S    02:42  0:00 sudo /sbin/vgs
root    32646  0.0  0.0  35468  5068 ?        S    02:42  0:00 /sbin/vgs
root    32650  0.0  0.0  55804  4064 ?        S    Jan22  0:00 sudo /sbin/vgs
root    32651  0.0  0.0  35468  5064 ?        S    Jan22  0:00 /sbin/vgs
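
To see what the stuck processes are blocked on, the kernel wait channel can be inspected. A minimal diagnostic sketch (the exact wchan values vary by kernel; a value related to socket reads or flock suggests the process is waiting on lvmetad or an LVM lock):

# Show PID, state, start time, and kernel wait channel for every vgs process.
$ ps -o pid,stat,lstart,wchan:30,cmd -C vgs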
  • Hostagent is stuck executing the fetch_volumes_present.py extension:

session.py ERROR - timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py command failed: Command '['timeout', '120', '/opt/pf9/hostagent/extensions/fetch_volumes_present.py']' returned non-zero exit status 124
  • This Hostagent extension, /opt/pf9/hostagent/extensions/fetch_volumes_present.py, returns the list of volumes present on the host. Exit status 124 means the timeout command killed the extension because it was still running after 120 seconds.

$ ps aux | grep fetch_volumes_present.py
pf9      16536  0.0  0.0  10276  888 ?        S    03:26  0:00 timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py
pf9      16537  0.0  0.0  38480  6924 ?        S    03:26  0:00 /opt/pf9/hostagent/bin/python /opt/pf9/hostagent/extensions/fetch_volumes_present.py
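
The hang can be reproduced outside of Hostagent by running the extension by hand under the same timeout; GNU timeout exits with status 124 when the command is still running at the time limit, which matches the error above. A verification sketch:

# Run the extension exactly as Hostagent does and inspect the exit status.
$ timeout 120 /opt/pf9/hostagent/extensions/fetch_volumes_present.py
$ echo $?
124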
  • The vgs command is stuck waiting for the lvmetad service to provide information for Volume Group [Volume Group ID]:

$ sudo vgs --all -vvv
DEGRADED MODE. Incomplete RAID LVs will be processed.
Setting activation/monitoring to 1
Processing: vgs --all -vvv
system ID:
O_DIRECT will be used
Setting global/locking_type to 1
Setting global/wait_for_locks to 1
File-based locking selected.
Setting global/prioritise_write_locks to 1
Setting global/locking_dir to /run/lock/lvm
Setting global/use_lvmlockd to 0
report/aligned not found in config: defaulting to 1
report/buffered not found in config: defaulting to 1
report/headings not found in config: defaulting to 1
report/separator not found in config: defaulting to
report/prefixes not found in config: defaulting to 0
report/quoted not found in config: defaulting to 1
report/columns_as_rows not found in config: defaulting to 0
report/vgs_sort not found in config: defaulting to vg_name
report/vgs_cols_verbose not found in config: defaulting to vg_name,vg_attr,vg_extent_size,pv_count,lv_count,snap_count,vg_size,vg_free,vg_uuid,vg_profile
Using volume group(s) on command line.
Asking lvmetad for complete list of known VG ids/names
Setting response to OK
Setting response to OK
Setting name to [volume group name]
Setting name to [volume group name]
Setting name to [volume group name]
Metadata cache has no info for vgname: "volume group name"
Locking /run/lock/lvm/V_volume group name RB
_do_flock /run/lock/lvm/V_volume group name:aux WB
_undo_flock /run/lock/lvm/V_volume group name:aux
_do_flock /run/lock/lvm/V_volume group name RB
Metadata cache has no info for vgname: "volume group name"
Metadata cache has no info for vgid "[volume-group-id]"
Asking lvmetad for VG [volume-group-id] (volume group name)
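
The trace ends at the last line above, which is the point where vgs blocks waiting for a reply from lvmetad. The daemon and its activation socket can be checked with their standard systemd unit names:

# Check the health of the LVM metadata daemon and its socket.
$ sudo systemctl status lvm2-lvmetad.service lvm2-lvmetad.socket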

Environment

  • Platform9 Managed OpenStack v3.6.0 and higher

  • pf9-hostagent

Cause

The cause of lvm2-lvmetad.service becoming unresponsive is not known.

Resolution

  1. Restart the lvm2-lvmetad.service.

  2. Confirm the service is active.

  3. After the LVM metadata daemon (the metadata cache service for LVM) has been restarted, LVM commands complete successfully again, as shown in the sketch after this list.
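
A minimal sketch of the three steps on a systemd-based host, using the standard lvm2 unit name:

# 1. Restart the LVM metadata cache daemon.
$ sudo systemctl restart lvm2-lvmetad.service

# 2. Confirm the service is active.
$ sudo systemctl is-active lvm2-lvmetad.service
active

# 3. Verify that LVM commands now complete instead of hanging.
$ sudo vgs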

Additional Information

If lvmetad cannot be restarted immediately, LVM commands can instead read VG metadata directly from the devices, bypassing lvmetad.
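
Any lvm.conf setting can be overridden for a single invocation with --config; disabling use_lvmetad for that one command forces a direct scan of the block devices and ignores the unresponsive cache:

# Read VG metadata straight from disk, bypassing lvmetad for this command.
$ sudo vgs --config 'global {use_lvmetad=0}'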
