Hostagent Stuck While Executing an Extension

Problem

  • A Platform9 managed host is shown as offline in the Clarity UI.

  • The Hostagent service (pf9-hostagent) has a forked process that is running a script from /opt/pf9/hostagent/extensions.

# systemctl status pf9-hostagent

● pf9-hostagent.service - Platform9 Host Agent Service
Loaded: loaded (/lib/systemd/system/pf9-hostagent.service; enabled; vendor preset: enabled)
Active: active (running) since XXX YYYY-MM-DD TT:TT:TT PDT; 2h 19min ago
...
├─17643 /bin/bash -c /opt/pf9/hostagent/bin/pf9-hostd >> /var/log/pf9/hostagent-daemon.log 2>&1
├─17645 /opt/pf9/hostagent/bin/python /opt/pf9/hostagent/bin/pf9-hostd
└─18275 /opt/pf9/hostagent/bin/python /opt/pf9/hostagent/extensions/fetch_mounted_nfs.py

  • The forked process is in the 'D' (uninterruptible sleep) state.

# ps aux | grep 18275

pf9   18275  0.0  0.0  49944  7792 ?   D   19:08   0:00 /opt/pf9/hostagent/bin/python /opt/pf9/hostagent/extensions/fetch_mounted_nfs.py
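
When the PID is not known in advance, the process table can be filtered by state instead. The following is a minimal sketch and not part of the Hostagent tooling: the function name filter_stuck_extensions is hypothetical, and it assumes a procps-style ps that accepts -eo pid,stat,args. It reports extension processes whose state begins with D (uninterruptible sleep) or Z (zombie).

```shell
#!/bin/sh
# Hypothetical helper: reads "PID STAT COMMAND..." lines on stdin and prints
# the PID and state of every process that
#   - is in state D (uninterruptible sleep) or Z (zombie), and
#   - was started from the hostagent extensions directory.
filter_stuck_extensions() {
    awk '$2 ~ /^[DZ]/ && index($0, "/opt/pf9/hostagent/extensions/") { print $1, $2 }'
}

# Example invocation against the live process table (assumes procps ps):
#   ps -eo pid,stat,args | filter_stuck_extensions
```

An empty result means no extension process is currently stuck in either state.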

Environment

  • Platform9 Managed OpenStack - All Versions

  • Platform9 Managed Kubernetes - All Versions

Cause

Several factors can cause a Hostagent extension script to hang during execution and leave its process stuck in the state shown above.

Resolution

  1. Inspect the script in question and identify whether any of its commands can be run manually.

Example: /opt/pf9/hostagent/extensions/fetch_mounted_nfs.py

The corresponding command would be as follows.

  2. If the command executes successfully, proceed to Step #3. Otherwise, identify what is causing the command to fail.

  3. Stop the pf9-hostagent service.

  4. Kill any remaining forked processes (if present).

Note

You may need to add the -9 or -SIGKILL flag if any of the processes remain.

  5. Start the pf9-hostagent service.
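
The stop, kill, and start steps can be sketched as a single shell function. This is an illustration under stated assumptions rather than an official procedure: it assumes a systemd host, root privileges, and that the extension processes run as the pf9 user from /opt/pf9/hostagent/extensions, as in the output shown earlier. The names run, restart_hostagent, EXT_DIR, and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Assumption: extension scripts live here, as in the ps output above.
EXT_DIR=/opt/pf9/hostagent/extensions

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_hostagent() {
    # Stop the service.
    run systemctl stop pf9-hostagent

    # Ask leftover extension processes owned by pf9 to exit (SIGTERM).
    # pkill returns non-zero when nothing matches, which is fine here.
    run pkill -u pf9 -f "$EXT_DIR" || true

    # Give them a moment to exit, then send SIGKILL to any that remain.
    [ "${DRY_RUN:-0}" = "1" ] || sleep 5
    run pkill -9 -u pf9 -f "$EXT_DIR" || true

    # Start the service again.
    run systemctl start pf9-hostagent
}

# Preview the commands first, then run for real as root:
#   DRY_RUN=1; restart_hostagent
#   DRY_RUN=0; restart_hostagent
```

A process in uninterruptible sleep does not act on any signal, including SIGKILL, until the blocking call it is waiting on returns, so a process may still be listed after the kill step.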
