Demystifying Deauthorization and Decommissioning of PCD Hosts

Problem

How do deauthorization and decommissioning of PCD hosts work in the backend?

Environment

  • Private Cloud Director Virtualization - v2025.4 and Higher
  • Component - pcdctl

Answer

Deauthorization of PCD Hosts

Deauthorization removes the roles from an onboarded host and then removes the host from the backend database. Ensure all the prerequisites are taken care of before triggering host deauthorization.

This process is divided into four stages:

1. Initiating Deauthorization via CLI

  • The user triggers the deauthorization via:
Command
  • It consists of two phases: Role Removal followed by Host Removal [optional]

2. Role Removal (Deauthorization Phase)

  • PCD CLI internally calls resmgr role delete APIs, handling both:

    • V1 roles (Individual component roles like pf9-glance-role, pf9-neutron-ovn-controller)
    • V2 roles (uber roles), which are wrappers of V1 roles: hypervisor, image-library, persistent-storage, dns.
  • These uber roles further expand into multiple v1 roles:

Info
  • Validation is performed before the actual role removal. Each v2 uber role has its own validation logic, which verifies that the role's dependent components are in a healthy and clean state before removal proceeds.

    • For the Hypervisor role, the checks include:

      • VMs are evacuated or deleted from the host.
      • No running workloads (nova instances).
      • Host is in disabled or maintenance state.
      • Host is not part of an aggregate/availability zone.
    • For the Image Library role, the checks include:

      • Host is not currently serving image storage.
      • No image data volume is mounted or in use.
      • All services related to Glance are stopped.
    • For the Persistent Storage role, the checks include:

      • No volumes are attached or in-use on the host.
      • All cinder backends (LVM, NFS, etc.) are unconfigured or inactive.
      • No active mount points exist under /opt/pf9/pf9-cindervolume-*
    • For the DNS role, the checks include:

      • The v1 role pf9-designate isn’t serving DNS on the host.
  • The CLI removes roles in reverse order of installation, because some roles depend on others. For instance, a v2 uber role such as hypervisor has its sub-roles (v1 roles) removed collectively in reverse order, starting from pf9-neutron-ovn-metadata-agent and ending with pf9-neutron-base.
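
The reverse-order rule can be sketched in Python as shown here. This is only an illustration under stated assumptions: the installation order list is hypothetical apart from its first and last entries, which come from the example in the bullet above, and removal_order is a made-up helper name, not part of pcdctl.

```python
from typing import List

# Hypothetical, partial installation order for the hypervisor uber role's
# V1 sub-roles. Only the first and last entries are taken from the article;
# the position of the middle entry is an assumption made for illustration.
HYPERVISOR_INSTALL_ORDER: List[str] = [
    "pf9-neutron-base",
    "pf9-neutron-ovn-controller",
    "pf9-neutron-ovn-metadata-agent",
]


def removal_order(install_order: List[str]) -> List[str]:
    """Return roles in reverse order of installation (dependents first)."""
    return list(reversed(install_order))


if __name__ == "__main__":
    for role in removal_order(HYPERVISOR_INSTALL_ORDER):
        print(f"would remove {role}")
```

Reversing a copy rather than the list itself keeps the installation order intact in case the same list is needed later.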

Example

3. Timeout Handling

  • The CLI waits for role deletion to complete, polling the V2 role status every 10 seconds. If deletion takes longer than the --timeout value (default 5 minutes), the CLI exits.
  • Next, resmgr is queried again for any residual V1 roles, which are removed one by one.
  • A final polling pass ensures that no roles are left behind; it ends in either a timeout or the successful removal of all roles.
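
A poll-until-empty loop with a deadline, as described in this stage, might look like the following Python sketch. The 10-second interval and 5-minute default come from the text above; list_roles is a hypothetical callable standing in for the resmgr query, and the clock and sleep parameters exist only so the loop can be exercised without real waiting. It is not the CLI's actual implementation.

```python
import time
from typing import Callable, List

POLL_INTERVAL_SECONDS = 10      # polling interval stated in the article
DEFAULT_TIMEOUT_SECONDS = 300   # --timeout default of 5 minutes


def wait_for_role_removal(
    list_roles: Callable[[], List[str]],
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
    interval: float = POLL_INTERVAL_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until list_roles() returns no roles; False means the timeout hit.

    list_roles is a hypothetical stand-in for querying resmgr for the roles
    still assigned to the host.
    """
    deadline = clock() + timeout
    while True:
        if not list_roles():
            return True          # every role is gone
        if clock() >= deadline:
            return False         # gave up: roles still present at deadline
        sleep(interval)
```

The same helper could serve both the V2 wait and the final pass after residual V1 roles are removed, since each only needs "are any roles left?" answered repeatedly until a deadline.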

4. Host Deletion

  • This phase deletes the host object from the control plane database (resmgr). It proceeds only if all roles have already been removed.
  • The CLI calls DELETE /resmgr/v2/hosts/{host_id}, which deletes the host metadata (mappings, configs) from the resmgr database.

Host removal does not reattempt role cleanups, so skipping role removal earlier can leave stale role states.
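
The gating described in this stage can be sketched in Python as shown here. The path template is the one quoted above; send_delete is a hypothetical transport callback (for example, a thin wrapper around an authenticated HTTP client) and delete_host_if_clean is an illustrative name, neither of which is a pcdctl or resmgr API.

```python
from typing import Callable, List


def host_delete_path(host_id: str) -> str:
    """Build the resmgr path named in the article for deleting a host."""
    return f"/resmgr/v2/hosts/{host_id}"


def delete_host_if_clean(
    host_id: str,
    remaining_roles: List[str],
    send_delete: Callable[[str], None],
) -> bool:
    """Issue the DELETE only when no roles remain; return True if it was sent.

    send_delete is a hypothetical callback that performs the HTTP DELETE.
    """
    if remaining_roles:
        # Host removal does not retry role cleanup, so refuse to proceed
        # rather than leave stale role state behind.
        return False
    send_delete(host_delete_path(host_id))
    return True
```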

Decommissioning of PCD Hosts

Decommissioning is the process of permanently removing an onboarded host from Private Cloud Director. It removes the host completely and purges all PCD components from it, including binaries, libraries, packages, and files. The host can then be safely reused in another PCD environment.

Ensure all the prerequisites are taken care of before triggering host decommission. This process is divided into nine stages:

1. Initiating Decommission via CLI

  • The user triggers the decommission via:
Command
  • This command performs cleanup only after all roles have been removed, or when the role check is skipped via the -r / --skip-installed-role-check option.
  • Forceful decommissioning, which purges everything, is enabled via the -f / --force flag.

2. Role Validation (Pre-check Phase)

  • Before cleanup, the CLI extracts host_id from /etc/pf9/host_id.conf and queries resmgr API to retrieve all existing roles on the host. If roles exist, the CLI aborts unless --skip-installed-role-check is passed.
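
The pre-check can be approximated with the Python sketch that follows. It rests on one loudly stated assumption: that /etc/pf9/host_id.conf is INI-formatted with a host_id key under some section, which the text above does not confirm. read_host_id and may_proceed are illustrative names, and the role list would come from the resmgr query, which is not modelled here.

```python
import configparser
from typing import List, Optional


def read_host_id(conf_text: str) -> Optional[str]:
    """Return the first host_id value found in INI-style text, else None.

    Assumption: the file is INI-formatted with a host_id key under some
    section. The article does not describe the file layout.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    for section in parser.sections():
        if parser.has_option(section, "host_id"):
            return parser.get(section, "host_id").strip()
    return None


def may_proceed(installed_roles: List[str], skip_role_check: bool) -> bool:
    """Abort if roles exist, unless the installed-role check is skipped."""
    return skip_role_check or not installed_roles
```

In use, the text passed to read_host_id would be the contents of /etc/pf9/host_id.conf, and skip_role_check would reflect whether --skip-installed-role-check was given.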

3. Cleanup Phase (Decommission Logic)

  • After validation, the CLI performs a multi-step cleanup (stages 4 to 9).

4. Package removal

  • Removes the following packages via apt remove -y:
Package
  • Then completely purges the following two packages via apt-get purge -y, which removes the packages along with their configuration and data files:
Package

5. Unmount Directories

  • The following PF9 directories are identified and unmounted:
Directories
  • Also dynamically checks and unmounts any user-mounted PF9 directories via:

mount | awk '{print $3}' | grep -E "^/opt/pf9/|^/var/opt/pf9|^/var/log/pf9|^/root/pf9"
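
For readers who want to see what that pipeline selects, here is a rough Python equivalent that works on captured mount output. The regular expression is the same alternation as the grep -E expression; pf9_mount_points is an illustrative name, and the deepest-first ordering is an assumption added for the sketch, since the order actually used for unmounting is not stated.

```python
import re
from typing import List

# Same alternation as the grep -E expression in the pipeline above.
PF9_MOUNT_PATTERN = re.compile(
    r"^/opt/pf9/|^/var/opt/pf9|^/var/log/pf9|^/root/pf9"
)


def pf9_mount_points(mount_output: str) -> List[str]:
    """Return PF9 mount points from `mount` output, deepest paths first."""
    points = []
    for line in mount_output.splitlines():
        fields = line.split()
        # `mount` prints "<source> on <target> type <fs> (<options>)",
        # so the third whitespace-separated field is the mount point.
        if len(fields) >= 3 and PF9_MOUNT_PATTERN.search(fields[2]):
            points.append(fields[2])
    # Assumption: unmounting nested paths before their parents is safer;
    # the article does not state the order the CLI uses.
    return sorted(points, key=lambda p: p.count("/"), reverse=True)
```

Note that the first alternative requires a trailing slash, so a mount at exactly /opt/pf9 is not matched, while anything beneath it is.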

6. Filesystem Cleanup

  • Deletes the following directories when --force is not passed:
Directories
  • When --force is passed, everything PF9-related is deleted, including parent directories that may contain other nested content. In addition to the directories above, it removes /var/opt/pf9/ and /opt/pf9/.
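
The difference between the two modes can be expressed as a small Python sketch. Only the two force-only parent directories come from the text above; default_dirs is a placeholder argument for the standard directory list, which is not reproduced here, and cleanup_targets is an illustrative name.

```python
from typing import List

# Parent directories that the article says are removed only with --force.
FORCE_ONLY_DIRS = ["/var/opt/pf9/", "/opt/pf9/"]


def cleanup_targets(default_dirs: List[str], force: bool) -> List[str]:
    """Return the directories to delete for a normal or forced decommission.

    default_dirs is a placeholder for the standard directory list.
    """
    targets = list(default_dirs)
    if force:
        # Add the parent directories, skipping any already in the list.
        targets.extend(d for d in FORCE_ONLY_DIRS if d not in targets)
    return targets
```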

7. Network Cleanup

  • Lists all Open vSwitch bridges using the OVS API and deletes them one by one via ovs-vsctl del-br. It then runs netplan apply to restore the default network configuration.
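
A minimal Python sketch of this sequence follows. It assumes the bridges are listed with ovs-vsctl list-br, whereas the text above only says the listing goes through the OVS API; cleanup_ovs_bridges and run_command are illustrative names. The command runner is injectable so the sequence can be inspected without touching a real host, and the default runner is destructive and should only ever run on a host that is being decommissioned.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def cleanup_ovs_bridges(run: Runner = run_command) -> List[str]:
    """Delete every OVS bridge, then re-apply netplan; return deleted names."""
    # Assumption: bridges are listed with `ovs-vsctl list-br`, one per line.
    listing = run(["ovs-vsctl", "list-br"])
    bridges = [name.strip() for name in listing.splitlines() if name.strip()]
    for bridge in bridges:
        run(["ovs-vsctl", "del-br", bridge])
    # Re-apply the netplan configuration once all bridges are gone.
    run(["netplan", "apply"])
    return bridges
```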

8. Endpoint Removal

  • Removes OpenStack service entries from the following two files:
Files

9. Final System Reset

  • Finally, it executes systemctl reset-failed to clear the systemd state of failed services.