Demystifying Deauthorization and Decommissioning of PCD Hosts
Problem
How do Deauthorization and Decommissioning of PCD hosts work in the backend?
Environment
- Private Cloud Director Virtualization - v2025.4 and Higher
- Component - pcdctl
Answer
Deauthorization of PCD Hosts
Deauthorization removes roles from an onboarded host and removes the host from the backend database. Ensure all prerequisites are taken care of before triggering host deauthorization.
This process is divided into four stages:
1. Initiating Deauthorization via CLI
- The user triggers the deauthorization via:
$ pcdctl deauthorize-node
Do you wish to remove the roles present on the host? (y/n)
Do you wish to delete the host? (y/n)
- It consists of two phases: Role Removal followed by Host Removal [optional]
2. Role Removal (Deauthorization Phase)
PCD CLI internally calls
resmgr
role delete APIs, handling both:- V1 roles (Individual component roles like pf9-glance-role, pf9-neutron-ovn-controller)
- V2 roles (Uber roles) — wrappers of V1 roles: hypervisor, image-library, persistent-storage, dns.
These uber roles further expand into multiple v1 roles:
host_uber_roles = ("hypervisor", "image-library", "persistent-storage", "dns")
hypervisor_roles = ["pf9-neutron-base", "pf9-ostackhost-neutron",
"pf9-neutron-ovn-controller", "pf9-neutron-ovn-metadata-agent"]
image_library_roles = ["pf9-glance-role"]
persistent_storage_roles = ["pf9-cindervolume-base", "pf9-cindervolume-config"]
dns_roles = ["pf9-designate"]
Validation is performed before actual role removal. Each v2 uber role has its own validation logic that verifies the role's dependent components are in a healthy and clean state before proceeding with removal.
For the Hypervisor role, the checks include:
- VMs are evacuated or deleted from the host.
- No running workloads (nova instances).
- Host is in disabled or maintenance state.
- Host is not part of an aggregate/availability zone.
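These conditions can be spot-checked manually before deauthorizing. A minimal sketch using standard OpenStack CLI commands (the <host> placeholder is the hypervisor hostname; exact commands may vary by environment):
$ openstack server list --all-projects --host <host>   # should return no instances
$ openstack compute service list --host <host>         # nova-compute should be disabled
$ openstack aggregate list                              # use `openstack aggregate show <name>` to confirm the host is not a member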
For the Image Library role, the checks include:
- Host is not currently serving image storage.
- No image data volume is mounted or in use.
- All services related to Glance are stopped.
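A quick way to spot-check this on the host itself (the paths are taken from the cleanup stages later in this article):
$ findmnt /var/opt/imagelibrary/data/glance   # should report nothing mounted
$ pgrep -af glance                            # should return no running Glance processes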
For the Persistent Storage role, the checks include:
- No volumes are attached or in-use on the host.
- All cinder backends (LVM, NFS, etc.) are unconfigured or inactive.
- No active mount points from /opt/pf9/pf9-cindervolume-*
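These can be spot-checked with the commands below (a sketch; <host> is the node's hostname as known to Cinder):
$ openstack volume service list --host <host>   # cinder-volume on this host should be disabled or down
$ mount | grep pf9-cindervolume                  # should show no active /opt/pf9/pf9-cindervolume-* mounts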
For the DNS role, the checks include:
- The v1 role pf9-designate isn't serving DNS on the host.
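One quick, hedged way to confirm this on the host is to look for any remaining DNS listener:
$ ss -lntup | grep ':53 '   # nothing should still be listening on DNS port 53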
The CLI handles role removal in reverse order of installation, as some roles have dependencies. For instance, v2 uber roles like hypervisor have their sub-roles (v1 roles) removed collectively in reverse order, starting from pf9-neutron-ovn-metadata-agent down to pf9-neutron-base.
hypervisor_roles = [
"pf9-neutron-base",
"pf9-ostackhost-neutron",
"pf9-neutron-ovn-controller",
"pf9-neutron-ovn-metadata-agent"
]
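As an illustration of this ordering, the sketch below removes the hypervisor sub-roles in reverse via the resmgr role delete API. The endpoint path, $DU_FQDN, $TOKEN, and $HOST_ID are assumptions for the sketch; pcdctl performs the equivalent internally.
# Illustrative only: remove hypervisor sub-roles in reverse installation order
for role in pf9-neutron-ovn-metadata-agent pf9-neutron-ovn-controller \
            pf9-ostackhost-neutron pf9-neutron-base; do
  curl -sf -X DELETE -H "X-Auth-Token: $TOKEN" \
    "https://$DU_FQDN/resmgr/v1/hosts/$HOST_ID/roles/$role"
done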
3. Timeout Handling
- The CLI waits for role deletion to complete, polling the role status every 10 seconds until the V2 role deletion finishes. If deletion takes too long, it hits the --timeout (default 5 minutes) and exits.
- Further, resmgr is queried again for any residual V1 roles, which are removed one by one.
- A final polling pass ensures no roles remain unattended, resulting in either a timeout or successful removal of the roles.
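The polling behaviour can be pictured with a minimal loop like the one below. The resmgr URL and the .roles field in the response are assumptions for illustration, not pcdctl's actual implementation.
# Poll every 10 seconds until no roles remain or the timeout (default 5 minutes) expires
TIMEOUT=300; INTERVAL=10; elapsed=0
while [ "$elapsed" -lt "$TIMEOUT" ]; do
  roles=$(curl -sf -H "X-Auth-Token: $TOKEN" \
            "https://$DU_FQDN/resmgr/v1/hosts/$HOST_ID" | jq -r '.roles[]')
  if [ -z "$roles" ]; then echo "All roles removed"; break; fi
  sleep "$INTERVAL"; elapsed=$((elapsed + INTERVAL))
done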
4. Host Deletion
- It deletes the host object from the control plane database (resmgr); all roles must have been removed for this phase to proceed.
- The CLI calls DELETE /resmgr/v2/hosts/{host_id}, which deletes the host metadata (mapping, configs) from the resmgr database.
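The same call can be illustrated as a plain HTTP request (the FQDN placeholder and Keystone token header are assumptions; pcdctl issues the equivalent request for you):
$ curl -X DELETE -H "X-Auth-Token: $TOKEN" \
    "https://<du-fqdn>/resmgr/v2/hosts/<host_id>"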
Host removal does not reattempt role cleanups, so skipping role removal earlier can leave stale role states.
Decommissioning of PCD Hosts
Decommissioning is the process of permanently removing an onboarded host from Private Cloud Director. It involves complete removal of the host and purges all PCD components, including binaries, libraries, packages, and files on the host. The host can then be safely reused in another PCD environment.
Ensure all prerequisites are taken care of before triggering host decommission. This process is divided into nine stages:
1. Initiating Decommission via CLI
- The user triggers the decommission via:
$ pcdctl decommission-node
Do you wish to decommission the node?
- This command handles cleanup only after all roles have been removed (or if the role checks are skipped/forced via the -r / --skip-installed-role-check option).
- Forcefully decommissioning a node, which purges everything PF9-related, is enabled via the -f / --force flag.
2. Role Validation (Pre-check Phase)
- Before cleanup, the CLI extracts host_id from /etc/pf9/host_id.conf and queries resmgr API to retrieve all existing roles on the host. If roles exist, the CLI aborts unless --skip-installed-role-check is passed.
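The same pre-check can be done by hand, roughly as follows (the resmgr path and response shape are assumptions for illustration):
$ cat /etc/pf9/host_id.conf                  # contains the host_id the CLI uses
$ curl -sf -H "X-Auth-Token: $TOKEN" \
    "https://<du-fqdn>/resmgr/v1/hosts/<host_id>" | jq '.roles'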
3. Cleanup Phase (Decommission Logic)
- After validation, the CLI performs a multi-step cleanup (stages 4-9 below).
4. Package removal
- Removes the following packages via apt remove -y:
pf9-hostagent
pf9-ovn-controller
ovn-host
ovn-common
- This is followed by completely purging the two packages below via apt-get purge -y, which removes the packages along with their configuration and data files:
pf9-comms
pf9-hostagent
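Taken together, this stage is roughly equivalent to running:
$ apt remove -y pf9-hostagent pf9-ovn-controller ovn-host ovn-common
$ apt-get purge -y pf9-comms pf9-hostagent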
5. Unmount Directories
- The following PF9 directories are identified and unmounted:
/opt/pf9/pf9-cindervolume-base/state/mnt/*
/opt/data/instances
/opt/pf9/etc/pf9-cindervolume-base/volumes/*
/var/opt/imagelibrary/data/glance
- Also dynamically checks and unmounts any user-mounted PF9 directories via:
mount | awk '{print $3}' | grep -E "^/opt/pf9/|^/var/opt/pf9|^/var/log/pf9|^/root/pf9"
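In effect, the dynamic unmount step amounts to a loop like the one below (a sketch; the CLI's internal implementation may differ):
# Unmount any PF9-related mount points still present on the host
for mp in $(mount | awk '{print $3}' | grep -E "^/opt/pf9/|^/var/opt/pf9|^/var/log/pf9|^/root/pf9"); do
  umount "$mp"
done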
6. Filesystem Cleanup
- Deletes the following directories when --force is not passed:
/etc/pf9
/var/log/pf9
/var/spool/mail/pf9
/root/pf9/
/var/opt/imagelibrary/data/glance/
/opt/data/instances/
/opt/pf9/data/state/compute_id
/var/opt/pf9/neutron/metadata_proxy
/opt/pf9/data/locks/
/opt/pf9/python/
/etc/pf9_environment
- When --force is passed, everything PF9-related is deleted, including parent directories that may contain other nested content. It removes /var/opt/pf9/ and /opt/pf9/ along with the aforementioned directories.
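In practice, the forced cleanup behaves roughly like the following (illustrative only and destructive; do not run this on a host you intend to keep in PCD):
$ rm -rf /var/opt/pf9/ /opt/pf9/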
7. Network Cleanup
- Lists all Open vSwitch bridges using the OVS API and deletes them one by one via ovs-vsctl del-br. This is followed by execution of netplan apply to reconfigure networking defaults.
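A minimal equivalent of this stage is sketched below; ovs-vsctl list-br is assumed here as the way the bridges are enumerated, since the article only states that the OVS API is used.
# Delete every OVS bridge, then restore the host's default netplan configuration
for br in $(ovs-vsctl list-br); do
  ovs-vsctl del-br "$br"
done
netplan apply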
8. Endpoint Removal
- Removes OpenStack service entries from the following two files:
/etc/hosts
/etc/cloud/templates/hosts.debian.tmpl
9. Final System Reset
- Ultimately, it executes systemctl reset-failed to clean up the systemd state of failed services.