PCD Deployment Fails with Error Etcd Leader Changed

Problem

Deployment of the Self-Hosted PCD Management Plane intermittently fails with the error: Error from server: etcdserver: leader changed

Environment

  • Virtualization - v2025.4 and Higher

  • Kubernetes - v2025.4 and Higher

  • Component: etcd

Cause

The leader node in an etcd cluster handles all write requests and coordinates data replication. If the leader experiences high CPU usage, memory pressure, or disk I/O bottlenecks caused by heavy load or insufficient resources, it can become slow or unresponsive. The other nodes then start a leader election, and requests that are in flight during the change fail with "leader changed".
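To confirm that leadership is moving between members, the state of each etcd member can be inspected with etcdctl. The following is a minimal sketch, not a documented PCD procedure: the function name etcd_leader_status is hypothetical, and it assumes etcdctl is available on a node that can reach etcd and that the standard etcdctl environment variables (ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT, ETCDCTL_KEY) are already set to values valid for your cluster.

```shell
# Hypothetical helper: print the status of every etcd member as a table.
# Assumes ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY
# are exported with values that match your cluster (not covered by this article).
etcd_leader_status() {
  # The table includes the IS LEADER and RAFT TERM columns.
  etcdctl endpoint status --cluster -w table
}
```

Running etcd_leader_status a few times during a deployment attempt shows which member is the leader. A RAFT TERM value that keeps increasing between runs suggests repeated leader elections, which is consistent with the resource pressure described above.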

Diagnostics

During execution, airctl commands fail at a specific step, and the logs show the error explicitly: Error from server: etcdserver: leader changed

The failure message for the affected step always precedes this error on the same log line.

$ cat airctl-logs/airctl.log
INFO  recovering configuration state...                                   
ERROR   Restoring consul/vault and original state files...               
fatal	error: failed to restore consul for region: failed to add key customers/[CUSTOMER_UUID]/regions/[REGION_UUID]/services/preference-store/cfg/enableAuth in consul: failed to put KV customers/[CUSTOMER_UUID]/regions/[REGION_UUID]/services/preference-store/cfg/enableAuth:true to consul: exit status 1 - Error from server: etcdserver: leader changed
.. snip..
ERROR	fatal error: failed to back up: failed to backup region: failed to backup consul: failed to generate consul snapshot: exit status 1 - Error from server: etcdserver: leader changed
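The log check above can be scripted. This is a minimal sketch using grep; the function names leader_change_lines and leader_change_count are hypothetical, and the default path assumes the log is at airctl-logs/airctl.log relative to the current directory, as in the excerpt.

```shell
# Hypothetical helpers for searching an airctl log for the etcd error.
# Pass a log path as the first argument, or rely on the assumed default.

# Print each matching line, prefixed with its line number in the log.
leader_change_lines() {
  log_file="${1:-airctl-logs/airctl.log}"
  grep -n 'etcdserver: leader changed' "$log_file"
}

# Print how many log lines contain the error (0 if none).
leader_change_count() {
  log_file="${1:-airctl-logs/airctl.log}"
  grep -c 'etcdserver: leader changed' "$log_file"
}
```

For example, leader_change_lines airctl-logs/airctl.log lists the failing steps together with their line numbers, which makes it easier to see which stage of the deployment was interrupted.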

Resolution

Uninstall the current PCD Management Plane and retry the PCD Management Plane deployment.
