How To Disable Partial Restarts of Nodeletd Phases
Problem
Nodeletd (PMK Agent Service) implements phase command scripts. Each phase corresponds to one of the steps required to bring up the PMK stack. Once the stack is up, Nodeletd performs status checks for these phases. If a phase fails its check (e.g. 045-docker_start.sh, Start Docker), pf9-nodeletd.service attempts to restart that phase up to 10 times before performing a complete stack restart. In some situations, the delay caused by these repeated partial restarts is undesirable.
Environment
- Platform9 Managed Kubernetes - v4.5 and higher
- Nodeletd (PMK Agent Service)
Procedure
- On all nodes, create the file override.yaml in /etc/pf9/nodelet with the content shown below.
# cat /etc/pf9/nodelet/override.yaml
---
FULL_RETRY_COUNT: '1'
- Restart the Nodeletd service pf9-nodeletd.service for the changes to take effect.
With the nodelet override config in place, partial phase restarts are disabled: if the status check of any individual phase fails, Nodeletd performs a complete stack restart instead of retrying that phase.
With this option set, pods will be drained from the host whenever the nodeletd service is restarted, as a result of the complete stack restart.
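The procedure above can be scripted per node. The following is a minimal sketch, not an official Platform9 tool: the function names write_override and restart_nodeletd are hypothetical, and it assumes a POSIX shell, root privileges, and a systemd-managed pf9-nodeletd.service as described in this article. It overwrites any existing override.yaml, so merge by hand if the file already holds other settings.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of PMK.

write_override() {
    # $1: directory that should hold override.yaml
    #     (/etc/pf9/nodelet on a PMK node, per the procedure above).
    # Overwrites any existing override.yaml in that directory.
    mkdir -p "$1" || return 1
    printf '%s\n' '---' "FULL_RETRY_COUNT: '1'" > "$1/override.yaml"
}

restart_nodeletd() {
    # Restart the agent so the override takes effect.
    # Note: this triggers a complete stack restart, which drains pods.
    systemctl restart pf9-nodeletd.service
}
```

On each node, as root, source the file that defines these functions and then run: write_override /etc/pf9/nodelet && restart_nodeletd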