Kubernetes API Endpoint Access Loss During Cluster Upgrade
Problem
During the master node upgrade phase of a BareOS cluster upgrade in PMK, the user currently loses access to the K8s API endpoint for 15 to 30 seconds. The outage occurs while the VIP moves to another master node: the VIP fails over immediately only if the node or network goes down.
Environment
- Platform9 Managed Kubernetes - All Versions
- Feature
- Keepalived
Answer
- Keepalived is configured to perform a health check every 10 seconds. If the K8s API server goes down right after a health check, the failure is not detected until the next check 9 to 10 seconds later. The election time and the upstream switch cache update then add to the delay before the VIP is reachable again.
- Feature request PMK8-I-136 has been filed to investigate ways to reduce this switchover time during upgrades. The proposal is to bring down keepalived first as part of the pf9-kube service stop process, forcing a VIP failover before the K8s API server is brought down. This should significantly reduce the switchover time during an upgrade, measured as a percentage of the current switchover time.
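
The timing described above can be sketched as a small Python model. This is only an illustration of the arithmetic, not PMK or keepalived code. The 10-second check interval comes from this article; the election time and switch cache update values are hypothetical placeholders, as are the names `FailoverTiming` and `worst_case_outage`.

```python
from dataclasses import dataclass


@dataclass
class FailoverTiming:
    check_interval_s: float  # keepalived health-check interval (10 s per this article)
    election_s: float        # ASSUMED election time, placeholder value
    cache_update_s: float    # ASSUMED upstream switch cache update time, placeholder value


def worst_case_outage(t: FailoverTiming, keepalived_stopped_first: bool) -> float:
    """Return the modeled worst-case API endpoint outage in seconds."""
    # Current order: the API server stops first, so the failure may go
    # unnoticed for up to one full health-check interval.
    # Proposed order: keepalived stops first, so failover is modeled as
    # starting without waiting for a health check.
    detection_s = 0.0 if keepalived_stopped_first else t.check_interval_s
    return detection_s + t.election_s + t.cache_update_s


# Placeholder numbers for illustration only; they are not measurements.
timing = FailoverTiming(check_interval_s=10.0, election_s=3.0, cache_update_s=5.0)
current = worst_case_outage(timing, keepalived_stopped_first=False)
proposed = worst_case_outage(timing, keepalived_stopped_first=True)
print(f"current order:  {current:.1f} s")
print(f"proposed order: {proposed:.1f} s")
```

In this model the only term that changes between the two stop orders is the health-check wait, which is why the proposal targets the order in which keepalived and the API server are stopped.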