Increased API Calls to Qbert Causing High CPU Usage for Keystone Pods
Problem
A high number of API calls to Qbert generates heavy traffic towards the Keystone pods, leading to CPU limit exhaustion on those pods, followed by authentication failures and downtime.
Environment
- Platform9 Managed Kubernetes - v5.6 and Higher
Cause
When a customer downloads a kubeconfig for any cluster, Platform9 provides two contexts as part of that kubeconfig.
- The first is named “default” and has a server entry pointing to the cluster's Kubernetes API VIP.
- The second is named “<cluster_name>-pf9” and routes all API calls through the Qbert proxy.
When a customer downloads the kubeconfig and starts using the <cluster_name>-pf9 context, a large number of API calls are routed through Qbert. Because each of those calls must be authenticated, the load on the Keystone pods increases. An illustrative layout of the two contexts is shown below.
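For reference, a downloaded kubeconfig typically resembles the following sketch. The cluster names, user name, server URLs, and the Qbert proxy path are illustrative placeholders, not exact values from any specific deployment:

    apiVersion: v1
    kind: Config
    clusters:
    - name: my-cluster
      cluster:
        # Direct access via the cluster's Kubernetes API VIP (placeholder address)
        server: https://<k8s_vip>:443
    - name: my-cluster-qbert
      cluster:
        # Access proxied through Qbert; each request is re-authenticated against Keystone (placeholder URL)
        server: https://<management_plane_fqdn>/qbert/<project_id>/clusters/<cluster_id>/k8sapi
    contexts:
    - name: default
      context:
        cluster: my-cluster
        user: my-user
    - name: <cluster_name>-pf9
      context:
        cluster: my-cluster-qbert
        user: my-user
    current-context: default
    users:
    - name: my-user
      user:
        token: <bearer_token>

Any kubectl command issued while the <cluster_name>-pf9 context is active travels through the Qbert proxy, which is why each call adds authentication work for Keystone.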
Workaround
Use only the “default” context within the kubeconfig to access the cluster. This avoids sending traffic through Qbert to the Keystone pods and prevents the authentication issues that cause downtime. An example of switching contexts is shown below.
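If kubectl is currently using the proxied context, it can be switched back to the direct context with standard kubectl commands, for example (the kubeconfig path is a placeholder):

    # List the contexts present in the downloaded kubeconfig
    kubectl --kubeconfig=<path_to_kubeconfig> config get-contexts

    # Switch to the "default" context, which talks to the Kubernetes API VIP directly
    kubectl --kubeconfig=<path_to_kubeconfig> config use-context default

    # Confirm the active context
    kubectl --kubeconfig=<path_to_kubeconfig> config current-context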
Additional Information
An internal Jira, PMK-5886, has been filed to remove the <cluster_name>-pf9 context from the downloaded kubeconfig so that accidental use of that context cannot cause a resource spike on the Keystone pods. To track the progress of this Jira, please open a support ticket referencing Jira ID PMK-5886.