Troubleshooting Cluster Issues

Cluster Creation

Public Cloud Provider

  • Make sure the account you provided to PMK when creating the cloud provider has all the required privileges. See the AWS and Azure prerequisites in the Getting Started section for more details.

Cluster Creation Fails for BareOS

  • Navigate to the Infrastructure -> Clusters tab.
  • Click on the cluster name. This will take you to the cluster details page.
  • Click on the “Node Health” tab.

Here you should see a detailed breakdown of which nodes failed to install and which specific steps failed. Next, check Troubleshooting Node Issues.

Etcd

Heartbeat/Election Timeout Interval


ETCD_HEARTBEAT_INTERVAL - This is the frequency with which the leader will notify followers that it is still the leader.

ETCD_ELECTION_TIMEOUT - This timeout is how long a follower node will go without hearing a heartbeat before attempting to become a leader itself.

By default, etcd uses a 100 ms heartbeat interval and a 1000 ms election timeout.
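As a rough illustration of how the two settings relate, the sketch below sets both variables to the defaults quoted above (values are in milliseconds) and checks them with a hypothetical helper, check_etcd_timeouts. The helper name is invented for this example, and the 5x ratio is an assumption based on the lower bound etcd is understood to enforce at startup; confirm it against the etcd tuning guide for your version.

```bash
#!/usr/bin/env bash
# Defaults quoted in the text above; both values are in milliseconds.
ETCD_HEARTBEAT_INTERVAL=100
ETCD_ELECTION_TIMEOUT=1000

# Hypothetical helper (not part of etcd or PMK): succeeds when the election
# timeout is at least 5x the heartbeat interval (assumed lower bound).
check_etcd_timeouts() {
  local heartbeat_ms=$1 election_ms=$2
  [ "$election_ms" -ge $((heartbeat_ms * 5)) ]
}

if check_etcd_timeouts "$ETCD_HEARTBEAT_INTERVAL" "$ETCD_ELECTION_TIMEOUT"; then
  echo "timeouts look consistent"
else
  echo "election timeout is too small for this heartbeat interval" >&2
fi
```

On clusters with slow disks or high network latency, both values are usually raised together so that the ratio between them is preserved.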


Database Size Exceeded

  1. Stop the pf9-hostagent and nodeletd services on the master node(s).
  2. Issue a stop for the Nodelet phases.
  3. In /opt/pf9/pf9-kube/master_utils.sh, modify the function ensure_etcd_running() to add the following environment variable.
  4. Start the pf9-hostagent service.
  5. Verify the size was correctly set by scraping the etcd metrics endpoint.
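The verification in the final step can be sketched as follows. The original commands are not shown here, so everything in this block is an assumption: etcd_server_quota_backend_bytes is the metric name recent etcd releases are expected to expose for the configured space quota, and the helper names are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of PMK or etcd.

# Convert a quota expressed in GiB to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Read Prometheus-format metrics on stdin and print the configured backend
# quota in bytes. The value may be in exponent form (for example
# 8.589934592e+09), so it is printed with %.0f.
quota_from_metrics() {
  awk '$1 == "etcd_server_quota_backend_bytes" { printf "%.0f\n", $2 }'
}

# Succeed when the quota reported on stdin equals the expected GiB value.
quota_matches() {
  local expected actual
  expected=$(gib_to_bytes "$1")
  actual=$(quota_from_metrics)
  [ "$actual" = "$expected" ]
}
```

To use it, pipe the output of the etcd metrics endpoint into quota_matches with the expected size in GiB, for example quota_matches 8. The endpoint URL and any client certificates it requires depend on how etcd is deployed on the node.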
  Last updated by Chris Jones