Troubleshooting Cluster Issues
Cluster Creation
Public Cloud Provider
- Make sure the account you provided to PMK when creating the cloud provider has all the required permissions. See the AWS prerequisites under the Getting Started section for more details.
Cluster Creation Fails for BareOS
- Navigate to Infrastructure -> Clusters tab.
- Click on the cluster name. This will take you to the cluster details page.
- Click on the “Node Health” tab.
Here you should see a detailed breakdown of which nodes failed to install and which specific steps failed. Next, check Troubleshooting Node Issues.
Etcd
Heartbeat/Election Timeout Interval
2021-02-04 18:36:31.380207 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 124.999498ms, to 92d6e239c543436)
2021-02-04 18:36:31.380220 W | etcdserver: server is likely overloaded
2021-02-04 18:36:31.382208 W | etcdserver: read-only range request "key:\"/registry/mutatingwebhookconfigurations/vault-agent-injector-cfg\" " with result "range_response_count:1 size:2723" took too long (264.355727ms) to execute
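To gauge how often warnings like the ones above occur, the matching lines can be counted in a saved copy of the etcd log. The following is a minimal sketch; the function name count_heartbeat_warnings and the file name etcd.log are hypothetical, and how you obtain the log depends on your environment.

```shell
# Count etcd heartbeat-delay warnings in log text read from stdin.
count_heartbeat_warnings() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the function usable under "set -e".
  grep -c 'failed to send out heartbeat on time' || true
}

# Example with a hypothetical log file name:
#   count_heartbeat_warnings < etcd.log
```

A count that keeps growing suggests the node is persistently slow (disk or network), not just briefly busy.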
ETCD_HEARTBEAT_INTERVAL - This is the frequency with which the leader will notify followers that it is still the leader.
ETCD_ELECTION_TIMEOUT - This timeout is how long a follower node will go without hearing a heartbeat before attempting to become a leader itself.
By default, etcd uses a 100ms heartbeat interval and a 1000ms election timeout.
# cat /etc/pf9/kube.env | grep -i etcd
export ETCD_HEARTBEAT_INTERVAL="1000"
export ETCD_ELECTION_TIMEOUT="10000"
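When changing these values, it can help to sanity-check the ratio between them. The sketch below reads a kube.env-style file and checks that the election timeout is at least five times the heartbeat interval. That threshold is an assumption based on upstream etcd's startup validation and should be verified against the etcd version in use. The function names env_value and check_etcd_timeouts are hypothetical helpers, not part of PMK.

```shell
# Print the value of variable $1 from KEY="value" lines read on stdin.
# Accepts lines with or without a leading "export ".
env_value() {
  grep -E "^(export )?$1=" | tail -n 1 | cut -d= -f2 | tr -d '"'
}

# $1 = path to an env file such as /etc/pf9/kube.env
check_etcd_timeouts() {
  hb=$(env_value ETCD_HEARTBEAT_INTERVAL < "$1")
  el=$(env_value ETCD_ELECTION_TIMEOUT < "$1")
  if [ -z "$hb" ] || [ -z "$el" ]; then
    echo "missing ETCD_HEARTBEAT_INTERVAL or ETCD_ELECTION_TIMEOUT"
    return 2
  fi
  # Assumed rule: election timeout >= 5 x heartbeat interval.
  if [ "$el" -ge $((hb * 5)) ]; then
    echo "ok: heartbeat=${hb}ms election=${el}ms"
  else
    echo "too small: election=${el}ms < 5 x heartbeat=${hb}ms"
    return 1
  fi
}

# Example:
#   check_etcd_timeouts /etc/pf9/kube.env
```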
Database Size Exceeded

etcdserver: failed to apply request,took 2.429µs,request header:<ID:1920634987875929770 > txn:<compare:<target:MOD key:"/registry/services/endpoints/kube-system/kube-controller-manager" mod_revision:287319046 > success:<request_put:<key:"/registry/services/endpoints/kube-system/kube-controller-manager" value_size:473 >> failure:<>>,resp ,err is etcdserver: no space
- Stop the pf9-hostagent and pf9-nodeletd services on the master node(s).
sudo systemctl stop pf9-{hostagent,nodeletd}
- Issue a stop for the Nodelet phases.
/opt/pf9/nodelet/nodeletd phases stop
- In /opt/pf9/pf9-kube/master_utils.sh, modify the function ensure_etcd_running() to add the ETCD_QUOTA_BACKEND_BYTES environment variable, as shown in the last line below.
--volume ${ETCD_DATA_DIR}:/var/etcd/data \
-e ETCD_DEBUG=${DEBUG}
-e ETCD_QUOTA_BACKEND_BYTES=<size_in_bytes>"
- Start the pf9-hostagent service.
sudo systemctl start pf9-hostagent
- Verify the size was correctly set by scraping the etcd metrics endpoint.
curl -L http://localhost:2379/metrics | grep etcd_server_quota_backend_bytes
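The metrics endpoint usually reports this gauge in scientific notation (for example 8.589934592e+09), which is awkward to compare by eye with the byte value you configured. The following sketch converts a size in GiB to bytes and normalises the metric to a plain integer. The helper names gib_to_bytes and quota_from_metrics are hypothetical, and the 8 GiB figure is only an example value, not a recommendation from this guide.

```shell
# Convert a whole number of GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Read Prometheus-format metrics on stdin and print the quota as an integer.
# Comment lines (# HELP, # TYPE) are skipped because their first field is "#".
quota_from_metrics() {
  awk '$1 == "etcd_server_quota_backend_bytes" { printf "%.0f\n", $2 }'
}

# Example: compare the live value against an 8 GiB target.
#   live=$(curl -sL http://localhost:2379/metrics | quota_from_metrics)
#   [ "$live" = "$(gib_to_bytes 8)" ] && echo "quota matches"
```

The output of gib_to_bytes can also be used as the <size_in_bytes> value when editing master_utils.sh.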