Nodelet Phases Stuck at Cert Generation Phase Due to No Response from Vault

Problem

  • When a node is rebooted or the nodelet phases are restarted, the Certificate Signing Requests fail on the node with the error "Certificate is not signed by CA".

$ sudo /opt/pf9/nodelet/nodeletd phases start --verbose

...
[2023-08-23 06:45:10] + openssl verify -CAfile /tmp/authbs-certs.vKU6/apiserver/etcd/ca.crt /tmp/authbs-certs.vKU6/apiserver/etcd/request.crt
[2023-08-23 06:45:10] Traceback (most recent call last):
[2023-08-23 06:45:10]   File "<string>", line 1, in <module>
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 293, in load
[2023-08-23 06:45:10]     return loads(fp.read(),
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 346, in loads
[2023-08-23 06:45:10]     return _default_decoder.decode(s)
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 337, in decode
[2023-08-23 06:45:10]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 355, in raw_decode
[2023-08-23 06:45:10]     raise JSONDecodeError("Expecting value", s, err.value) from None
[2023-08-23 06:45:10] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[2023-08-23 06:45:10] Error loading file /tmp/authbs-certs.vKU6/kubelet/apiserver/ca.crt
[2023-08-23 06:45:10] + echo 'Certificate is not signed by CA'
[2023-08-23 06:45:10] Certificate is not signed by CA
[2023-08-23 06:45:10] + exit 1

Environment

  • Platform9 Managed Kubernetes

  • Platform9 Edge Cloud

Cause

  • During the nodelet cert generation phase, one of the tasks is to have the certificates generated on the node signed by the vault.

  • During this process, the certificate signing request may not complete and may return an empty response if the node is unable to communicate with the vault.

  • Enabling verbose logging for the nodelet phases helps identify the failing task. Look for curl requests similar to the example below.

  • Running the curl command below manually returns an empty response, as shown below.
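
The JSONDecodeError in the log above ("Expecting value: line 1 column 1 (char 0)") is what a JSON parser reports when it is handed an empty body. As a rough aid, and not the command used by nodelet itself, the sketch below classifies a saved response body as empty, valid JSON, or something else. The function name check_response and the file path passed to it are hypothetical, and python3 is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: classify a saved response body.
# Prints "empty", "json", or "not-json"; returns 0 only for valid JSON.
check_response() {
    body_file="$1"
    # A missing or zero-length file is what produces
    # "Expecting value: line 1 column 1 (char 0)" in a JSON parser.
    if [ ! -s "$body_file" ]; then
        echo "empty"
        return 1
    fi
    # Assumption: python3 is available; json.load raises on invalid input.
    if python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$body_file" 2>/dev/null; then
        echo "json"
        return 0
    fi
    echo "not-json"
    return 1
}
```

To use it, redirect the output of the curl request seen in the verbose log to a file and pass that file to check_response. A result of "empty" is consistent with the node receiving no response from the vault.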

Resolution

  • Among the possible causes, the most frequently observed is a communication failure between the node and the management plane. Check comms.log for errors.

  • Ensure that there is communication between the node and the management plane via the pf9-comms service.

  • The communication between the node and the management plane can be checked using the command below.
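
As a starting point for reviewing comms.log, the sketch below prints the most recent error-like lines from a log file. It is only an illustration and does not replace the connectivity check referred to above. The function name recent_errors and the search patterns are hypothetical, and the log path in the usage comment is an assumption that may differ on your node.

```shell
#!/bin/sh
# Hypothetical helper: show the last N lines of a log that look like
# connection problems. The patterns are illustrative, not exhaustive.
recent_errors() {
    log_file="$1"
    count="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # Case-insensitive match, then keep only the newest matches.
    grep -i -E 'error|refused|timed out|timeout' "$log_file" | tail -n "$count"
}

# Usage on a node (service name is from this article; the log path is an assumption):
#   sudo systemctl status pf9-comms
#   recent_errors /var/log/pf9/comms/comms.log
```

If the service is not active or the log shows repeated connection failures, resolve the connectivity issue first and then restart the nodelet phases.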
