Nodelet phase stuck at the cert generation phase due to no response from Vault
Problem
- When a node is rebooted or the nodelet phases are restarted, certificate signing fails on the node with the error "Certificate is not signed by CA". The JSONDecodeError immediately before it ("Expecting value: line 1 column 1 (char 0)") shows that the certificate file being loaded is empty:
$ sudo /opt/pf9/nodelet/nodeletd phases start --verbose
[2023-08-23 06:45:10] + openssl verify -CAfile /tmp/authbs-certs.vKU6/apiserver/etcd/ca.crt /tmp/authbs-certs.vKU6/apiserver/etcd/request.crt
[2023-08-23 06:45:10] Traceback (most recent call last):
[2023-08-23 06:45:10]   File "<string>", line 1, in <module>
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 293, in load
[2023-08-23 06:45:10]     return loads(fp.read(),
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 346, in loads
[2023-08-23 06:45:10]     return _default_decoder.decode(s)
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 337, in decode
[2023-08-23 06:45:10]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[2023-08-23 06:45:10]   File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 355, in raw_decode
[2023-08-23 06:45:10]     raise JSONDecodeError("Expecting value", s, err.value) from None
[2023-08-23 06:45:10] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[2023-08-23 06:45:10] Error loading file /tmp/authbs-certs.vKU6/kubelet/apiserver/ca.crt
[2023-08-23 06:45:10] + echo 'Certificate is not signed by CA'
[2023-08-23 06:45:10] Certificate is not signed by CA
[2023-08-23 06:45:10] + exit 1
Environment
- Platform9 Managed Kubernetes
- Platform9 Edge Cloud
Cause
- During the nodelet cert generation phase, one of the tasks is to have Vault sign the certificates generated on the node.
- If the node cannot reach Vault, the certificate signing request does not complete and returns an empty response. The empty body is written out as the certificate file, which is why the later json.load() fails with "Expecting value: line 1 column 1 (char 0)" and the phase exits with "Certificate is not signed by CA"; the real failure happened earlier, at the signing request.
- Enabling verbose logging for the nodelet phases helps identify the failing task. Look for curl requests similar to the example below.
[2023-09-05 11:48:06] + curl --silent -d '{"csr":"-----BEGIN CERTIFICATE REQUEST----- <certificate content> -----END CERTIFICATE REQUEST-----\n"}' -H 'X-Vault-Token: s.<token>' http://localhost:9080/vault/v1/pmk-ca-<cluster_uuid>/sign/kube-scheduler-client
- Running the same curl command manually (with -v in place of --silent) returns an empty reply, as below. The request goes to localhost:9080, the local pf9-comms endpoint that relays it to Vault on the management plane, so "Empty reply from server" means the local relay accepted the connection but got nothing back from the management plane.
[root@1a1-mwp-master0 ~]# curl -v -d '{"csr":"-----BEGIN CERTIFICATE REQUEST-----<certificate>-----END CERTIFICATE REQUEST-----\n"}' -H 'X-Vault-Token: <token>' http://localhost:9080/vault/v1/pmk-ca-<cluster_uuid>/sign/apiserver-server
* About to connect() to localhost port 9080 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9080 (#0)
> POST /vault/v1/pmk-ca-<cluster_uuid>/sign/apiserver-server HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9080
> Accept: */*
> X-Vault-Token: <token>
> Content-Length: 1226
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
- Both commands carry the node's Vault token in the X-Vault-Token header, and verbose nodelet logging prints it. Redact the token before sharing logs or commands.
Resolution
- Among the factors observed, the most frequent cause is a communication failure between the node and the management plane. Check comms.log for name resolution failures (ENOTFOUND is the getaddrinfo error code for a hostname that could not be resolved); note that tail limits the search to the last ten lines, so run grep over the whole file to see older failures:
tail /var/log/pf9/comms/comms.log | grep ENOTFOUND
- Ensure that the node can reach the management plane via the pf9-comms service.
- Communication between the node and the management plane can be checked with the command below. Include the -x proxy option only if the node reaches the management plane through a proxy:
$ curl -Lv -x [http/https]://[PROXY_FQDN/IP]:[PORT_NUMBER] https://[MANAGEMENT_PLANE_FQDN]
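The following is an added example and not Platform9's nodelet or pf9-comms code. It puts the two halves of this article into working code: parse_sign_response() checks the Vault signing reply before anything is written to disk, so the empty reply described under Cause is reported as such instead of surfacing later as "Certificate is not signed by CA", and triage_comms() applies the Resolution checks (ENOTFOUND entries in comms.log and resolution of the management-plane FQDN). The reply layout ({"data": {"certificate": ..., "issuing_ca": ...}} on success, {"errors": [...]} on rejection) is Vault's documented PKI sign format; the sample replies, the comms.log lines, the hostnames and the injectable resolver hook are constructed for this example and are not real Platform9 output.

```python
"""Diagnose a nodelet cert-generation phase stuck on Vault signing.
Added example, not Platform9 code; sample data below is constructed."""
import json
import re
import socket
import urllib.error
import urllib.request
import http.client


class VaultSignError(Exception):
    """Carries a diagnostic naming the real failure, not the downstream symptom."""


def parse_sign_response(body: bytes) -> dict:
    """Validate a Vault /sign reply and return {'certificate', 'issuing_ca'}.

    Checking the body before writing it to ca.crt/request.crt means an empty
    reply is reported as a comms problem rather than as a JSON/openssl error.
    """
    if not body.strip():
        raise VaultSignError("empty reply from Vault: the node cannot reach the "
                             "management plane through pf9-comms; fix comms first")
    try:
        doc = json.loads(body)
    except json.JSONDecodeError as exc:
        raise VaultSignError(f"non-JSON reply from Vault: {exc}") from None
    if not isinstance(doc, dict):
        raise VaultSignError("unexpected JSON shape in Vault reply")
    if doc.get("errors"):
        # Auth/permission problems arrive as a non-empty {"errors": [...]}
        # body, so they are distinguishable from a comms failure.
        raise VaultSignError("Vault rejected the request: " + "; ".join(doc["errors"]))
    data = doc.get("data") or {}
    cert = data.get("certificate", "")
    if "-----BEGIN CERTIFICATE-----" not in cert:
        raise VaultSignError("reply has no PEM certificate in data.certificate")
    return {"certificate": cert, "issuing_ca": data.get("issuing_ca", "")}


def diagnose(body: bytes) -> str:
    """'ok' when the reply holds a certificate, otherwise the diagnostic text."""
    try:
        parse_sign_response(body)
        return "ok"
    except VaultSignError as exc:
        return str(exc)


def sign_csr(csr_pem: str, token: str, ca_path: str, role: str,
             base: str = "http://localhost:9080/vault/v1", timeout: int = 30) -> dict:
    """POST the CSR to the same sign endpoint nodelet uses and validate the reply.

    A connection that is accepted and closed without an HTTP response (curl's
    "Empty reply from server") is raised by http.client as RemoteDisconnected.
    """
    req = urllib.request.Request(
        f"{base}/{ca_path}/sign/{role}",
        data=json.dumps({"csr": csr_pem}).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_sign_response(resp.read())
    except urllib.error.HTTPError as exc:          # Vault puts errors in the body
        return parse_sign_response(exc.read())
    except (http.client.RemoteDisconnected, urllib.error.URLError) as exc:
        raise VaultSignError(f"no usable reply from {req.full_url}: {exc}") from None


# Constructed sample replies (not captured from a real Vault).
SAMPLE_EMPTY_REPLY = b""
SAMPLE_OK_REPLY = json.dumps({"data": {
    "certificate": "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n",
    "issuing_ca": "-----BEGIN CERTIFICATE-----\nMIIC...\n-----END CERTIFICATE-----\n"}}).encode()

# Constructed comms.log lines in the style of a getaddrinfo ENOTFOUND error.
SAMPLE_COMMS_LOG = [
    "2023-08-23T06:44:58.101Z connecting to du.example.invalid:443",
    "2023-08-23T06:44:58.230Z Error: getaddrinfo ENOTFOUND du.example.invalid",
    "2023-08-23T06:45:03.240Z Error: getaddrinfo ENOTFOUND du.example.invalid",
    "2023-08-23T06:45:08.300Z retrying in 5s",
]
ENOTFOUND_RE = re.compile(r"ENOTFOUND\s+(?P<host>[A-Za-z0-9.-]+)")


def resolver_not_found(host, port):
    """Stand-in resolver that fails like a missing DNS record (for demos/tests)."""
    raise socket.gaierror(-2, "Name or service not known")


def fqdn_resolves(fqdn: str, resolver=socket.getaddrinfo) -> bool:
    """True if the management-plane FQDN resolves; resolver is injectable."""
    try:
        resolver(fqdn, 443)
        return True
    except socket.gaierror:
        return False


def triage_comms(log_lines, mgmt_fqdn: str, resolver=socket.getaddrinfo) -> dict:
    """Count ENOTFOUND failures per host in comms.log and check DNS for the FQDN."""
    failures = {}
    for line in log_lines:
        m = ENOTFOUND_RE.search(line)
        if m:
            failures[m.group("host")] = failures.get(m.group("host"), 0) + 1
    return {"enotfound": failures, "fqdn_resolves": fqdn_resolves(mgmt_fqdn, resolver)}
```

On a node, read the real file with `triage_comms(open("/var/log/pf9/comms/comms.log"), "<MANAGEMENT_PLANE_FQDN>")` and, once comms are restored, call `sign_csr()` with the CSR, token, `pmk-ca-<cluster_uuid>` path and role from the verbose nodelet log; a raised VaultSignError names the failure that the stock phase output hides behind "Certificate is not signed by CA".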