Nodelet Phase got Stuck at Cert Generation Phase due to no Response from Vault.
Problem
- When a node is rebooted or on Nodelet Phases restart, the Certificate Signing Requests are failing on the nodes with the error
Certificate is not signed by CA
.
x
$ sudo /opt/pf9/nodelet/nodeletd phases start --verbose
[2023-08-23 06:45:10] + openssl verify -CAfile /tmp/authbs-certs.vKU6/apiserver/etcd/ca.crt /tmp/authbs-certs.vKU6/apiserver/etcd/request.crt
[2023-08-23 06:45:10] Traceback (most recent call last):
[2023-08-23 06:45:10] File "<string>", line 1, in <module>
[2023-08-23 06:45:10] File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 293, in load
[2023-08-23 06:45:10] return loads(fp.read(),
[2023-08-23 06:45:10] File "/opt/pf9/python/lib/python3.9/json/__init__.py", line 346, in loads
[2023-08-23 06:45:10] return _default_decoder.decode(s)
[2023-08-23 06:45:10] File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 337, in decode
[2023-08-23 06:45:10] obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[2023-08-23 06:45:10] File "/opt/pf9/python/lib/python3.9/json/decoder.py", line 355, in raw_decode
[2023-08-23 06:45:10] raise JSONDecodeError("Expecting value", s, err.value) from None
[2023-08-23 06:45:10] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[2023-08-23 06:45:10] Error loading file /tmp/authbs-certs.vKU6/kubelet/apiserver/ca.crt
[2023-08-23 06:45:10] + echo 'Certificate is not signed by CA'
[2023-08-23 06:45:10] Certificate is not signed by CA
[2023-08-23 06:45:10] + exit 1
Environment
- Platform9 Managed Kubernetes
- Platform9 Edge Cloud
Cause
- During nodelet cert generation phase, one of the task is to sign the certificates generated on the node by the vault.
- During this process, the certificate signing request may not complete and may result in an empty response if the node is unable to connect to the vault through communication.
- Enabling verbose logging for nodelet phases will help to identify the task. Look for curl requests similar to the example below.
[2023-09-05 11:48:06] + curl --silent -d '{"csr":"-----BEGIN CERTIFICATE REQUEST----- <certificate content>
-----END CERTIFICATE REQUEST-----\n"}' -H 'X-Vault-Token: s.<token>' http://localhost:9080/vault/v1/pmk-ca-<cluster_uuid>/sign/kube-scheduler-client
- Running the below curl command manually will return an empty response like below.
root@1a1-mwp-master0 ~]# curl -v -d '{"csr":"-----BEGIN CERTIFICATE REQUEST-----<certificate>-----END CERTIFICATE REQUEST-----\n"}' -H 'X-Vault-Token: <token>' http://localhost:9080/vault/v1/pmk-ca-<>cluster uuid/sign/apiserver-server
* About to connect() to localhost port 9080 (#0)
* Trying ::1...
* Connected to localhost(::1) port 9080 (#0)
> POST /vault/v1/pmk-ca-/sign/apiserver-server HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9080
> Accept: */*
> X-Vault-Token: <token>
> Content-Length: 1226
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
* Done waiting for 100-continue
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
Resolution
- Among other factors noted, the most frequently observed issue is communication failure between the node and the management plane. Check comms.log
tail /var/log/pf9/comms/comms.log | grep ENOTFOUND
- Ensure that there is communication between node and the management plane via pf9-comms service.
- The communication between node and Management plane can be checked using below command.
$ curl -Lv -x [http/https]://[PROXY_FQDN/IP]:[PORT_NUMBER] https://[MANGEMENT_PLANE_FQDN]
Was this page helpful?