Configure Host Command Fails with 502 Error Code
Problem
After executing the configure host command [Reference KB] as part of the LTS3 installation, the host-status shows the host in a false status with the following error:
Error preparing node Error: Unable to install hostagent. Invalid status code when identifiying hostagent type: 502
Environment
- Self-Managed Cloud Platform9 (SMCP) v5.9.0 and v5.9.1
Cause
While troubleshooting, it was identified that the nginx service inside the nginx pod does not start because it tries to resolve s3-us-west-1.amazonaws.com, which fails in an air-gapped environment.
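As a quick way to confirm this from the affected node, DNS resolution of the S3 endpoint can be probed directly; this is a diagnostic sketch, not from the original article, and the helper name can_resolve is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the given hostname resolves, non-zero otherwise.
can_resolve() {
  getent hosts "$1" > /dev/null
}

# In an air-gapped environment this lookup is expected to fail,
# which is what blocks the nginx service from starting.
if can_resolve "s3-us-west-1.amazonaws.com"; then
  echo "s3-us-west-1.amazonaws.com resolves"
else
  echo "s3-us-west-1.amazonaws.com does NOT resolve (expected when air-gapped)"
fi
```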
Answer
This issue is fixed in SMCP v5.9.2 and later releases.
Workaround
1. If the on-prem setup does not have a private DNS, empty the /etc/resolv.conf file and add nameserver {NODE_IP} to it.
2. Otherwise, if there is a custom DNS setup, add the entries of the custom nameservers to the /etc/resolv.conf file.
3. Run /opt/pf9/airctl/airctl advanced-ddu create-mgmt --config airctl config as mentioned in the published documentation to get the management cluster up and running.
4. After the mgmt-cluster is up and running, append the s3 URL entry in nodelet-bootstrap-config.yaml, i.e., add the entry 34.35.69.42 s3-us-west-1.amazonaws.com to the file nodelet-mgmt-cluster.yaml. The dns field should look like the below snippet.
dns:
  corednsHosts:
    - 34.35.69.42 s3-us-west-1.amazonaws.com
The DU FQDN entry should not be present until airctl start is run, so adding just the s3 entry should suffice. If, however, the entry is present from previous runs, it is safe to leave it there.
5. Run /opt/pf9/airctl/airctl start --config airctl config for the DU to start.
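The resolver changes in steps 1-2 can be sketched as a small script. RESOLV_CONF and NODE_IP are parameters so the sketch can be exercised against a scratch file instead of the live /etc/resolv.conf, and the function name point_resolver_at_node is an assumption, not part of airctl:

```shell
#!/usr/bin/env bash
# Sketch of workaround steps 1-2: point the resolver at the node itself
# when no private DNS exists. RESOLV_CONF defaults to the real file but
# can be overridden for a dry run.
RESOLV_CONF="${RESOLV_CONF:-/etc/resolv.conf}"

point_resolver_at_node() {
  local node_ip="$1"
  # Step 1: empty the file, then add the node as the only nameserver.
  : > "$RESOLV_CONF"
  echo "nameserver ${node_ip}" >> "$RESOLV_CONF"
}

# Usage (step 1, no private DNS):
#   point_resolver_at_node "{NODE_IP}"
# For step 2 (custom DNS), append the custom nameserver entries instead:
#   echo "nameserver <custom-ns-ip>" >> "$RESOLV_CONF"
```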
Additional Notes
To check through automation that the management cluster is up and running after running the below command:
/opt/pf9/airctl/airctl advanced-ddu create-mgmt --config airctl config
Wait for the pods to be in a running state using the following command:
sudo kubectl --kubeconfig /etc/nodelet/<clustername>/certs/admin.kubeconfig wait --for=condition=ready pod -l <label> -n kube-system
The list of labels for the pods:
'k8s-app=calico-kube-controllers'
'k8s-app=calico-node'
'k8s-app=calico-typha' # skip this check if node count is less than 3 as it does not work on 2 node cluster due to replica count
'k8s-app=kube-dns'
'k8s-app=kube-dns-autoscaler'
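The per-label checks above can be wrapped in a single loop. This is a sketch: the function name wait_for_system_pods and the 300s timeout are assumptions, while the label list, namespace, and kubeconfig path come from the article:

```shell
#!/usr/bin/env bash
# Sketch: wait for each kube-system pod label from the list above to be Ready.
# Skips calico-typha when the cluster has fewer than 3 nodes, per the note above.
wait_for_system_pods() {
  local kubeconfig="$1" node_count="$2"
  local labels=(
    'k8s-app=calico-kube-controllers'
    'k8s-app=calico-node'
    'k8s-app=calico-typha'
    'k8s-app=kube-dns'
    'k8s-app=kube-dns-autoscaler'
  )
  local label
  for label in "${labels[@]}"; do
    if [[ "$label" == "k8s-app=calico-typha" && "$node_count" -lt 3 ]]; then
      echo "skipping $label (node count < 3)"
      continue
    fi
    kubectl --kubeconfig "$kubeconfig" wait --for=condition=ready pod \
      -l "$label" -n kube-system --timeout=300s || return 1
  done
}

# Usage:
#   wait_for_system_pods /etc/nodelet/<clustername>/certs/admin.kubeconfig 3
```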